static double |
SparkCostUtils.getCPUTime(long nflop,
int numPartitions,
IOCostUtils.IOMetrics executorMetrics,
RDDStats output,
RDDStats... inputs) |
Computes an estimate of the time the CPU needs to execute an instruction (including memory access),
given the number of floating-point operations.
|
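The sketch below shows one way such a CPU estimate can account for both compute and memory access. It uses a roofline-style model: the work is spread over at most min(numPartitions, executor cores) cores, and the slower of compute time and memory-access time dominates. This is an assumption for illustration, not the SystemDS formula. `CpuCostSketch`, `ExecutorSpec`, and the `bytesTouched` parameter are hypothetical stand-ins. In the real signature, the data sizes would come from the `RDDStats` output and inputs, and the hardware figures would come from `IOCostUtils.IOMetrics`.

```java
/** Hypothetical executor description; stands in for IOCostUtils.IOMetrics. */
final class ExecutorSpec {
    final int cores;            // total cores available to the job
    final double flopsPerCore;  // peak floating-point operations per second per core
    final double memBandwidth;  // memory bandwidth per core in bytes per second (assumed)

    ExecutorSpec(int cores, double flopsPerCore, double memBandwidth) {
        this.cores = cores;
        this.flopsPerCore = flopsPerCore;
        this.memBandwidth = memBandwidth;
    }
}

final class CpuCostSketch {
    /**
     * Roofline-style estimate (hypothetical): compute and memory access overlap,
     * so the estimate is whichever of the two takes longer.
     */
    static double cpuTime(long nflop, int numPartitions, long bytesTouched, ExecutorSpec exec) {
        // No more tasks than partitions can run in parallel, and no more than the core count.
        int parallelism = Math.max(1, Math.min(numPartitions, exec.cores));
        double computeTime = nflop / (parallelism * exec.flopsPerCore);
        double memoryTime = bytesTouched / (parallelism * exec.memBandwidth);
        return Math.max(computeTime, memoryTime);
    }
}
```

With 8 cores at 1 GFLOP/s each, 8e9 FLOPs over 4 partitions use only 4 cores. The estimate is therefore 2 seconds rather than 1.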
static double |
IOCostUtils.getMemReadTime(RDDStats stats,
IOCostUtils.IOMetrics metrics) |
Estimates the time to scan distributed data sets in memory on Spark.
|
static double |
IOCostUtils.getMemWriteTime(RDDStats stats,
IOCostUtils.IOMetrics metrics) |
Estimates the time to write a distributed data set to memory in CP.
|
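The sketch below illustrates the two memory estimates above, assuming both are purely bandwidth-bound. The Spark scan divides the data size by the aggregate read bandwidth of the tasks running in parallel. The CP write is assumed to run in a single thread. `MemIoSketch`, its parameters, and the separate read and write bandwidths are hypothetical; they are not fields of `RDDStats` or `IOCostUtils.IOMetrics`.

```java
final class MemIoSketch {
    /** Spark scan: bytes / (parallel tasks * per-task memory read bandwidth). */
    static double memReadTime(long sizeBytes, int numPartitions, int cores, double readBandwidth) {
        // Parallelism is capped by both the partition count and the available cores.
        int parallelism = Math.max(1, Math.min(numPartitions, cores));
        return sizeBytes / (parallelism * readBandwidth);
    }

    /** CP write: assumed single-threaded, so there is no partition parallelism. */
    static double memWriteTime(long sizeBytes, double writeBandwidth) {
        return sizeBytes / writeBandwidth;
    }
}
```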
static double |
IOCostUtils.getSparkCollectTime(RDDStats output,
IOCostUtils.IOMetrics driverMetrics,
IOCostUtils.IOMetrics executorMetrics) |
Estimates the time for collecting the output of a Spark job.
The output RDD is transferred to the Spark driver at the end of each ResultStage;
time = transfer time (the transfer overlaps with and dominates the read and deserialization times).
|
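The sketch below follows the collect model described above: only the transfer to the driver is counted, because it overlaps with and dominates reading and deserialization. The effective bandwidth is assumed to be the slower of two links: the driver's inbound link and the executors' combined outbound links. `CollectCostSketch` and all of its parameters are hypothetical.

```java
final class CollectCostSketch {
    /** time = transfer time; the bottleneck is the slower side of the transfer (assumed). */
    static double collectTime(long outputBytes, double driverNetBandwidth,
                              int numExecutors, double executorNetBandwidth) {
        double senders = numExecutors * executorNetBandwidth;      // combined outbound bandwidth
        double effective = Math.min(driverNetBandwidth, senders);  // driver link usually limits
        return outputBytes / effective;
    }
}
```

With several executors the driver link is the limit. With a single slow executor, the sending side dominates instead.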
static double |
IOCostUtils.getSparkParallelizeTime(RDDStats output,
IOCostUtils.IOMetrics driverMetrics,
IOCostUtils.IOMetrics executorMetrics) |
Estimates the time to parallelize a local object to Spark.
|
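Parallelizing is the reverse direction of collecting: the driver ships a local object out to the executors. The sketch below assumes the driver first serializes the whole object and then sends it over the network. Under that assumption the two costs are added rather than overlapped. This ordering is an assumption, not something the description above states. `ParallelizeCostSketch` and its parameters are hypothetical.

```java
final class ParallelizeCostSketch {
    static double parallelizeTime(long objectBytes, double driverSerBandwidth,
                                  double driverNetBandwidth, int numExecutors,
                                  double executorNetBandwidth) {
        // Assumed: serialization on the driver completes before shipping starts.
        double serializeTime = objectBytes / driverSerBandwidth;
        // Assumed: the driver's outbound link or the executors' combined inbound links limit the transfer.
        double receivers = numExecutors * executorNetBandwidth;
        double transferTime = objectBytes / Math.min(driverNetBandwidth, receivers);
        return serializeTime + transferTime;
    }
}
```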
static double |
IOCostUtils.getSparkShuffleReadStaticTime(RDDStats input,
IOCostUtils.IOMetrics metrics) |
Estimates the time for reading distributed RDD input at the beginning of a Stage
when a wide transformation is partition-preserving; only local disk reads are involved.
|
static double |
IOCostUtils.getSparkShuffleReadTime(RDDStats input,
IOCostUtils.IOMetrics metrics) |
Estimates the time for reading distributed RDD input at the beginning of a Stage;
time = transfer time (the transfer overlaps with and dominates the read and deserialization times).
For simplicity, it is assumed that the whole data set is shuffled.
|
static double |
IOCostUtils.getSparkShuffleTime(RDDStats output,
IOCostUtils.IOMetrics metrics,
boolean withDistribution) |
Combines the shuffle write and read times, since the two are typically added together
to the overall data transmission time when estimating the cost of an instruction.
|
static double |
IOCostUtils.getSparkShuffleWriteTime(RDDStats output,
IOCostUtils.IOMetrics metrics) |
Estimates the time for writing the RDD output to the local file system at the end of a ShuffleMapStage;
time = disk write time (the write overlaps with and dominates the serialization time).
The whole data set is written to shuffle files even if only one executor is used.
|
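The sketch below combines the shuffle estimates above: a disk-bound write, a network-bound read that assumes the whole data set is shuffled, and a local-disk read for the partition-preserving case. All executors are assumed to work concurrently on evenly spread data. Reading `withDistribution` as "the data must cross the network" is an interpretation, not something the descriptions confirm. `ShuffleCostSketch` and its parameters are hypothetical.

```java
final class ShuffleCostSketch {
    /** Disk write time; even a single executor writes every byte to shuffle files. */
    static double shuffleWriteTime(long bytes, int numExecutors, double diskWriteBandwidth) {
        return bytes / (Math.max(1, numExecutors) * diskWriteBandwidth);
    }

    /** Network-bound read: the whole data set is assumed to be shuffled. */
    static double shuffleReadTime(long bytes, int numExecutors, double netBandwidth) {
        return bytes / (Math.max(1, numExecutors) * netBandwidth);
    }

    /** Partition-preserving read: only local disk reads, no network transfer. */
    static double staticShuffleReadTime(long bytes, int numExecutors, double diskReadBandwidth) {
        return bytes / (Math.max(1, numExecutors) * diskReadBandwidth);
    }

    /** Write plus read, added together (hypothetical meaning of withDistribution). */
    static double shuffleTime(long bytes, int numExecutors, double diskWriteBw,
                              double diskReadBw, double netBw, boolean withDistribution) {
        double write = shuffleWriteTime(bytes, numExecutors, diskWriteBw);
        double read = withDistribution
                ? shuffleReadTime(bytes, numExecutors, netBw)
                : staticShuffleReadTime(bytes, numExecutors, diskReadBw);
        return write + read;
    }
}
```

When the network is slower than the local disks, the distributed case costs more than the partition-preserving one, even though the write cost is identical.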
void |
VarStats.setRddStats(RDDStats rddStats) |
Intended for use in testing.
|