Let’s assume that we have n tasks that are distributed dynamically to p processors. Costs are random and unknown.
n <- 16
p <- 4
dist_MC <- eval_makespan_MC(n, p, runif)
range <- c(0, ceiling(n / p))
support <- seq(range[1], range[2], length.out = 3e2)
ecdf_num <- eval_makespan_numeric(n, p, dunif(support), support)
plot.ecdf(dist_MC)
points(support, ecdf_num, cex = 0.1, col = 2)
Numerical evaluation provides an approximation, but this is slow for large instance and does not provide closed formulas. Let’s analyse the completion of each task with 10 proc:
n <- 100
p <- 10
dist_MC <- eval_completion_MC(n, p, runif, MC = 1e3)
plot.ecdf(dist_MC[1,], xlim = c(min(dist_MC), max(dist_MC)))
for (i in 2:n)
plot(ecdf(dist_MC[i,]), add = TRUE, col = i)
var(dist_MC[50,])
## [1] 0.1243485
var(dist_MC[100,])
## [1] 0.170905
There are more and more variance and this property is not easy to predict.
Let’s try to approximate the variance in function of the number of processors. With two processors, let’s hypothesize that the variance is reduced by two:
p <- 2
for (n_per_proc in c(30, 100, 300)) {
n <- n_per_proc * p
dist_MC <- eval_completion_proc_MC(n, p, runif, MC = 1e3)
print(var(dist_MC[1,]) / (1 / 12) / n_per_proc)
}
## [1] 0.5634475
## [1] 0.4824206
## [1] 0.5109257
It seems to work. Let’s proceed with 3 proc:
p <- 3
for (n_per_proc in c(30, 100, 300)) {
n <- n_per_proc * p
dist_MC <- eval_completion_proc_MC(n, p, runif, MC = 1e3)
print(var(dist_MC[1,]) / (1 / 12) / n_per_proc)
}
## [1] 0.3355795
## [1] 0.351331
## [1] 0.3278437
It seems the variance is divided by the number of proc. Let’s test this hypothesis on larger settings:
for (n in c(30, 100, 300))
for (p in c(1, 2, 3, 5, 10)) {
dist_MC <- eval_completion_proc_MC(n, p, runif, MC = 1e4)
print(c(n, p, var(as.vector(dist_MC)), (n / p) / 12 / p))
}
## [1] 30.000000 1.000000 2.450723 2.500000
## [1] 30.0000000 2.0000000 0.6709792 0.6250000
## [1] 30.0000000 3.0000000 0.3283511 0.2777778
## [1] 30.000000 5.000000 0.152402 0.100000
## [1] 30.00000000 10.00000000 0.07915153 0.02500000
## [1] 100.000000 1.000000 8.409261 8.333333
## [1] 100.000000 2.000000 2.135468 2.083333
## [1] 100.0000000 3.0000000 0.9692738 0.9259259
## [1] 100.0000000 5.0000000 0.3925361 0.3333333
## [1] 100.00000000 10.00000000 0.13784735 0.08333333
## [1] 300.00000 1.00000 24.90938 25.00000
## [1] 300.000000 2.000000 6.283977 6.250000
## [1] 300.000000 3.000000 2.845340 2.777778
## [1] 300.000000 5.000000 1.061964 1.000000
## [1] 300.0000000 10.0000000 0.3134454 0.2500000
The approximation is not good when the number of tasks per processors is small. Let’s check this:
n_per_proc <- 30
for (p in c(1, 2, 3, 5, 10, 20)) {
n <- p * n_per_proc
dist_MC <- eval_completion_proc_MC(n, p, runif, MC = 1e4)
var_measured <- var(as.vector(dist_MC))
var_predicted <- (n / p) / 12 / p
print(c(n, p, var_measured, var_predicted, var_measured / var_predicted - 1))
}
## [1] 3.000000e+01 1.000000e+00 2.501946e+00 2.500000e+00 7.784865e-04
## [1] 60.00000000 2.00000000 1.29058683 1.25000000 0.03246946
## [1] 90.00000000 3.00000000 0.88303102 0.83333333 0.05963722
## [1] 150.0000000 5.0000000 0.5602653 0.5000000 0.1205306
## [1] 300.0000000 10.0000000 0.2986635 0.2500000 0.1946538
## [1] 600.0000000 20.0000000 0.1781394 0.1250000 0.4251151
The precision decreases with the number of proc too.
Conclusion: we have a numerical method that works well with less than 20 tasks. We also have some approximation scheme when the number of tasks per processor is greater than 30.
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/openblas-base/libblas.so.3
## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
##
## locale:
## [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
## [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
## [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] stringr_1.0.0 tidyr_0.2.0 ggplot2_0.9.3.1 dplyr_0.4.3
## [5] purrr_0.2.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.10 knitr_1.15.1 magrittr_1.5
## [4] MASS_7.3-47 munsell_0.4 colorspace_1.2-2
## [7] R6_2.1.0 plyr_1.8.1 tools_3.4.0
## [10] dichromat_2.0-0 parallel_3.4.0 grid_3.4.0
## [13] gtable_0.1.2 pacman_0.4.6 DBI_0.3.1
## [16] htmltools_0.3.6 yaml_2.1.13 rprojroot_1.2
## [19] digest_0.6.3 assertthat_0.1 RColorBrewer_1.0-5
## [22] reshape2_1.2.2 evaluate_0.10 rmarkdown_1.5
## [25] labeling_0.1 stringi_0.5-5 compiler_3.4.0
## [28] scales_0.2.3 backports_1.0.5 proto_0.3-10