Let’s assume that we have n tasks that are distributed dynamically to p processors. Costs are random and unknown.

n <- 16
p <- 4
dist_MC <- eval_makespan_MC(n, p, runif)

range <- c(0, ceiling(n / p))
support <- seq(range[1], range[2], length.out = 3e2)
ecdf_num <- eval_makespan_numeric(n, p, dunif(support), support)

plot.ecdf(dist_MC)
points(support, ecdf_num, cex = 0.1, col = 2)

Numerical evaluation provides an approximation, but this is slow for large instance and does not provide closed formulas. Let’s analyse the completion of each task with 10 proc:

n <- 100
p <- 10
dist_MC <- eval_completion_MC(n, p, runif, MC = 1e3)
plot.ecdf(dist_MC[1,], xlim = c(min(dist_MC), max(dist_MC)))
for (i in 2:n)
  plot(ecdf(dist_MC[i,]), add = TRUE, col = i)

var(dist_MC[50,])
## [1] 0.1243485
var(dist_MC[100,])
## [1] 0.170905

There are more and more variance and this property is not easy to predict.

Let’s try to approximate the variance in function of the number of processors. With two processors, let’s hypothesize that the variance is reduced by two:

p <- 2
for (n_per_proc in c(30, 100, 300)) {
  n <- n_per_proc * p
  dist_MC <- eval_completion_proc_MC(n, p, runif, MC = 1e3)
  print(var(dist_MC[1,]) / (1 / 12) / n_per_proc)
}
## [1] 0.5634475
## [1] 0.4824206
## [1] 0.5109257

It seems to work. Let’s proceed with 3 proc:

p <- 3
for (n_per_proc in c(30, 100, 300)) {
  n <- n_per_proc * p
  dist_MC <- eval_completion_proc_MC(n, p, runif, MC = 1e3)
  print(var(dist_MC[1,]) / (1 / 12) / n_per_proc)
}
## [1] 0.3355795
## [1] 0.351331
## [1] 0.3278437

It seems the variance is divided by the number of proc. Let’s test this hypothesis on larger settings:

for (n in c(30, 100, 300))
  for (p in c(1, 2, 3, 5, 10)) {
    dist_MC <- eval_completion_proc_MC(n, p, runif, MC = 1e4)
    print(c(n, p, var(as.vector(dist_MC)), (n / p) / 12 / p))
  }
## [1] 30.000000  1.000000  2.450723  2.500000
## [1] 30.0000000  2.0000000  0.6709792  0.6250000
## [1] 30.0000000  3.0000000  0.3283511  0.2777778
## [1] 30.000000  5.000000  0.152402  0.100000
## [1] 30.00000000 10.00000000  0.07915153  0.02500000
## [1] 100.000000   1.000000   8.409261   8.333333
## [1] 100.000000   2.000000   2.135468   2.083333
## [1] 100.0000000   3.0000000   0.9692738   0.9259259
## [1] 100.0000000   5.0000000   0.3925361   0.3333333
## [1] 100.00000000  10.00000000   0.13784735   0.08333333
## [1] 300.00000   1.00000  24.90938  25.00000
## [1] 300.000000   2.000000   6.283977   6.250000
## [1] 300.000000   3.000000   2.845340   2.777778
## [1] 300.000000   5.000000   1.061964   1.000000
## [1] 300.0000000  10.0000000   0.3134454   0.2500000

The approximation is not good when the number of tasks per processors is small. Let’s check this:

n_per_proc <- 30
for (p in c(1, 2, 3, 5, 10, 20)) {
  n <- p * n_per_proc
  dist_MC <- eval_completion_proc_MC(n, p, runif, MC = 1e4)
  var_measured <- var(as.vector(dist_MC))
  var_predicted <- (n / p) / 12 / p
  print(c(n, p, var_measured, var_predicted, var_measured / var_predicted - 1))
}
## [1] 3.000000e+01 1.000000e+00 2.501946e+00 2.500000e+00 7.784865e-04
## [1] 60.00000000  2.00000000  1.29058683  1.25000000  0.03246946
## [1] 90.00000000  3.00000000  0.88303102  0.83333333  0.05963722
## [1] 150.0000000   5.0000000   0.5602653   0.5000000   0.1205306
## [1] 300.0000000  10.0000000   0.2986635   0.2500000   0.1946538
## [1] 600.0000000  20.0000000   0.1781394   0.1250000   0.4251151

The precision decreases with the number of proc too.

Conclusion: we have a numerical method that works well with less than 20 tasks. We also have some approximation scheme when the number of tasks per processor is greater than 30.

## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/openblas-base/libblas.so.3
## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
## 
## locale:
##  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
##  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
##  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] stringr_1.0.0   tidyr_0.2.0     ggplot2_0.9.3.1 dplyr_0.4.3    
## [5] purrr_0.2.0    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.10       knitr_1.15.1       magrittr_1.5      
##  [4] MASS_7.3-47        munsell_0.4        colorspace_1.2-2  
##  [7] R6_2.1.0           plyr_1.8.1         tools_3.4.0       
## [10] dichromat_2.0-0    parallel_3.4.0     grid_3.4.0        
## [13] gtable_0.1.2       pacman_0.4.6       DBI_0.3.1         
## [16] htmltools_0.3.6    yaml_2.1.13        rprojroot_1.2     
## [19] digest_0.6.3       assertthat_0.1     RColorBrewer_1.0-5
## [22] reshape2_1.2.2     evaluate_0.10      rmarkdown_1.5     
## [25] labeling_0.1       stringi_0.5-5      compiler_3.4.0    
## [28] scales_0.2.3       backports_1.0.5    proto_0.3-10