For a journal version, it will be necessary to extend the simulation. In particular, it would be good to compare against several other competing approaches and to consider different platforms (varying the ratio m/k, the ratio between the expected costs, and the number of tasks).
Let’s start with a symmetric situation: medium heterogeneity (CV = 0.3), 10 processors of each type, and a varying number of tasks.
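The simulations below rely on the helpers defined in the previous sections (rgamma_cost, the schedule_* heuristics and the LB_* lower bounds). For completeness, here is a minimal sketch of rgamma_cost consistent with how it is called here, assuming costs follow a gamma distribution parametrized by its mean and standard deviation:

library(purrr)   # map_df, map_dbl
library(dplyr)   # rowwise, mutate
library(tidyr)   # gather
library(ggplot2)

rgamma_cost <- function(n, mean, sd) {
  shape <- (mean / sd)^2               # for a gamma distribution, CV = 1 / sqrt(shape)
  rgamma(n, shape = shape, scale = mean / shape)
}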
m <- 10
k <- 10
ntasks <- c(10, 20, 30, 50, 100, 200, 300, 500)
# For each number of tasks, generate 10 random instances, run all heuristics
# and record their makespans together with the lower bound.
ratios <- map_df(ntasks, function(n) {
  replicate(10, {
    costs_CPU <- rgamma_cost(n, mean = 1, sd = 0.3)
    costs_GPU <- rgamma_cost(n, mean = 1, sd = 0.3)
    costs <- matrix(c(costs_CPU, costs_GPU), ncol = 2, byrow = FALSE)
    polish <- schedule_balanced_estimate_polished(costs, m, k)
    balpolish <- schedule_balancing_balanced_estimate(costs, m, k, polish$alloc,
                                                      polish$inv)
    RR <- list(schedule_balanced_makespan_efficient(costs, m, k),
               balpolish,
               polish,
               schedule_DADA(costs, m, k),
               schedule_DualHP(costs, m, k),
               schedule_CLB2C(costs, m, k),
               schedule_HeteroPrio(costs, m, k))
    names(RR) <- c("best", "bal_polish", "polish", "DADA", "DualHP", "CLB2C",
                   "HeteroPrio")
    c(list(n = n, LB = max(LB_europar(costs, m, k),
                           LB_balance(costs, m, k))), map_dbl(RR, ~.$mks))
  }) %>% t() %>% data.frame() %>% map(unlist) %>% data.frame()
})
ratios %>%
  rowwise() %>%
  mutate(min = min(best, bal_polish, polish, DADA, DualHP, CLB2C, HeteroPrio)) %>%
  gather(key = "algo", value = "ratio", -n, -LB, -min) %>%
  ggplot(aes(x = n, y = ratio / LB, color = algo)) +
  geom_boxplot(aes(group = interaction(algo, n)), outlier.size = 1) +
  scale_color_brewer("Algorithm", palette = "Set1") +
  scale_x_log10("Number of tasks") +
  scale_y_continuous("Makespan / LB")
Small instances are easy because there are many resources and each task can be allocated to a free resource. This indicates that the hardest instances are those with around 50 tasks (2 to 3 tasks per machine). However, looking at larger instances (more than 200 tasks, or 10 tasks per machine) also makes sense because not all strategies are asymptotically equivalent.
We will therefore concentrate on small instances (a few tasks per machine), where sub-optimal scheduling decisions may have large consequences, and on large instances, for which we look at the asymptotic performance.
Let’s study how the variance of the costs impacts the results on a small symmetric instance.
m <- 10
k <- 10
tasks <- c(50, 200)
cvs <- c(0.03, 0.05, 0.1, 0.2, 0.3, 0.5, 1, 2, 3)
# Same protocol with a varying coefficient of variation for the costs.
ratios <- map_df(tasks, function(n) {
  map_df(cvs, function(cv) {
    replicate(10, {
      costs_CPU <- rgamma_cost(n, mean = 1, sd = cv)
      costs_GPU <- rgamma_cost(n, mean = 1, sd = cv)
      costs <- matrix(c(costs_CPU, costs_GPU), ncol = 2, byrow = FALSE)
      polish <- schedule_balanced_estimate_polished(costs, m, k)
      balpolish <- schedule_balancing_balanced_estimate(costs, m, k, polish$alloc,
                                                        polish$inv)
      RR <- list(schedule_balanced_makespan_efficient(costs, m, k),
                 balpolish,
                 polish,
                 schedule_DADA(costs, m, k),
                 schedule_DualHP(costs, m, k),
                 schedule_CLB2C(costs, m, k),
                 schedule_HeteroPrio(costs, m, k))
      names(RR) <- c("best", "bal_polish", "polish", "DADA", "DualHP", "CLB2C",
                     "HeteroPrio")
      c(list(n = n, cv = cv, LB = max(LB_europar(costs, m, k),
                                      LB_balance(costs, m, k))), map_dbl(RR, ~.$mks))
    }) %>% t() %>% data.frame() %>% map(unlist) %>% data.frame()
  })
})
ratios %>%
  rowwise() %>%
  mutate(min = min(best, bal_polish, polish, DADA, DualHP, CLB2C, HeteroPrio)) %>%
  gather(key = "algo", value = "ratio", -cv, -n, -LB, -min) %>%
  ggplot(aes(x = cv, y = ratio / LB, color = algo)) +
  geom_boxplot(aes(group = interaction(algo, cv)), outlier.size = 1) +
  facet_wrap(~ n, nrow = 2) +
  scale_color_brewer("Algorithm", palette = "Set1") +
  scale_x_log10("Coefficient of variation") +
  scale_y_continuous("Makespan / LB")
The difficult setting is between 0.1 and 1. Below this range, the instance is easy (all tasks are similar and can be balanced simply). Above it, some tasks have very large costs and dominate the completion times. Surprisingly, with many tasks, a large variance remains difficult for many approaches (it is harder to pack tasks below the completion time set by a single large task).
We concentrate on coefficients of variation between 0.1 and 0.5 for small instances and between 0.2 and 1 for large instances: the range from 0.1 to 1 seems safe for all instances.
Let’s now vary the ratio m/k while keeping the same amount of work on both processor types (which means that the tasks will be divided into two sets of similar size).
tasks <- c(50, 200)
mk <- 20
CPU <- c(10, 12, 14, 16, 18)
# Vary the number of CPUs m (with k = mk - m GPUs); CPU costs are scaled by
# m / k so that both processor types receive a similar amount of work.
ratios <- map_df(tasks, function(n) {
  map_df(CPU, function(m) {
    replicate(10, {
      k <- mk - m
      costs_CPU <- m / k * rgamma_cost(n, mean = 1, sd = 0.3)
      costs_GPU <- rgamma_cost(n, mean = 1, sd = 0.3)
      costs <- matrix(c(costs_CPU, costs_GPU), ncol = 2, byrow = FALSE)
      polish <- schedule_balanced_estimate_polished(costs, m, k)
      balpolish <- schedule_balancing_balanced_estimate(costs, m, k, polish$alloc,
                                                        polish$inv)
      RR <- list(schedule_balanced_makespan_efficient(costs, m, k),
                 balpolish,
                 polish,
                 schedule_DADA(costs, m, k),
                 schedule_DualHP(costs, m, k),
                 schedule_CLB2C(costs, m, k),
                 schedule_HeteroPrio(costs, m, k))
      names(RR) <- c("best", "bal_polish", "polish", "DADA", "DualHP", "CLB2C",
                     "HeteroPrio")
      c(list(n = n, m = m, LB = max(LB_europar(costs, m, k),
                                    LB_balance(costs, m, k))), map_dbl(RR, ~.$mks))
    }) %>% t() %>% data.frame() %>% map(unlist) %>% data.frame()
  })
})
ratios %>%
  rowwise() %>%
  mutate(min = min(best, bal_polish, polish, DADA, DualHP, CLB2C, HeteroPrio)) %>%
  gather(key = "algo", value = "ratio", -n, -m, -LB, -min) %>%
  ggplot(aes(x = m, y = ratio / LB, color = algo)) +
  geom_boxplot(aes(group = interaction(algo, m)), outlier.size = 1) +
  facet_wrap(~ n, ncol = 1) +
  scale_color_brewer("Algorithm", palette = "Set1") +
  scale_x_continuous("Number of CPU") +
  scale_y_continuous("Makespan / LB")
For small instances, DADA's performance degrades as the asymmetry increases, while HeteroPrio performs better than CLB2C. The other approaches behave similarly, but the hardest instances seem to be the asymmetric ones.
For large instances, all approaches are within 10% of the lower bound and the behavior of each algorithm seems stable.
Let’s now put fewer tasks on one of the processor types. We consider an asymmetric platform with roughly twice as many CPUs as GPUs (m = 13, k = 7).
tasks <- c(50, 200)
m <- 13
k <- 7
mean_CPU <- c(0.1, 0.2, 0.3, 0.5, 1, 2, 3, 5, 10)
# Vary the expected CPU cost on the asymmetric platform (m = 13, k = 7).
ratios <- map_df(tasks, function(n) {
  map_df(mean_CPU, function(m_CPU) {
    replicate(10, {
      costs_CPU <- m_CPU * rgamma_cost(n, mean = 1, sd = 0.3)
      costs_GPU <- rgamma_cost(n, mean = 1, sd = 0.3)
      costs <- matrix(c(costs_CPU, costs_GPU), ncol = 2, byrow = FALSE)
      polish <- schedule_balanced_estimate_polished(costs, m, k)
      balpolish <- schedule_balancing_balanced_estimate(costs, m, k, polish$alloc,
                                                        polish$inv)
      RR <- list(schedule_balanced_makespan_efficient(costs, m, k),
                 balpolish,
                 polish,
                 schedule_DADA(costs, m, k),
                 schedule_DualHP(costs, m, k),
                 schedule_CLB2C(costs, m, k),
                 schedule_HeteroPrio(costs, m, k))
      names(RR) <- c("best", "bal_polish", "polish", "DADA", "DualHP", "CLB2C",
                     "HeteroPrio")
      c(list(n = n, m_CPU = m_CPU, LB = max(LB_europar(costs, m, k),
                                            LB_balance(costs, m, k))), map_dbl(RR, ~.$mks))
    }) %>% t() %>% data.frame() %>% map(unlist) %>% data.frame()
  })
})
ratios %>%
  rowwise() %>%
  mutate(min = min(best, bal_polish, polish, DADA, DualHP, CLB2C, HeteroPrio)) %>%
  gather(key = "algo", value = "ratio", -n, -m_CPU, -LB, -min) %>%
  ggplot(aes(x = m_CPU, y = ratio / LB, color = algo)) +
  geom_boxplot(aes(group = interaction(algo, m_CPU)), outlier.size = 1) +
  facet_wrap(~ n, ncol = 1) +
  scale_color_brewer("Algorithm", palette = "Set1") +
  scale_x_log10("CPU expected cost") +
  scale_y_continuous("Makespan / LB")
For small instances, this shows that instances where tasks are much costlier on one processor type are easy to solve: the slow processors are nearly idle, so offloading a task to them readily reduces the makespan. On the other hand, when the initial allocation leaves the work on both processor types almost balanced, there are only a few tasks to move, and many tasks have similar performance on both types. Their allocation thus has low impact and the instance can be considered easy. Difficult instances arise when the ratio between the expected costs on the two types of machines is large enough to induce a significant imbalance, but not so large as to put all tasks on the same processor type.
For large instances, even a very strong imbalance is not problematic, as the makespan is no longer dominated by a few large costs on one type of processors. Overall, the expected costs do not seem to have an impact on the difficulty of the instances. The exception is the polish algorithm, which is affected by large tasks that would benefit from moving to the other processor type.
For small instances (2 to 3 tasks per resource), the experiments below use n = 50, m = 13 CPUs, k = 7 GPUs and an expected CPU cost of 5.
For large instances (10 tasks per resource), we keep the same platform with n = 200.
For consistency, let’s apply the same criteria on both small and large instances.
It remains to study the effect of the heterogeneity, i.e., the CVs of the costs on each processor type.
n <- 50
m <- 13
k <- 7
mean_CPU <- 5
cvs_CPU <- c(0.1, 0.2, 0.3, 0.5, 1)
cvs_GPU <- c(0.1, 0.2, 0.3, 0.5, 1)
# Vary the CPU and GPU cost CVs independently (small instance, n = 50).
ratios <- map_df(cvs_CPU, function(cv_CPU) {
  map_df(cvs_GPU, function(cv_GPU) {
    replicate(10, {
      costs_CPU <- mean_CPU * rgamma_cost(n, mean = 1, sd = cv_CPU)
      costs_GPU <- rgamma_cost(n, mean = 1, sd = cv_GPU)
      costs <- matrix(c(costs_CPU, costs_GPU), ncol = 2, byrow = FALSE)
      polish <- schedule_balanced_estimate_polished(costs, m, k)
      balpolish <- schedule_balancing_balanced_estimate(costs, m, k, polish$alloc,
                                                        polish$inv)
      RR <- list(schedule_balanced_makespan_efficient(costs, m, k),
                 balpolish,
                 polish,
                 schedule_DADA(costs, m, k),
                 schedule_DualHP(costs, m, k),
                 schedule_CLB2C(costs, m, k),
                 schedule_HeteroPrio(costs, m, k))
      names(RR) <- c("best", "bal_polish", "polish", "DADA", "DualHP", "CLB2C",
                     "HeteroPrio")
      c(list(cv_CPU = cv_CPU, cv_GPU = cv_GPU,
             LB = max(LB_europar(costs, m, k),
                      LB_balance(costs, m, k))), map_dbl(RR, ~.$mks))
    }) %>% t() %>% data.frame() %>% map(unlist) %>% data.frame()
  })
})
ratios %>%
  rowwise() %>%
  mutate(min = min(best, bal_polish, polish, DADA, DualHP, CLB2C, HeteroPrio)) %>%
  gather(key = "algo", value = "ratio", -cv_CPU, -cv_GPU, -LB, -min) %>%
  ggplot(aes(x = cv_CPU, y = ratio / LB, color = algo)) +
  geom_boxplot(aes(group = interaction(algo, cv_CPU)), outlier.size = 1) +
  facet_wrap(~ cv_GPU, ncol = 1) +
  scale_color_brewer("Algorithm", palette = "Set1") +
  scale_x_log10("CPU cost CV") +
  scale_y_continuous("Makespan / LB")
Several observations can be made: a low GPU CV does not seem interesting (the instances are easier for most approaches), and choosing 0.5 for the GPU CV makes the impact of the CPU CV on bal_polish, polish, DualHP, CLB2C and HeteroPrio apparent.
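To isolate this effect, a possible follow-up (a sketch reusing the ratios data frame just computed, not part of the original protocol) is to fix the GPU CV at 0.5 and re-plot:

ratios %>%
  filter(cv_GPU == 0.5) %>%  # keep only the retained GPU CV
  gather(key = "algo", value = "ratio", -cv_CPU, -cv_GPU, -LB) %>%
  ggplot(aes(x = cv_CPU, y = ratio / LB, color = algo)) +
  geom_boxplot(aes(group = interaction(algo, cv_CPU)), outlier.size = 1) +
  scale_color_brewer("Algorithm", palette = "Set1") +
  scale_x_log10("CPU cost CV") +
  scale_y_continuous("Makespan / LB")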
n <- 200
m <- 13
k <- 7
mean_CPU <- 5
cvs_CPU <- c(0.1, 0.2, 0.3, 0.5, 1)
cvs_GPU <- c(0.1, 0.2, 0.3, 0.5, 1)
# Same experiment with the large instance (n = 200).
ratios <- map_df(cvs_CPU, function(cv_CPU) {
  map_df(cvs_GPU, function(cv_GPU) {
    replicate(10, {
      costs_CPU <- mean_CPU * rgamma_cost(n, mean = 1, sd = cv_CPU)
      costs_GPU <- rgamma_cost(n, mean = 1, sd = cv_GPU)
      costs <- matrix(c(costs_CPU, costs_GPU), ncol = 2, byrow = FALSE)
      polish <- schedule_balanced_estimate_polished(costs, m, k)
      balpolish <- schedule_balancing_balanced_estimate(costs, m, k, polish$alloc,
                                                        polish$inv)
      RR <- list(schedule_balanced_makespan_efficient(costs, m, k),
                 balpolish,
                 polish,
                 schedule_DADA(costs, m, k),
                 schedule_DualHP(costs, m, k),
                 schedule_CLB2C(costs, m, k),
                 schedule_HeteroPrio(costs, m, k))
      names(RR) <- c("best", "bal_polish", "polish", "DADA", "DualHP", "CLB2C",
                     "HeteroPrio")
      c(list(cv_CPU = cv_CPU, cv_GPU = cv_GPU,
             LB = max(LB_europar(costs, m, k),
                      LB_balance(costs, m, k))), map_dbl(RR, ~.$mks))
    }) %>% t() %>% data.frame() %>% map(unlist) %>% data.frame()
  })
})
ratios %>%
  rowwise() %>%
  mutate(min = min(best, bal_polish, polish, DADA, DualHP, CLB2C, HeteroPrio)) %>%
  gather(key = "algo", value = "ratio", -cv_CPU, -cv_GPU, -LB, -min) %>%
  ggplot(aes(x = cv_CPU, y = ratio / LB, color = algo)) +
  geom_boxplot(aes(group = interaction(algo, cv_CPU)), outlier.size = 1) +
  facet_wrap(~ cv_GPU, ncol = 1) +
  scale_color_brewer("Algorithm", palette = "Set1") +
  scale_x_log10("CPU cost CV") +
  scale_y_continuous("Makespan / LB")
Several observations can be made here as well. It is more complicated to select a single parameter to study for large instances. A solution could be to study the effect of an identical CV for CPUs and GPUs, and then of opposite CVs (4 graphs), as sketched below.
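As a sketch of these four settings (the values 0.2 and 1 are illustrative assumptions, and the loop body would be the same replicate(10, ...) block as in the previous chunks):

cv_settings <- data.frame(cv_CPU = c(0.2, 1, 0.2, 1),   # identical low and high CVs,
                          cv_GPU = c(0.2, 1, 1, 0.2))   # then the two opposite combinations
ratios <- map2_df(cv_settings$cv_CPU, cv_settings$cv_GPU, function(cv_CPU, cv_GPU) {
  # same replicate(10, ...) body as in the previous chunks
})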
As a summary experiment, it could be interesting to measure how often each approach is the best on an instance, and its worst ratio to the best approach over its best 80% of cases.
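A possible sketch of this summary, computed from any of the ratios data frames above (column names follow this section; the 0.8 quantile implements the "best 80% of cases"):

ratios %>%
  rowwise() %>%
  mutate(min = min(best, bal_polish, polish, DADA, DualHP, CLB2C, HeteroPrio)) %>%
  ungroup() %>%
  gather(key = "algo", value = "mks", best, bal_polish, polish, DADA, DualHP,
         CLB2C, HeteroPrio) %>%
  group_by(algo) %>%
  summarize(times_best = sum(mks == min),                  # instances where the approach is the best
            worst_80 = quantile(mks / min, probs = 0.8))   # worst ratio to the best within its best 80% cases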
## R version 3.4.4 (2018-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/openblas-base/libblas.so.3
## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
##
## locale:
## [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
## [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
## [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_2.2.1 dplyr_0.7.4 tidyr_0.8.0 purrr_0.2.4 stringr_1.3.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.16 knitr_1.20 bindr_0.1.1 magrittr_1.5
## [5] munsell_0.4.3 colorspace_1.3-2 R6_2.2.2 rlang_0.2.0
## [9] plyr_1.8.4 tools_3.4.4 grid_3.4.4 gtable_0.2.0
## [13] pacman_0.4.6 htmltools_0.3.6 lazyeval_0.2.1 yaml_2.1.18
## [17] rprojroot_1.3-2 digest_0.6.15 assertthat_0.2.0 tibble_1.4.2
## [21] bindrcpp_0.2 glue_1.2.0 evaluate_0.10.1 rmarkdown_1.9
## [25] stringi_1.1.7 compiler_3.4.4 pillar_1.2.1 scales_0.5.0
## [29] backports_1.1.2 pkgconfig_2.0.1