Let’s compare the greedy heuristic with existing heuristics. We will first compare it against the best result obtained with the other heuristics, and then compare that best result with the lower bound (when the best result is good).

Preliminary Comparison

Let’s consider a difficult setting to validate the overall approach.

n <- 30
m <- 5
k <- 5
costs <- matrix(runif(2 * n), nrow = n, byrow = FALSE)
schedule_greedy_balance(costs, m, k)
## [1] 1.129088
all_RCmax_method(convert2RCmax(costs, m, k))$cmax_ga_algos
## [1] 1.187867

OK, this seems to work. The greedy heuristic even outperforms the genetic algorithm, which is suspicious: either it is genuinely good, or there is a bug.
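One cheap way to rule out an obvious bug is to check any reported makespan against two classical lower bounds: the cheapest cost of the longest task, and the total best-case work spread perfectly over the \(m + k\) machines. Here is a sketch with a hypothetical helper `lb_simple` (not part of the original code), where `costs` has one row per task and one column per machine type.

```r
# Sketch of a sanity check: any feasible makespan is at least both of these
# classical lower bounds.
lb_simple <- function(costs, m, k) {
  best <- pmin(costs[, 1], costs[, 2])  # cheapest placement of each task
  max(max(best), sum(best) / (m + k))   # longest task vs. perfect balance
}
costs <- matrix(c(0.4, 0.9,
                  0.8, 0.2,
                  0.5, 0.5), ncol = 2, byrow = TRUE)
lb_simple(costs, m = 1, k = 1)
## [1] 0.55
```

Any heuristic returning a makespan below this bound on the same instance would necessarily be buggy.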

Let’s try settings for which the genetic algorithm performs better.

n <- 30
m <- 10
k <- 10
costs <- matrix(runif(2 * n), nrow = n, byrow = FALSE)
schedule_greedy_balance(costs, m, k)
## [1] 0.9203205
all_RCmax_method(convert2RCmax(costs, m, k))$cmax_ga_algos
## [1] 0.9203205

OK, this seems right: with many machines, it is easier to perform well.

Effect of the Number of Machines

Given the time to compute a single schedule with the genetic algorithm, we cannot vary both the number of machines and the number of tasks.

n <- 20
results <- NULL
for (i in 1:30) {
  m <- sample(1:5, 1)
  costs <- matrix(runif(2 * n), nrow = n, byrow = FALSE)
  mks <- schedule_greedy_balance(costs, m, m)
  genetic <- all_RCmax_method(convert2RCmax(costs, m, m))$cmax_ga_algos
  results <- rbind(results, data.frame(m = m, ratio = mks / genetic))
}
ggplot(data = results, aes(x = m, y = ratio)) +
  geom_boxplot(aes(group = factor(m))) +
  geom_jitter()

The greedy heuristic has reasonable performance. It would still be interesting to gather more data. Let’s deactivate the genetic algorithm but keep all the other heuristics.

n <- 20
results <- NULL
for (i in 1:1000) {
  m <- sample(1:5, 1)
  costs <- matrix(runif(2 * n), nrow = n, byrow = FALSE)
  mks <- schedule_greedy_balance(costs, m, m)
  others <- min(all_RCmax_method(convert2RCmax(costs, m, m), genetic = FALSE))
  results <- rbind(results, data.frame(m = m, ratio = mks / others))
}
ggplot(data = results, aes(x = m, y = ratio)) +
  geom_boxplot(aes(group = factor(m))) +
  geom_jitter(size = 1)

We obtain a similar plot, which suggests that the genetic algorithm does not notably improve on the solutions found by the other heuristics. In the following, we discard the genetic algorithm.

Effect of the Number of Tasks and Machines

Let’s consider the median for each combination of \(n\) and \(m\).

results <- NULL
for (n in c(2, 3, 5, 10, 20, 30, 50)) {
  for (m in c(1:5, 7, 9)) {
    ratios <- replicate(100, {
      costs <- matrix(runif(2 * n), nrow = n, byrow = FALSE)
      mks <- schedule_greedy_balance(costs, m, m)
      others <- min(all_RCmax_method(convert2RCmax(costs, m, m), genetic = FALSE))
      mks / others
    })
    results <- rbind(results, data.frame(n = n, m = m, ratio = median(ratios)))
  }
}
ggplot(data = results, aes(x = n, y = m)) +
  geom_tile(aes(fill = ratio)) +
  geom_contour(aes(z = ratio, color = ..level..), breaks = seq(1, 1.5, 0.01)) +
  scale_x_log10(breaks = c(2:5, seq(7, 25, 3), seq(30, 50, 10))) +
  scale_y_log10() +
  annotation_logticks(sides = "l") +
  scale_fill_gradientn(colours = c("white", rainbow(100), "black")) +
  scale_color_continuous(low = "black", high = "black", guide = "none")

With a large number of tasks and few machines, the greedy algorithm is often dominated by the other heuristics. However, when the number of machines increases relative to the number of tasks, the greedy algorithm is often more efficient than the best of all the \(R||C_{\max}\) heuristics. This may be because the greedy heuristic is less affected by variations in the number of machines, which is a single parameter for it, whereas the other heuristics see their instances grow larger as the number of machines increases.
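This instance growth can be made concrete with a hypothetical sketch of the conversion (the actual `convert2RCmax` may differ): an instance with two machine types becomes an \(R||C_{\max}\) instance with \(m + k\) unrelated machines, duplicating each cost column once per machine of that type.

```r
# Hypothetical sketch of convert2RCmax: duplicate the first cost column m
# times and the second one k times, yielding an n x (m + k) cost matrix.
to_RCmax_sketch <- function(costs, m, k) {
  cbind(matrix(rep(costs[, 1], m), ncol = m),
        matrix(rep(costs[, 2], k), ncol = k))
}
dim(to_RCmax_sketch(matrix(runif(2 * 30), ncol = 2), m = 10, k = 10))
## [1] 30 20
```

The greedy heuristic sees \(m\) and \(k\) only as two integers, while the \(R||C_{\max}\) heuristics operate on this growing matrix.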

Another interesting observation is that with few tasks and one machine of each type, the greedy heuristic performs as well as the best of the other heuristics. The previous study showed that the lower bound was particularly distant from the greedy heuristic in this area, which suggests that the gap was caused by the lower bound and not by the heuristic. The looseness of the lower bound may be related to the fact that the last task is cut, whereas in practice it must run entirely on one processor or the other.
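A minimal case illustrates the effect of cutting a task (assuming the lower bound does relax task integrality in this way): a single task of unit cost with one machine of each type.

```r
# Tiny illustration (assumed LB construction): a bound that may "cut" the
# task across the two machines yields total_work / nb_machines = 0.5, but any
# real schedule places the task whole, so the optimal makespan is 1.
costs <- matrix(c(1, 1), ncol = 2)
lb_fractional <- sum(pmin(costs[, 1], costs[, 2])) / 2
opt_makespan  <- min(costs)
c(lb_fractional, opt_makespan)
## [1] 0.5 1.0
```

In this extreme case the bound is off by a factor of 2, consistent with the large gap observed for few tasks and few machines.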

Large Instances

Let’s focus on large instances with \(n = 100\).

n <- 100
results <- NULL
for (i in 1:100) {
  m <- sample(1:25, 1)
  costs <- matrix(runif(2 * n), nrow = n, byrow = FALSE)
  LB <- allocation_greedy_balance(costs, m, m)$LB
  mks <- schedule_greedy_balance(costs, m, m)
  others <- min(all_RCmax_method(convert2RCmax(costs, m, m), genetic = FALSE))
  results <- rbind(results, data.frame(m = m, greedy_others = mks / others,
                                       others_LB = others / LB))
}
results %>%
  mutate(best_LB = pmin(others_LB, greedy_others * others_LB)) %>%
  select(m, greedy_others, best_LB) %>%
  gather(type, ratio, -m) %>%
  ggplot(aes(x = m, y = ratio, color = type, shape = type)) +
  geom_jitter() +
  geom_smooth()

When the number of machines is low, the best of the other heuristics is slightly better than the greedy heuristic (by around 1%) and the lower bound is close (within around 1%). When the number of machines increases, the greedy heuristic becomes better than the best of the other heuristics (by around 2-3%) and the lower bound can be quite far (around 5%). When the number of machines is sufficiently large, the problem becomes easy, but just before that point the greedy heuristic may be significantly outperformed.

Conclusion

We observe that the greedy heuristic shows reasonable performance. It even often outperforms the other heuristics for \(R||C_{\max}\), especially when the number of machines is large (more than 5). Moreover, the lower bound may not be sufficiently tight to serve as a baseline for comparison.

As future work, it would be interesting to compare the lower bound to the one from “Scalable linear programming based resource allocation for makespan minimization in heterogeneous computing systems”. Also, several other methods exist for \(R||C_{\max}\) such as a 2-approximation from “Approximation algorithms for scheduling unrelated parallel machines”.

## R version 3.3.2 (2016-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
## 
## locale:
##  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
##  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
##  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_2.1.0 dplyr_0.4.3   tidyr_0.2.0   purrr_0.2.0   stringr_1.0.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.7      digest_0.6.9     assertthat_0.1   plyr_1.8.1      
##  [5] grid_3.3.2       R6_2.1.2         gtable_0.1.2     DBI_0.3.1       
##  [9] pacman_0.4.1     formatR_1.4      magrittr_1.5     scales_0.4.0    
## [13] evaluate_0.10    stringi_0.5-5    rmarkdown_1.1    tools_3.3.2     
## [17] munsell_0.4      parallel_3.3.2   yaml_2.1.13      colorspace_1.2-2
## [21] htmltools_0.3.5  knitr_1.14       tibble_1.2