We are interested in empirically studying relevant measures of task and machine heterogeneity, in order to assess the heterogeneity quadrant concept and whether it is actually achieved in practice.
First, we generate uniform cost matrices over which we have a clear and intuitive control of both heterogeneities.
Then, we define a set of candidate measures and determine which ones are the most relevant.
Finally, we test whether previous works that use the original range-based method from Ali et al. reach distinct values for those measures, i.e., whether the heterogeneity quadrant is actually covered.
generate_uniform_matrix <- function(n, m, mean, cv_task, cv_mach) {
    # Draw n gamma-distributed costs with a given mean and standard deviation sigma
    rgamma_cost <- function(n, mean, sigma) {
        if (mean == 0 || sigma == 0)
            return(rep(mean, n))
        shape <- mean^2 / sigma^2
        rate <- mean / sigma^2
        return(rgamma(n, shape, rate))
    }
    # One value per task and one per machine, with the requested CVs
    task <- rgamma_cost(n, mean, cv_task * mean)
    machine <- rgamma_cost(m, mean, cv_mach * mean)
    # create_matrix (defined earlier) combines both vectors into an n-by-m cost matrix
    return(create_matrix(task, machine))
}
(lolo <- generate_uniform_matrix(9, 7, 1, 0.1, 0.1))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 1.0315 1.0507 1.0860 1.1143 1.0215 1.2134 1.1870
## [2,] 0.8827 0.8992 0.9294 0.9536 0.8742 1.0384 1.0158
## [3,] 0.8375 0.8531 0.8818 0.9047 0.8294 0.9852 0.9637
## [4,] 0.9506 0.9684 1.0009 1.0270 0.9414 1.1183 1.0939
## [5,] 0.7769 0.7914 0.8180 0.8393 0.7694 0.9139 0.8940
## [6,] 0.9574 0.9753 1.0081 1.0343 0.9482 1.1263 1.1018
## [7,] 0.9811 0.9994 1.0330 1.0599 0.9716 1.1541 1.1290
## [8,] 0.9657 0.9837 1.0168 1.0433 0.9564 1.1361 1.1113
## [9,] 0.8846 0.9011 0.9314 0.9556 0.8760 1.0406 1.0180
(lohi <- generate_uniform_matrix(9, 7, 1, 0.1, 1))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0.4681 0.1592 0.9164 0.2639 1.721 1.2905 0.4251
## [2,] 0.4620 0.1571 0.9045 0.2605 1.698 1.2736 0.4195
## [3,] 0.4310 0.1466 0.8438 0.2430 1.584 1.1882 0.3914
## [4,] 0.3467 0.1179 0.6788 0.1955 1.274 0.9559 0.3149
## [5,] 0.4497 0.1529 0.8803 0.2535 1.653 1.2396 0.4083
## [6,] 0.4756 0.1617 0.9312 0.2682 1.748 1.3112 0.4319
## [7,] 0.3987 0.1356 0.7805 0.2248 1.465 1.0991 0.3620
## [8,] 0.4075 0.1386 0.7978 0.2298 1.498 1.1235 0.3701
## [9,] 0.4178 0.1421 0.8178 0.2355 1.536 1.1516 0.3794
(hihi <- generate_uniform_matrix(9, 7, 1, 1, 1))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0.06333 0.09308 0.10922 0.09739 0.02601 0.5745 0.7433
## [2,] 0.26613 0.39116 0.45897 0.40927 0.10929 2.4142 3.1233
## [3,] 0.03606 0.05300 0.06219 0.05545 0.01481 0.3271 0.4232
## [4,] 1.03162 1.51630 1.77919 1.58651 0.42364 9.3586 12.1075
## [5,] 0.22503 0.33075 0.38810 0.34607 0.09241 2.0414 2.6410
## [6,] 0.04409 0.06480 0.07604 0.06780 0.01811 0.4000 0.5174
## [7,] 0.02624 0.03857 0.04526 0.04035 0.01078 0.2380 0.3080
## [8,] 0.55541 0.81635 0.95789 0.85415 0.22808 5.0385 6.5185
## [9,] 0.78949 1.16041 1.36160 1.21414 0.32421 7.1621 9.2658
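Note that, in each matrix above, the rows are proportional to each other. create_matrix is defined in an earlier section; the outputs are consistent with the assumption that it simply combines the two vectors as an outer product. A minimal stand-in under that assumption (not the original definition) would be:
# Hedged stand-in: assumes cost[i, j] = task[i] * machine[j]
create_matrix <- function(task, machine) {
    return(outer(task, machine))
}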
OK, now we have a way to generate matrices that correspond exactly to the intuition behind the heterogeneity quadrant.
For each row (or column), we compute the mean, the standard deviation, or the CV of its costs. We then aggregate these per-row (or per-column) statistics in one of four ways: the CV of the means, the mean of the CVs, the standard deviation of the means, or the mean of the standard deviations:
CV_mean_row <- function(cost) {
    means <- sapply(1:nrow(cost), function(x) mean(cost[x, ]))
    return(sqrt(var(means))/mean(means))
}
CV_mean_col <- function(cost) {
    means <- sapply(1:ncol(cost), function(x) mean(cost[, x]))
    return(sqrt(var(means))/mean(means))
}
mean_CV_row <- function(cost) {
    CV <- sapply(1:nrow(cost), function(x) sqrt(var(cost[x, ]))/mean(cost[x, ]))
    return(mean(CV))
}
mean_CV_col <- function(cost) {
    CV <- sapply(1:ncol(cost), function(x) sqrt(var(cost[, x]))/mean(cost[, x]))
    return(mean(CV))
}
sd_mean_row <- function(cost) {
    means <- sapply(1:nrow(cost), function(x) mean(cost[x, ]))
    return(sqrt(var(means)))
}
sd_mean_col <- function(cost) {
    means <- sapply(1:ncol(cost), function(x) mean(cost[, x]))
    return(sqrt(var(means)))
}
mean_sd_row <- function(cost) {
    sd <- sapply(1:nrow(cost), function(x) sqrt(var(cost[x, ])))
    return(mean(sd))
}
mean_sd_col <- function(cost) {
    sd <- sapply(1:ncol(cost), function(x) sqrt(var(cost[, x])))
    return(mean(sd))
}
R <- generate_uniform_matrix(1000, 100, 1, 0.1, 1)
CV_mean_row(R)
## [1] 0.1005
CV_mean_col(R)
## [1] 0.9093
mean_CV_row(R)
## [1] 0.9093
mean_CV_col(R)
## [1] 0.1005
sd_mean_row(R)
## [1] 0.09945
sd_mean_col(R)
## [1] 0.8994
mean_sd_row(R)
## [1] 0.8994
mean_sd_col(R)
## [1] 0.09945
It appears that the CV of the row means coincides with the mean of the column CVs, and the CV of the column means with the mean of the row CVs (the same crossed equality holds for the standard deviations). This is expected when each cost is the product of a task value and a machine value: the CV of the row means and the mean of the column CVs both reduce to the CV of the task vector, and symmetrically for the machine vector.
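As a quick check, here is a minimal sketch (with illustrative task and machine values, and assuming as above that the cost matrix is an outer product of both vectors) showing that the crossed equalities are exact for such rank-1 matrices:
# u plays the role of the task values, v of the machine values (illustrative only)
u <- c(1, 2, 5)
v <- c(1, 10, 100)
M <- outer(u, v)
all.equal(CV_mean_row(M), mean_CV_col(M))  # both reduce to the CV of u
all.equal(CV_mean_col(M), mean_CV_row(M))  # both reduce to the CV of v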
Let's keep the CV of the means as our final candidate measures (the standard deviation of the means carries the same information, but in absolute rather than relative terms).
CV_mean_row(lolo)
## [1] 0.08649
CV_mean_col(lolo)
## [1] 0.0685
CV_mean_row(lohi)
## [1] 0.09584
CV_mean_col(lohi)
## [1] 0.7769
CV_mean_row(hihi)
## [1] 1.099
CV_mean_col(hihi)
## [1] 1.185
This is consistent with the parameters used to generate our base matrices. We also consider two ratio-based homogeneity measures, MPH and TDH (machine performance homogeneity and task difficulty homogeneity), which average the ratios between consecutive sorted machine performances and task difficulties:
MPH <- function(ETC) {
    # Machine performance homogeneity: mean ratio between consecutive sorted
    # machine performances (column sums of the speeds ECS = 1/ETC)
    ECS <- 1/ETC
    MP <- sapply(1:ncol(ECS), function(j) sum(ECS[, j]))
    MP <- sort(MP)
    result <- 0
    for (j in 1:(ncol(ECS) - 1)) result <- result + MP[j]/MP[j + 1]
    result <- result/(ncol(ECS) - 1)
    return(result)
}
TDH <- function(ETC) {
    # Task difficulty homogeneity: mean ratio between consecutive sorted
    # task difficulties (row sums of the speeds ECS = 1/ETC)
    ECS <- 1/ETC
    TD <- sapply(1:nrow(ECS), function(i) sum(ECS[i, ]))
    TD <- sort(TD)
    result <- 0
    for (i in 1:(nrow(ECS) - 1)) result <- result + TD[i]/TD[i + 1]
    result <- result/(nrow(ECS) - 1)
    return(result)
}
TDH(lolo)
## [1] 0.9656
MPH(lolo)
## [1] 0.9719
TDH(lohi)
## [1] 0.962
MPH(lohi)
## [1] 0.6838
TDH(hihi)
## [1] 0.6646
MPH(hihi)
## [1] 0.6503
These measures are able to characterize the heterogeneity of such small matrices.
TDH(R)
## [1] 0.9993
MPH(R)
## [1] 0.9445
However, for a large matrix generated with the same parameters, these measures are useless: as the number of tasks and machines grows, consecutive sorted sums get closer to each other and all the ratios tend to one.
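This can be illustrated by increasing the number of tasks while keeping the generation parameters fixed (the sizes below are arbitrary and the task CV is set to 1, as for hihi): TDH is expected to get closer to one as the sorted task sums get closer to each other.
# Illustration with arbitrary sizes: the more tasks, the closer TDH should be to one
sapply(c(10, 100, 1000), function(n) TDH(generate_uniform_matrix(n, 10, 1, 1, 1)))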
Let's now see whether the way the original range-based method has been used in previous studies actually reaches several distinct settings in terms of heterogeneity.
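For reference, generate_matrix_siegel is defined in an earlier section. It is assumed here to implement the classical range-based method: each task draws a baseline cost from the task distribution, and each entry of its row is this baseline multiplied by an independent draw from the machine distribution. A minimal sketch under that assumption (not the original definition):
# Hedged stand-in for generate_matrix_siegel (assumes the classical range-based method)
generate_matrix_siegel <- function(n, m, rdistr, rdistc) {
    baseline <- rdistr(n)  # one baseline cost per task
    # scale m machine draws by each task baseline, one row per task
    return(t(sapply(baseline, function(tau) tau * rdistc(m))))
}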
heterogeneity_quadrant_measures <- function(Rtask, Rmach) {
    rdistr <- function(n) {
        return(runif(n, 1, Rtask))
    }
    rdistc <- function(n) {
        return(runif(n, 1, Rmach))
    }
    Z <- generate_matrix_siegel(1000, 100, rdistr, rdistc)
    return(list(CVmtask = CV_mean_row(Z), CVmmach = CV_mean_col(Z),
                MPH = MPH(Z), TDH = TDH(Z)))
}
parameters <- rbind(c(10, 10), c(10, 100), c(100, 10), c(100, 1000),
                    c(3000, 10), c(3000, 100), c(3000, 1000),
                    c(10000, 10), c(10000, 100))
H <- sapply(1:nrow(parameters),
            function(x) heterogeneity_quadrant_measures(parameters[x, 1], parameters[x, 2]))
summary(sapply(data.frame(t(H)), unlist))
## CVmtask CVmmach MPH TDH
## Min. :0.478 Min. :0.0165 Min. :0.980 Min. :0.993
## 1st Qu.:0.552 1st Qu.:0.0171 1st Qu.:0.987 1st Qu.:0.993
## Median :0.568 Median :0.0189 Median :0.994 Median :0.995
## Mean :0.554 Mean :0.0194 Mean :0.992 Mean :0.995
## 3rd Qu.:0.581 3rd Qu.:0.0216 3rd Qu.:0.996 3rd Qu.:0.996
## Max. :0.585 Max. :0.0227 Max. :0.998 Max. :0.998
We see that, whatever the range parameters, the measured task and machine heterogeneities remain almost constant: the task CV stays close to 0.55, the machine CV close to 0.02, and MPH and TDH are always close to one. Let's run the same test with consistent cost matrices, as consistency could actually impact those measures.
make_consistent <- function(mat, a = 1, b = 1) {
    # Sort the first a*nrow(mat) rows over the first b*ncol(mat) columns;
    # a = b = 1 yields a fully consistent matrix
    row_max <- round(a * nrow(mat))
    col_max <- round(b * ncol(mat))
    if (row_max == 0 || col_max == 0)
        return(mat)
    for (i in 1:row_max) mat[i, 1:col_max] <- sort(mat[i, 1:col_max])
    mat
}
generate_matrix_siegel_consistent <- function(R, C, rdistr, rdistc) {
    Z <- generate_matrix_siegel(R, C, rdistr, rdistc)
    make_consistent(Z)
}
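As a usage note, the a and b arguments of make_consistent allow building partially-consistent matrices; for instance, the following illustrative call (not used in the experiments below) sorts only the first half of the rows:
# Partially-consistent matrix: only the first 50% of the rows are sorted
Zpart <- make_consistent(matrix(runif(12), nrow = 4), a = 0.5, b = 1)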
heterogeneity_quadrant_measures <- function(Rtask, Rmach) {
    rdistr <- function(n) {
        return(runif(n, 1, Rtask))
    }
    rdistc <- function(n) {
        return(runif(n, 1, Rmach))
    }
    Z <- generate_matrix_siegel_consistent(1000, 100, rdistr, rdistc)
    return(list(CVmtask = CV_mean_row(Z), CVmmach = CV_mean_col(Z),
                MPH = MPH(Z), TDH = TDH(Z)))
}
parameters <- rbind(c(10, 10), c(10, 100), c(100, 10), c(100, 1000),
                    c(3000, 10), c(3000, 100), c(3000, 1000),
                    c(10000, 10), c(10000, 100))
H <- sapply(1:nrow(parameters),
            function(x) heterogeneity_quadrant_measures(parameters[x, 1], parameters[x, 2]))
summary(sapply(data.frame(t(H)), unlist))
## CVmtask CVmmach MPH TDH
## Min. :0.461 Min. :0.470 Min. :0.952 Min. :0.993
## 1st Qu.:0.550 1st Qu.:0.471 1st Qu.:0.961 1st Qu.:0.993
## Median :0.573 Median :0.564 Median :0.962 Median :0.994
## Mean :0.554 Mean :0.525 Mean :0.967 Mean :0.995
## 3rd Qu.:0.589 3rd Qu.:0.565 3rd Qu.:0.978 3rd Qu.:0.996
## Max. :0.601 Max. :0.577 Max. :0.978 Max. :0.997
Making the matrices consistent significantly changes the machine heterogeneity. This shows that the way the heterogeneity quadrant is used is deeply flawed: the range parameters are expected to drive the heterogeneity, whereas it is in fact the consistency that impacts it. Overall, we may also conclude that the heterogeneity settings tested in the literature lack diversity.
This study was a preliminary test of the hypothesis that the heterogeneity quadrant is a fraud. So far, the hypothesis holds. More complete tests must be done, and the CVB method may lead to different outcomes. At the very least, this shows that previous studies do not control what they claim to control.