We want to implement the TMA (task-machine affinity) and check that the results are consistent with the examples provided in the literature (we will also check the MPH, machine performance homogeneity, and the TDH, task difficulty homogeneity, when relevant). Note that there are two variants of the TMA, depending on the normalization procedure. We will then study the relation between the TMA and the correlations.
TMA1 <- function(ETC) {
    ECS <- 1/ETC  # estimated computation speed matrix
    # normalize each column so that it sums to one
    for (j in 1:ncol(ECS)) ECS[, j] <- ECS[, j]/sum(ECS[, j])
    sv <- sort(svd(ECS)$d, decreasing = TRUE)
    # ratio of the mean non-principal singular value to the largest one
    mean(sv[-1])/sv[1]
}
ETC <- 1/rbind(c(2, 1, 4), c(1, 3, 7), c(3, 4, 1), c(1, 1, 2))
TMA1(ETC)
## [1] 0.3039
This is consistent with alqawasmeh2010a.
Let's look at the second variant, which relies on an iterative normalization procedure:
TMA2 <- function(ETC, prec = 1e-08, iter = 1000) {
    ECS <- 1/ETC  # estimated computation speed matrix
    converge <- FALSE
    for (k in 1:iter) {
        newECS <- ECS
        # alternately normalize the columns and the rows (scaled by the matrix shape)
        for (j in 1:ncol(ECS))
            newECS[, j] <- sqrt(nrow(ECS)/ncol(ECS)) * newECS[, j]/sum(newECS[, j])
        for (i in 1:nrow(ECS))
            newECS[i, ] <- sqrt(ncol(ECS)/nrow(ECS)) * newECS[i, ]/sum(newECS[i, ])
        # original stopping criterion: converge as soon as any single column
        # or row has stabilized
        for (j in 1:ncol(ECS))
            if (sum(abs(newECS[, j] - ECS[, j])) < prec) converge <- TRUE
        for (i in 1:nrow(ECS))
            if (sum(abs(newECS[i, ] - ECS[i, ])) < prec) converge <- TRUE
        ECS <- newECS
        if (converge)
            break
    }
    sv <- sort(svd(ECS)$d, decreasing = TRUE)
    # ratio of the mean non-principal singular value to the largest one
    mean(sv[-1])/sv[1]
}
ETC <- 1/rbind(c(2, 1, 4), c(1, 3, 7), c(3, 4, 1), c(1, 1, 2))
TMA2(ETC)
## [1] 0.2706
We will start with the simple examples of alqawasmeh2011b:
ETC <- 1/rbind(c(0, 100), c(1, 100))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 0.0050 0.9901 0.7298
ETC <- 1/rbind(c(1, 0), c(0, 100))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 0.01 0.01 1.00
ETC <- 1/rbind(c(1, 0), c(0, 1))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 1 1 1
ETC <- 1/rbind(c(0, 1), c(100, 100))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 0.9901 0.0050 0.3845
ETC <- 1/rbind(c(1, 100), c(1, 100))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 0.01 1.00 0.00
ETC <- 1/rbind(c(1, 100), c(100, 10000))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 0.01 0.01 0.00
ETC <- 1/rbind(c(1, 1), c(1, 1))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 1 1 0
ETC <- 1/rbind(c(1, 1), c(100, 100))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 1.00 0.01 0.00
The first and the fourth examples do not work as expected. For the MPH and the TDH, the difference is definitely an error, as a comparison with the other examples shows. The TMA is not as high as expected because the stopping criterion is not well-suited to cases with null values: it declares convergence as soon as a single row or column has stabilized. Forcing the number of iterations to 100 would solve the problem, but the TMA was not presented this way. We will stick to the original mechanism.
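For illustration only, here is what the alternative would look like. TMA2_fixed is a hypothetical variant (not part of the original definition) that always performs a fixed number of alternating normalizations instead of using the original stopping criterion:

```r
# Hypothetical variant of the TMA: a fixed number of alternating
# normalizations, without the original early-stopping criterion.
TMA2_fixed <- function(ETC, iter = 100) {
    ECS <- 1/ETC
    for (k in 1:iter) {
        # alternately normalize the columns and the rows
        for (j in 1:ncol(ECS))
            ECS[, j] <- sqrt(nrow(ECS)/ncol(ECS)) * ECS[, j]/sum(ECS[, j])
        for (i in 1:nrow(ECS))
            ECS[i, ] <- sqrt(ncol(ECS)/nrow(ECS)) * ECS[i, ]/sum(ECS[i, ])
    }
    sv <- sort(svd(ECS)$d, decreasing = TRUE)
    mean(sv[-1])/sv[1]
}
# On the first problematic example, the iteration now has time to
# converge toward a permutation-like matrix, and the TMA is close to 1.
TMA2_fixed(1/rbind(c(0, 100), c(1, 100)))
```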
ETC <- rbind(c(698, 1850), c(105, 345))
c(TDH(ETC), MPH(ETC), TMA2(ETC))
## [1] 0.15884 0.31389 0.05366
ETC <- rbind(c(859, 105), c(509, 1018))
c(TDH(ETC), MPH(ETC), TMA2(ETC))
## [1] 0.2757 0.2978 0.6036
Now, the two larger examples:
ETC <- rbind(c(625, 974, 627, 563, 1159), c(971, 1592, 988, 956, 1363), c(642,
1166, 631, 626, 1026), c(267, 946, 266, 463, 416), c(639, 1124, 642, 719,
957), c(300, 1698, 205, 391, 920), c(768, 1756, 779, 912, 1235), c(537,
686, 548, 431, 523), c(1082, 1814, 1092, 1095, 1804), c(557, 1450, 569,
698, 1850), c(707, 1235, 714, 699, 1179), c(414, 758, 415, 414, 793))
c(TDH(ETC), MPH(ETC), TMA2(ETC))
## [1] 0.89821 0.82051 0.07357
ETC <- rbind(c(677, 1776, 664, 1238, 528), c(777, 2701, 778, 1267, 2522), c(845,
2210, 851, 1056, 1104), c(694, 1132, 706, 741, 1034), c(576, 838, 592, 481,
1061), c(859, 1204, 845, 105, 345), c(704, 1507, 711, 1207, 499), c(657,
979, 678, 648, 789), c(589, 893, 596, 518, 764), c(509, 1635, 520, 1018,
441), c(253, 465, 264, 319, 451), c(531, 954, 551, 471, 1227), c(855, 2431,
874, 1420, 1588), c(693, 1122, 724, 626, 1146), c(1102, 1348, 1150, 892,
940), c(964, 1349, 939, 858, 1621), c(1694, 2887, 1668, 1298, 3148))
c(TDH(ETC), MPH(ETC), TMA2(ETC))
## [1] 0.9082 0.8261 0.1323
We get the same results as in the article, with the same number of iterations.
Let's generate some matrices with the noise-based and the combination-based methods.
measures <- NULL
for (rhoR in seq(0.1, 0.9, 0.1)) for (rhoC in seq(0.1, 0.9, 0.1)) {
    mat <- generate_heterogeneous_matrix_noise_corr(100, 100, rgamma_cost,
        rhoR, rhoC, 3)
    measures <- rbind(measures, data.frame(mean_cor_row = mean_cor_row(mat),
        mean_cor_col = mean_cor_col(mat), TMA1 = TMA1(mat), TMA2 = TMA2(mat),
        method = "noise"))
    mat <- generate_matrix_corr(100, 100, rgamma_cost, 1, 1, rhoR, rhoC)
    measures <- rbind(measures, data.frame(mean_cor_row = mean_cor_row(mat),
        mean_cor_col = mean_cor_col(mat), TMA1 = TMA1(mat), TMA2 = TMA2(mat),
        method = "combination"))
}
Let's see the relation between the first TMA and the second one:
library(ggplot2)
p <- ggplot(data = measures, aes(x = TMA1, y = TMA2))
p <- p + geom_point()
p <- p + facet_grid(~method)
p
The two measures are not strongly related. The second TMA often takes larger values, and its values are better spread. Let's see its relation with the correlations.
library(ggplot2)
p <- ggplot(data = measures, aes(x = mean_cor_row, y = mean_cor_col, color = log(TMA2)))
p <- p + geom_point(size = 4)
p <- p + scale_color_gradientn(colours = rainbow(20))
p <- p + facet_grid(~method)
p
There is a weak relation with the noise-based method, but none with the combination-based one.
summary(measures$TMA2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0011 0.0186 0.0355 0.0955 0.0875 0.6580
Most values are small, but some go up to 0.7, which is also the maximum value reported in alqawasmeh2010a.
We implemented both versions of the TMA and validated the correctness of their implementations. Comparing them with the correlations reveals only a weak relation, and only with the noise-based method. Let's measure the TMA nonetheless in the experiments. We may study later which measure has the most impact on the heuristic performance (perhaps with a simple linear regression model).
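As a sketch of that last idea, assuming we had measured the heuristic performance on each instance, a linear regression could relate it to the candidate measures. The data frame perf below is entirely synthetic and stands in for real experimental results:

```r
# Hypothetical illustration with synthetic data: perf stands in for a data
# frame of real measurements (one row per instance).
set.seed(42)
perf <- data.frame(mean_cor_row = runif(50), mean_cor_col = runif(50),
    TMA2 = runif(50, 0, 0.7))
# Assumed performance measure (e.g., a makespan ratio), simulated here.
perf$performance <- with(perf, 1 + 0.5 * mean_cor_row - 0.2 * TMA2 +
    rnorm(50, sd = 0.1))
# Which measure has the most impact on the performance?
fit <- lm(performance ~ mean_cor_row + mean_cor_col + TMA2, data = perf)
summary(fit)
```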