Implementation and study of TMA (March 12, 2015)

We want to implement the TMA and check that the results are consistent with the examples provided in the literature (we will also check the MPH and the TDH when relevant). Note that there are two variants of the TMA, which differ in their normalization procedure. We will then study the relation between the TMA and the correlation.

TMA implementations

TMA1 <- function(ETC) {
    # The ECS matrix contains the computation speeds (inverses of the times).
    ECS <- 1/ETC
    # Normalize each column so that it sums to one.
    for (j in 1:ncol(ECS)) ECS[, j] <- ECS[, j]/sum(ECS[, j])
    # The TMA is the mean of the non-maximal singular values divided by the
    # largest one.
    sv <- sort(svd(ECS)$d, decreasing = TRUE)
    mean(sv[-1])/sv[1]
}
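
In formula form (my own transcription of what the code computes), if $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_p$ denote the singular values of the column-normalized ECS matrix, then

$$\mathrm{TMA} = \frac{\frac{1}{p-1} \sum_{i=2}^{p} \sigma_i}{\sigma_1}$$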

ETC <- 1/rbind(c(2, 1, 4), c(1, 3, 7), c(3, 4, 1), c(1, 1, 2))
TMA1(ETC)
## [1] 0.3039

This is consistent with alqawasmeh2010a.
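
As a quick sanity check of the normalization (my own addition, not taken from the article), scaling all the times of a given machine, i.e. a column of the ETC matrix, should leave TMA1 unchanged because the column normalization absorbs the factor. The following should thus return TRUE:

ETC_scaled <- ETC %*% diag(c(1, 10, 100))  # machines 2 and 3 become slower
all.equal(TMA1(ETC), TMA1(ETC_scaled))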

Let's look at the second method, which relies on an iterative normalization procedure:

TMA2 <- function(ETC, prec = 1e-08, iter = 1000) {
    ECS <- 1/ETC
    converge <- FALSE
    for (k in 1:iter) {
        newECS <- ECS
        # Alternately normalize the columns and the rows; the scaling factors
        # fix the total sum of the matrix to sqrt(nrow * ncol) after each pass.
        for (j in 1:ncol(ECS)) newECS[, j] <- sqrt(nrow(ECS)/ncol(ECS)) * newECS[, 
            j]/sum(newECS[, j])
        for (i in 1:nrow(ECS)) newECS[i, ] <- sqrt(ncol(ECS)/nrow(ECS)) * newECS[i, 
            ]/sum(newECS[i, ])
        # Original stopping criterion: declare convergence as soon as any
        # single column or row has stabilized (not all of them).
        for (j in 1:ncol(ECS)) if (sum(abs(newECS[, j] - ECS[, j])) < prec) 
            converge <- TRUE
        for (i in 1:nrow(ECS)) if (sum(abs(newECS[i, ] - ECS[i, ])) < prec) 
            converge <- TRUE
        ECS <- newECS
        if (converge) 
            break
    }
    # Same final measure as TMA1, computed on the doubly normalized matrix.
    sv <- sort(svd(ECS)$d, decreasing = TRUE)
    mean(sv[-1])/sv[1]
}

ETC <- 1/rbind(c(2, 1, 4), c(1, 3, 7), c(3, 4, 1), c(1, 1, 2))
TMA2(ETC)
## [1] 0.2706
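
This alternating normalization is reminiscent of the Sinkhorn-Knopp procedure: at its fixed point, every column of the matrix sums to sqrt(nrow/ncol) and every row to sqrt(ncol/nrow). Here is a minimal sketch to check this on the previous matrix (the helper normalize_ECS is my own addition and not part of the original TMA):

normalize_ECS <- function(ETC, iter = 100) {
    # Return the doubly normalized ECS matrix instead of the measure, using
    # a fixed number of normalization passes.
    ECS <- 1/ETC
    for (k in 1:iter) {
        for (j in 1:ncol(ECS)) ECS[, j] <- sqrt(nrow(ECS)/ncol(ECS)) * ECS[, 
            j]/sum(ECS[, j])
        for (i in 1:nrow(ECS)) ECS[i, ] <- sqrt(ncol(ECS)/nrow(ECS)) * ECS[i, 
            ]/sum(ECS[i, ])
    }
    ECS
}
colSums(normalize_ECS(ETC))  # each close to sqrt(4/3)
rowSums(normalize_ECS(ETC))  # each close to sqrt(3/4)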

TMA2, MPH and TDH verification

We will start with the simple examples of alqawasmeh2011b:

ETC <- 1/rbind(c(0, 100), c(1, 100))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 0.0050 0.9901 0.7298
ETC <- 1/rbind(c(1, 0), c(0, 100))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 0.01 0.01 1.00
ETC <- 1/rbind(c(1, 0), c(0, 1))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 1 1 1
ETC <- 1/rbind(c(0, 1), c(100, 100))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 0.9901 0.0050 0.3845
ETC <- 1/rbind(c(1, 100), c(1, 100))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 0.01 1.00 0.00
ETC <- 1/rbind(c(1, 100), c(100, 10000))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 0.01 0.01 0.00
ETC <- 1/rbind(c(1, 1), c(1, 1))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 1 1 0
ETC <- 1/rbind(c(1, 1), c(100, 100))
c(MPH(ETC), TDH(ETC), TMA2(ETC))
## [1] 1.00 0.01 0.00

The first and the fourth examples do not work as expected. The discrepancy for the MPH and the TDH is definitely an error when we compare with the other examples. The TMA is not as high as it should be because the stopping criterion is ill-suited to cases with null values: the iteration stops as soon as a single row or column has stabilized. Forcing the number of iterations to 100 would solve the problem, but the TMA was not presented this way. We will stick to the original mechanism.
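
For reference, here is a sketch of such a fixed-iteration variant (my own addition, reusing the hypothetical normalize_ECS helper from above; it is not used in the remainder of this study):

TMA2_fixed <- function(ETC, iter = 100) {
    # Always perform a fixed number of normalization passes instead of
    # testing for convergence.
    sv <- sort(svd(normalize_ECS(ETC, iter))$d, decreasing = TRUE)
    mean(sv[-1])/sv[1]
}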

ETC <- rbind(c(698, 1850), c(105, 345))
c(TDH(ETC), MPH(ETC), TMA2(ETC))
## [1] 0.15884 0.31389 0.05366
ETC <- rbind(c(859, 105), c(509, 1018))
c(TDH(ETC), MPH(ETC), TMA2(ETC))
## [1] 0.2757 0.2978 0.6036

Now, the two larger examples:

ETC <- rbind(c(625, 974, 627, 563, 1159), c(971, 1592, 988, 956, 1363), c(642, 
    1166, 631, 626, 1026), c(267, 946, 266, 463, 416), c(639, 1124, 642, 719, 
    957), c(300, 1698, 205, 391, 920), c(768, 1756, 779, 912, 1235), c(537, 
    686, 548, 431, 523), c(1082, 1814, 1092, 1095, 1804), c(557, 1450, 569, 
    698, 1850), c(707, 1235, 714, 699, 1179), c(414, 758, 415, 414, 793))
c(TDH(ETC), MPH(ETC), TMA2(ETC))
## [1] 0.89821 0.82051 0.07357
ETC <- rbind(c(677, 1776, 664, 1238, 528), c(777, 2701, 778, 1267, 2522), c(845, 
    2210, 851, 1056, 1104), c(694, 1132, 706, 741, 1034), c(576, 838, 592, 481, 
    1061), c(859, 1204, 845, 105, 345), c(704, 1507, 711, 1207, 499), c(657, 
    979, 678, 648, 789), c(589, 893, 596, 518, 764), c(509, 1635, 520, 1018, 
    441), c(253, 465, 264, 319, 451), c(531, 954, 551, 471, 1227), c(855, 2431, 
    874, 1420, 1588), c(693, 1122, 724, 626, 1146), c(1102, 1348, 1150, 892, 
    940), c(964, 1349, 939, 858, 1621), c(1694, 2887, 1668, 1298, 3148))
c(TDH(ETC), MPH(ETC), TMA2(ETC))
## [1] 0.9082 0.8261 0.1323

We get the same results with the same number of iterations.

Comparison between TMA and correlation

Let's generate some matrices with the noise-based and the combination-based methods.

measures <- NULL
# For each pair of target row and column correlations, generate one matrix
# with each method and record the measures.
for (rhoR in seq(0.1, 0.9, 0.1)) for (rhoC in seq(0.1, 0.9, 0.1)) {
    mat <- generate_heterogeneous_matrix_noise_corr(100, 100, rgamma_cost, rhoR, 
        rhoC, 3)
    measures <- rbind(measures, data.frame(mean_cor_row = mean_cor_row(mat), 
        mean_cor_col = mean_cor_col(mat), TMA1 = TMA1(mat), TMA2 = TMA2(mat), 
        method = "noise"))
    mat <- generate_matrix_corr(100, 100, rgamma_cost, 1, 1, rhoR, rhoC)
    measures <- rbind(measures, data.frame(mean_cor_row = mean_cor_row(mat), 
        mean_cor_col = mean_cor_col(mat), TMA1 = TMA1(mat), TMA2 = TMA2(mat), 
        method = "combination"))
}

Let's see the relation between the first TMA and the second one:

library(ggplot2)
p <- ggplot(data = measures, aes(x = TMA1, y = TMA2))
p <- p + geom_point()
p <- p + facet_grid(~method)
p

(Figure TMA_correlation_plot: TMA2 against TMA1, faceted by generation method.)

The two measures are not strongly related. The second TMA often takes larger values and is better distributed. Let's see its relation with the correlation.

library(ggplot2)
p <- ggplot(data = measures, aes(x = mean_cor_row, y = mean_cor_col, color = log(TMA2)))
p <- p + geom_point(size = 4)
p <- p + scale_color_gradientn(colours = rainbow(20))
p <- p + facet_grid(~method)
p

(Figure: mean_cor_col against mean_cor_row, colored by log(TMA2), faceted by generation method.)

There is a weak relation with the noise-based method, but none with the combination-based one.
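
To put a number on this visual impression, one could compute, for each method, the correlation between log(TMA2) and the mean of the two correlation measures (a quick check of my own, output not shown):

by(measures, measures$method, function(d) cor(log(d$TMA2), (d$mean_cor_row + 
    d$mean_cor_col)/2))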

summary(measures$TMA2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0011  0.0186  0.0355  0.0955  0.0875  0.6580

Finally, most values are small and the largest ones approach 0.7, which is also the maximum value reported in alqawasmeh2010a.

Conclusion

We implemented both versions of the TMA and validated the correctness of their implementations. The comparison with the correlations reveals a weak relation for the noise-based method. Let's measure it nonetheless in the experiments. We may later study which measure has the most impact on heuristic performance (perhaps with a simple linear regression model).