Another extension of the range-based and CVB methods is proposed in khemka2013a for increasing or decreasing the TMA. More generally, we may try to generate matrices with given singular values. This is actually easy (see http://math.stackexchange.com/q/530445): from a given matrix, we perform the SVD to get U Sigma V'; then, we replace Sigma by the expected singular values and perform the product again. The problem is that this does not guarantee that the costs are positive. Khemka avoids the problem with an iterative method, which poses two issues: the lack of analytical control and the limited range of values (from 0.14 to 0.18, whereas the maximum TMA can be 1).
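For illustration, here is a minimal sketch of that naive approach (the starting matrix and the target SV below are arbitrary); the recomposed matrix frequently contains negative entries, which makes it unusable as a cost matrix.
# Naive SVD-based generation: replace the SV of an arbitrary matrix and
# recompose it. Nothing prevents negative entries in the result.
naive_matrix_sv <- function(sv, n, m) {
    s <- svd(matrix(runif(n * m), n, m))
    s$u %*% diag(sv) %*% t(s$v)
}
min(naive_matrix_sv(c(10, 1, 1, 0.1), 6, 4))  # typically negative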
We investigate here another method that is based on a combination of base positive matrices for which we know the singular values.
generate_matrix_sv <- function(sv, n, m) {
    # sv is expected in decreasing order; base[[1]] is the constant matrix,
    # whose single nonzero SV is 1.
    base <- list(matrix(1, n, m)/sqrt(n * m))
    for (i in 1:(min(n, m) - 1)) {
        # Block matrix: an i x i identity padded with zeros, above a constant
        # block scaled so that its nonzero SV is also 1.
        M <- diag(rep(1, i))
        for (j in 1:(m - i)) M <- cbind(M, 0)
        M <- rbind(M, t(replicate(n - i, c(rep(0, i),
            rep(1, m - i)/sqrt((n - i) * (m - i))))))
        # Shuffle rows and columns to break the regular structure (this does
        # not change the SV).
        M <- M[sample(1:nrow(M)), ][, sample(1:ncol(M))]
        base <- c(base, list(M))
    }
    # Combine the bases: base[[i]] ends up with coefficient sv[i] - sv[i+1]
    # (with sv[min(n, m) + 1] taken as 0).
    result <- matrix(0, n, m)
    for (i in length(sv):1) {
        result <- result + sv[i] * base[[i]]
        sv <- sv - sv[i]
    }
    result
}
sv <- c(10, 1, 1, 1, 0.1, 0.1, 0.1, 0.1, 0, 0)
M <- generate_matrix_sv(sv, 10, 10)
svd(M)$d
## [1] 1.000e+01 1.000e+00 9.093e-01 8.795e-01 1.000e-01 1.000e-01 8.960e-02
## [8] 1.422e-02 2.385e-17 1.646e-18
The method is based on the observation that the singular values of a linear combination of matrices are close to the same linear combination of their singular values. This is, however, not an equality, and it is even less accurate for non-square matrices.
M <- generate_matrix_sv(sv, 30, 10)
svd(M)$d
## [1] 9.964e+00 9.387e-01 8.902e-01 8.545e-01 1.000e-01 1.000e-01 1.000e-01
## [8] 8.960e-02 1.975e-16 0.000e+00
The obtained SV always seem to be lower (this has yet to be proved, see http://math.stackexchange.com/q/569989). We could fix this issue by performing the SVD, adjusting the SV, and performing the product again, but we may get negative costs and this would complicate the formal analysis. Moreover, the difference is not significant.
table(M)
## M
## 0.519615242270663 0.53165382757924 0.585080609341461 0.597119194650038
## 99 6 122 63
## 0.619615242270663 0.685080609341461 1.41961524227066
## 3 4 3
mean(M)
## [1] 0.575
sqrt(var(as.vector(M)))/mean(M)
## [1] 0.1607
Another important observation is related to the distribution of the values: there are only a few distinct values for 300 costs. This may be an issue, but the random reordering of the rows and columns already makes it better. Also, we have only limited control over the mean and none over the CV of the costs: if we want a specific vector of SV, then we cannot change anything; if we are only interested in the CV of the SV, then we can multiply the whole matrix by a constant, which impacts the mean (but not the CV of the costs).
Finally, we could apply an orthogonal similarity transformation, which should not change the SV (see “Numerically Stable Generation of Correlation Matrices and Their Factors”, 2000), but we would still have to deal with negative values.
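As a quick check of this idea (assuming that the transformation amounts here to multiplying M on both sides by random orthogonal matrices, since M is not square), the SV are indeed preserved, but negative entries appear:
Q <- qr.Q(qr(matrix(rnorm(30 * 30), 30)))  # random 30 x 30 orthogonal matrix
R <- qr.Q(qr(matrix(rnorm(10 * 10), 10)))  # random 10 x 10 orthogonal matrix
M3 <- Q %*% M %*% R
c(max(abs(svd(M3)$d - svd(M)$d)), min(M3))  # same SV (up to rounding), negative minimum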
Let's study the properties of the resulting matrix in terms of SV and correlations. First, let's check whether each singular value of a sum of matrices is lower than the sum of the corresponding singular values of the summands.
V <- NULL
for (i in 1:1000) {
    M1 <- rbind(c(runif(2)), c(runif(2)))
    M2 <- rbind(c(runif(2)), c(runif(2)))
    V <- rbind(V, c(svd(M1 + M2)$d[1] - svd(M1)$d[1] - svd(M2)$d[1],
        svd(M1 + M2)$d[2] - svd(M1)$d[2] - svd(M2)$d[2]))
}
summary(V)
## V1 V2
## Min. :-0.5216 Min. :-1.2697
## 1st Qu.:-0.1379 1st Qu.:-0.1896
## Median :-0.0672 Median :-0.0441
## Mean :-0.0951 Mean :-0.1103
## 3rd Qu.:-0.0298 3rd Qu.: 0.0115
## Max. :-0.0001 Max. : 0.5172
This is the case only for the first one. “Singular value inequalities for matrix sums and minors”, 1975 confirms this and shows that the sum of the SV of a matrix sum is lower than the sum of the SV of the summands.
Let's check that the SV of the base matrices are correct. A base matrix has the following block structure (before the row and column shuffling, which does not change the SV):
A 0
0 B
where A is a diagonal matrix with only ones and all values of B are the same (1 over the square root of the number of elements in B). I have no idea how to prove this and I give up. This would require some time reviewing and learning about the singular value decomposition.
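A quick numerical check is still possible (under the expectation that the nonzero SV are all equal to 1: one per diagonal entry of A, plus one for the constant block B; the dimensions below are arbitrary):
n <- 10
m <- 6
i <- 3
A <- diag(rep(1, i))
B <- matrix(1/sqrt((n - i) * (m - i)), n - i, m - i)
M_base <- rbind(cbind(A, matrix(0, i, m - i)), cbind(matrix(0, n - i, i), B))
round(svd(M_base)$d, 10)  # expected: i + 1 ones followed by zeros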
Let's focus now on the correlation between the rows of the previous matrix. The correlation between the first rows, those related to A, is negative (between -1 and 0), and it is 1 for the last rows. If the correlation between the first rows were exactly -1, then the row correlation of the i-th base matrix, for which A is an (i-1)-square matrix, would be
(-1*(i-1)*(i-2)/2 - 1*(i-1)*(n-i+1) + 1*(n-i+1)*(n-i)/2) / (n*(n-1)/2)
= 2*(n-i+1)*(n-i) / (n*(n-1)) - 1
And the row correlation of the final matrix would be
\Sigma_{i=1}^{min(m,n)} (sigma(i)-sigma(i+1)) * (n-i+1)*(n-i) / (n*(n-1))
where sigma(i) is the i-th expected singular value (with sigma(min(m,n)+1) = 0). This is however not the case. Although we could derive the exact expression, the previous derivations show that it would be too complex to deal with (especially given that we will eventually provide a distribution of SV with given properties rather than specific values).
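As a quick numerical comparison of the formula with the observed mean row correlation (mean_cor_row averages the pairwise row correlations as elsewhere in these notes; the dimensions and SV distribution below are arbitrary):
n <- 10
m <- 5
SV_check <- sort(rgamma_cost(m, 1, 0.5), decreasing = TRUE)
M_check <- generate_matrix_sv(SV_check, n, m)
pred <- sum((SV_check - c(SV_check[-1], 0)) * (n - 1:m + 1) * (n - 1:m)/(n * (n - 1)))
c(predicted = pred, observed = mean_cor_row(M_check))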
As we are not able to say anything elegant about this method, we need at least to show that it works in the experiments. We could expect the row and column correlations to be similar, but this is actually incorrect because the matrix is not square.
Let's finalize the method by providing only a CV for the SV and a final mean (we will check that the TMA, the CVSV and the correlations are not impacted by the final scaling).
generate_matrix_cvsv <- function(n, m, rdist, cvsv, mu) {
    # Draw the SV from rdist with unit mean and CV cvsv, then rescale the
    # generated matrix to the target mean mu.
    SV <- sort(rdist(min(n, m), 1, cvsv), decreasing = TRUE)
    M <- generate_matrix_sv(SV, n, m)
    M/mean(M) * mu
}
measures <- NULL
for (cvsv in c(0.01, 0.02, 0.03, 0.05, 0.1, 0.2, 0.3, 0.5, 1, 2, 3, 5, 10)) {
    M <- generate_matrix_cvsv(300, 100, rgamma_cost, cvsv, 10)
    measures <- rbind(measures, data.frame(cvsv, CV_SV(M), TMA2(M),
        mean_cor_row(M), mean_cor_col(M)))
}
Let's plot those values:
plot(measures$cvsv, measures$cvsv, log = "xy", type = "l", ylim = c(1e-04, 10))
lines(measures$cvsv, measures$CV_SV.M., type = "b")
lines(measures$cvsv, measures$TMA2.M., type = "b", col = 2)
lines(measures$cvsv, measures$mean_cor_row.M., type = "b", col = 3)
lines(measures$cvsv, measures$mean_cor_col.M., type = "b", col = 4)
The intuition is that a low CV of the singular values means that machines are specialized, whereas high values are closer to the uniform case. For high values, both correlations indeed increase. For low values, however, the column correlation is indeed low (as expected for specialized machines), but the row correlation is quite high. This needs to be investigated. Finally, the TMA2 does not follow the same tendency, except for the slow decrease. We will try to see whether we can control the value of this property.
First the row correlation:
(SV <- sort(rgamma_cost(3, 1, 0.001), decreasing = TRUE))
## [1] 1.0003 1.0003 0.9999
M <- generate_matrix_sv(SV, 10, 3)
mean_cor_row(M)
## [1] 0.4333
table(sapply(1:10, function(i) {
    sapply(1:10, function(j) {
        correlation(M[i, ], M[j, ])
    })
}))
##
## -0.500757871005642 -0.500194764602182 -0.500068868358112
## 2 14 2
## -0.499736329645035 -0.499172839591866 0.999999304072743
## 14 2 14
## 1
## 52
The row correlation is indeed high because several rows are equal. But why is this not the case when the CVSV is close to 1?
SV <- sort(rgamma_cost(30, 1, 1), decreasing = TRUE)
plot.ecdf(SV)
M <- generate_matrix_sv(SV, 100, 30)
mean_cor_row(M)
## [1] 0.04891
This time, we have distinct rows. The inflection in the previous curve is thus normal.
Let's see whether we can generate matrices with a given TMA:
for (TMA in c(0.01, 0.03, 0.1, 0.2, 0.3, 0.5, 0.9)) {
    SV <- c(1, rep(TMA, 29))
    M <- generate_matrix_sv(SV, 100, 30)
    print(c(TMA, TMA2(M), TMA2(1/M), TMA1(1/M), length(table(1/M))))
}
## [1] 0.010000 0.006562 0.009816 0.009928 3.000000
## [1] 0.03000 0.01187 0.02901 0.03001 3.00000
## [1] 0.10000 0.01714 0.09171 0.10269 3.00000
## [1] 0.20000 0.01971 0.17040 0.21244 3.00000
## [1] 0.30000 0.02125 0.23825 0.32724 3.00000
## [1] 0.50000 0.02338 0.34882 0.55745 3.00000
## [1] 0.90000 0.02619 0.50352 0.91295 3.00000
We can produce matrices with a given TMA by inverting the output of the current method. The value is not exactly the intended one due to the normalization done by the TMA measure (the first version, which alters the matrix less, gives closer results). Those matrices are however of little use because they have only three distinct values and I suspect they represent simple instances. Let's see if we can insert some noise:
M <- generate_matrix_TMA(100, 30, 0.2, 10)
c(TMA2(M), length(table(M)))
## [1] 6.578e-02 1.820e+03
This is better.
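For reference, generate_matrix_TMA is not defined in these notes. A hypothetical sketch of what it could look like, assuming it combines the SV construction above with the inversion trick, a multiplicative noise step, and a rescaling to a target mean mu (both the noise step and the parameter order are guesses):
generate_matrix_TMA_sketch <- function(n, m, TMA, mu, noise_cv = 0.1) {
    SV <- c(1, rep(TMA, min(n, m) - 1))
    M <- 1/generate_matrix_sv(SV, n, m)       # inversion targets the TMA
    M <- M * rgamma_cost(n * m, 1, noise_cv)  # hypothetical noise step
    M/mean(M) * mu                            # rescale to the requested mean
}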
Let's see whether the current method is promising for predicting the performance of a single heuristic:
best_cmax <- function(costs) {
    min(LPT_unrelated(costs, min), balance_sufferage(costs), balance_EFT(costs))
}
LPT_unrelated_perf <- NULL
for (i in 1:100) {
    cvsv <- 10^runif(1, -2, 1)
    Z <- generate_matrix_cvsv(200, 50, rgamma_cost, cvsv, 1)
    cmax <- LPT_unrelated(Z, func = min)
    LPT_unrelated_perf <- rbind(LPT_unrelated_perf, data.frame(cvsv = cvsv,
        mean_cor_row = mean_cor_row(Z), mean_cor_col = mean_cor_col(Z),
        TMA1 = TMA1(Z), TMA2 = TMA2(Z), CVSV = CV_SV(Z),
        ratio = max(1, cmax/best_cmax(Z))))
}
Before analyzing the data, let's visualize them:
library(ggplot2)
p <- ggplot(data = LPT_unrelated_perf, aes(x = cvsv, y = ratio))
p <- p + geom_point() + geom_smooth()
p <- p + scale_x_log10()
p
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
This seems to be a good predictor too. The most important variation occurs when the CVSV is between 0.2 and 2.
Let's see with the linear regressions:
summary(lm(ratio ~ cvsv, LPT_unrelated_perf))$adj.r.squared
## [1] 0.4155
summary(lm(ratio ~ mean_cor_row, LPT_unrelated_perf))$adj.r.squared
## [1] -0.009396
summary(lm(ratio ~ mean_cor_col, LPT_unrelated_perf))$adj.r.squared
## [1] 0.1915
summary(lm(ratio ~ TMA1, LPT_unrelated_perf))$adj.r.squared
## [1] 0.671
summary(lm(ratio ~ TMA2, LPT_unrelated_perf))$adj.r.squared
## [1] 0.6697
summary(lm(ratio ~ CVSV, LPT_unrelated_perf))$adj.r.squared
## [1] 0.524
It is strange that the TMA is so good. Is it because it puts emphasis on the maximum SV or because it considers the inverse of the costs?
Let's do the same for the TMA:
LPT_unrelated_perf <- NULL
for (i in 1:100) {
    TMA <- 10^runif(1, -2, 0)
    Z <- generate_matrix_TMA(200, 50, TMA, 1)
    cmax <- LPT_unrelated(Z, func = min)
    LPT_unrelated_perf <- rbind(LPT_unrelated_perf, data.frame(TMA = TMA,
        mean_cor_row = mean_cor_row(Z), mean_cor_col = mean_cor_col(Z),
        TMA1 = TMA1(Z), TMA2 = TMA2(Z), CVSV = CV_SV(Z),
        ratio = max(1, cmax/best_cmax(Z))))
}
And the visualization and the regressions:
library(ggplot2)
p <- ggplot(data = LPT_unrelated_perf, aes(x = TMA, y = ratio))
p <- p + geom_point() + geom_smooth()
p <- p + scale_x_log10()
p
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
summary(lm(ratio ~ TMA, LPT_unrelated_perf))$adj.r.squared
## [1] 0.0497
summary(lm(ratio ~ mean_cor_row, LPT_unrelated_perf))$adj.r.squared
## [1] 0.03695
summary(lm(ratio ~ mean_cor_col, LPT_unrelated_perf))$adj.r.squared
## [1] -0.01018
summary(lm(ratio ~ TMA1, LPT_unrelated_perf))$adj.r.squared
## [1] 0.02121
summary(lm(ratio ~ TMA2, LPT_unrelated_perf))$adj.r.squared
## [1] 0.03636
summary(lm(ratio ~ CVSV, LPT_unrelated_perf))$adj.r.squared
## [1] 0.2515
No measure is a good predictor here, even though there is a nice association for TMA lower than 0.1. For larger values, the relation is more chaotic. This behavior is not yet completely clear. Does this mean there is an issue with the generation method, or that the ratio between the maximum SV and the mean of the others is relevant only when it is significant? For instance, we could imagine a measure that considers only the first two SV when their difference is large, and otherwise integrates the next SV until the selected SV are large enough. An experiment to validate this: use a bimodal distribution for the SV (they would be either large or low); a sketch follows. We expect the number of SV in the large set to be a good predictor. But this number is too coarse: it cannot distinguish situations where the TMA is 0.01 from ones where it is 0.1.
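A minimal sketch of this experiment (the mixture parameters and dimensions are arbitrary choices): draw each SV from one of two well-separated values and use the size of the large group as the candidate predictor.
bimodal_sv <- function(k, p_large = 0.3, large = 10, low = 0.1) {
    sort(ifelse(runif(k) < p_large, large, low), decreasing = TRUE)
}
SV_bi <- bimodal_sv(50)
Z_bi <- generate_matrix_sv(SV_bi, 200, 50)
c(n_large = sum(SV_bi == 10),  # number of large SV, the candidate predictor
  ratio = max(1, LPT_unrelated(Z_bi, func = min)/best_cmax(Z_bi)))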
On the other hand, the current result may be sufficient. The LPT achieves its worst performance when the TMA is greater than 0.1. Is it necessary to be finer in this region? There are still ratios that are quite good while others are extremely bad. We are thus still missing something.
We have proposed a new method for generating cost matrices. It allows fixing the singular values. This is only a heuristic without any guarantee, but it is still usable and the best we have. We can either give a CV for the SV, or a TMA (results may be quite imprecise with this latter approach).
In terms of prediction, there is no denying that the SV are really useful. The relative success of the TMA is still not well understood, and there are still open questions.
The objective is to take the good parts of this method and incorporate them into a more elegant solution. The current problem is that many other measures vary at the same time. Let's take a break from this, review the properties of matrices used in the literature, and come back with fresher ideas. Moreover, studying how to integrate extreme values in measures could lead to interesting insights.