Another extension of the range-based and CVB methods is proposed in khemka2013a for increasing or decreasing the TMA. More generally, we may try to generate matrices with given singular values. This is actually easy (see http://math.stackexchange.com/q/530445): from a given matrix, we perform the SVD to get U Sigma V'; then, we replace Sigma by the expected singular values and perform the product again. The problem is that this does not guarantee that the costs are positive. Khemka avoids the problem with an iterative method, which poses two issues: the lack of analytical control and the limited range of values (from 0.14 to 0.18, whereas the maximum TMA can be 1).
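For illustration, here is a minimal sketch of that naive approach (the starting matrix and the target SV below are arbitrary); the recomposed matrix frequently contains negative entries, which makes it unusable as a cost matrix.
# Naive SVD-based generation: replace the SV of an arbitrary matrix and
# recompose it. Nothing prevents negative entries in the result.
naive_matrix_sv <- function(sv, n, m) {
    s <- svd(matrix(runif(n * m), n, m))
    s$u %*% diag(sv) %*% t(s$v)
}
min(naive_matrix_sv(c(10, 1, 1, 0.1), 6, 4))  # typically negative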
We investigate here another method that is based on a combination of base positive matrices for which we know the singular values.
generate_matrix_sv <- function(sv, n, m) {
    # sv is expected in decreasing order; base[[1]] is the constant matrix,
    # whose single nonzero SV is 1.
    base <- list(matrix(1, n, m)/sqrt(n * m))
    for (i in 1:(min(n, m) - 1)) {
        # Block matrix: an i x i identity padded with zeros, above a constant
        # block scaled so that its nonzero SV is also 1.
        M <- diag(rep(1, i))
        for (j in 1:(m - i)) M <- cbind(M, 0)
        M <- rbind(M, t(replicate(n - i, c(rep(0, i),
            rep(1, m - i)/sqrt((n - i) * (m - i))))))
        # Shuffle rows and columns to break the regular structure (this does
        # not change the SV).
        M <- M[sample(1:nrow(M)), ][, sample(1:ncol(M))]
        base <- c(base, list(M))
    }
    # Combine the bases: base[[i]] ends up with coefficient sv[i] - sv[i+1]
    # (with sv[min(n, m) + 1] taken as 0).
    result <- matrix(0, n, m)
    for (i in length(sv):1) {
        result <- result + sv[i] * base[[i]]
        sv <- sv - sv[i]
    }
    result
}
sv <- c(10, 1, 1, 1, 0.1, 0.1, 0.1, 0.1, 0, 0)
M <- generate_matrix_sv(sv, 10, 10)
svd(M)$d
## [1] 1.000e+01 1.000e+00 9.093e-01 8.795e-01 1.000e-01 1.000e-01 8.960e-02
## [8] 1.422e-02 2.385e-17 1.646e-18
The method is based on the observation that the singular values of a linear combination of matrices are close to the same linear combination of their singular values. This is, however, not an equality, and it is even less accurate for non-square matrices.
M <- generate_matrix_sv(sv, 30, 10)
svd(M)$d
## [1] 9.964e+00 9.387e-01 8.902e-01 8.545e-01 1.000e-01 1.000e-01 1.000e-01
## [8] 8.960e-02 1.975e-16 0.000e+00
The obtained SV always seem to be lower (this has yet to be proved, see http://math.stackexchange.com/q/569989). We could fix this issue by performing the SVD, adjusting the SV, and performing the product again, but we may get negative costs and this would complicate the formal analysis. Moreover, the difference is not significant.
table(M)
## M
## 0.519615242270663 0.53165382757924 0.585080609341461 0.597119194650038
## 99 6 122 63
## 0.619615242270663 0.685080609341461 1.41961524227066
## 3 4 3
mean(M)
## [1] 0.575
sqrt(var(as.vector(M)))/mean(M)
## [1] 0.1607
Another important observation is related to the distribution of the values: there are only a few distinct values for 300 costs. This may be an issue, but the random reordering of the rows and columns already makes it better. Also, we have only limited control over the mean and none over the CV of the costs: if we want a specific vector of SV, then we cannot change anything; if we are only interested in the CV of the SV, then we can multiply the whole matrix by a constant, which impacts the mean (but not the CV of the costs).
Finally, we could apply an orthogonal similarity transformation, which should not change the SV (see “Numerically Stable Generation of Correlation Matrices and Their Factors”, 2000), but we would still have to deal with negative values.
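As a quick check of this idea (assuming that the transformation amounts here to multiplying M on both sides by random orthogonal matrices, since M is not square), the SV are indeed preserved, but negative entries appear:
Q <- qr.Q(qr(matrix(rnorm(30 * 30), 30)))  # random 30 x 30 orthogonal matrix
R <- qr.Q(qr(matrix(rnorm(10 * 10), 10)))  # random 10 x 10 orthogonal matrix
M3 <- Q %*% M %*% R
c(max(abs(svd(M3)$d - svd(M)$d)), min(M3))  # same SV (up to rounding), negative minimum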
Let's study the properties of the resulting matrix in terms of SV and correlations. First, let's check whether each singular value of a sum of matrices is lower than the sum of the corresponding singular values of the summands.
V <- NULL
for (i in 1:1000) {
    M1 <- rbind(c(runif(2)), c(runif(2)))
    M2 <- rbind(c(runif(2)), c(runif(2)))
    V <- rbind(V, c(svd(M1 + M2)$d[1] - svd(M1)$d[1] - svd(M2)$d[1],
        svd(M1 + M2)$d[2] - svd(M1)$d[2] - svd(M2)$d[2]))
}
summary(V)
## V1 V2
## Min. :-0.5216 Min. :-1.2697
## 1st Qu.:-0.1379 1st Qu.:-0.1896
## Median :-0.0672 Median :-0.0441
## Mean :-0.0951 Mean :-0.1103
## 3rd Qu.:-0.0298 3rd Qu.: 0.0115
## Max. :-0.0001 Max. : 0.5172
This is the case only for the first one. “Singular value inequalities for matrix sums and minors”, 1975 confirms this and shows that the sum of the SV of a matrix sum is lower than the sum of the SV of the summands.
Let's check that the SV of the base matrices are correct. A base matrix has the following block structure (before the row and column shuffling, which does not change the SV):
A 0
0 B
where A is a diagonal matrix with only ones and all values of B are the same (1 over the square root of the number of elements in B). I have no idea how to prove this and I give up. This would require some time reviewing and learning about the singular value decomposition.
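A quick numerical check is still possible (under the expectation that the nonzero SV are all equal to 1: one per diagonal entry of A, plus one for the constant block B; the dimensions below are arbitrary):
n <- 10
m <- 6
i <- 3
A <- diag(rep(1, i))
B <- matrix(1/sqrt((n - i) * (m - i)), n - i, m - i)
M_base <- rbind(cbind(A, matrix(0, i, m - i)), cbind(matrix(0, n - i, i), B))
round(svd(M_base)$d, 10)  # expected: i + 1 ones followed by zeros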
Let's focus now on the correlation between the rows of the previous matrix. The correlation between the first rows, those related to A, is negative (between -1 and 0), and it is 1 for the last rows. If the correlation between the first rows were exactly -1, then the row correlation of the i-th base matrix, for which A is an (i-1)-square matrix, would be
(-1*(i-1)*(i-2)/2 - 1*(i-1)*(n-i+1) + 1*(n-i+1)*(n-i)/2) / (n*(n-1)/2)
= 2*(n-i+1)*(n-i) / (n*(n-1)) - 1
And the row correlation of the final matrix would be
\Sigma_{i=1}^{min(m,n)} (sigma(i)-sigma(i+1)) * (n-i+1)*(n-i) / (n*(n-1))
where sigma(i) is the i-th expected singular value (with sigma(min(m,n)+1) = 0). This is however not the case. Although we could derive the exact expression, the previous derivations show that it would be too complex to deal with (especially given that we will eventually provide a distribution of SV with given properties rather than specific values).
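As a quick numerical comparison of the formula with the observed mean row correlation (mean_cor_row averages the pairwise row correlations as elsewhere in these notes; the dimensions and SV distribution below are arbitrary):
n <- 10
m <- 5
SV_check <- sort(rgamma_cost(m, 1, 0.5), decreasing = TRUE)
M_check <- generate_matrix_sv(SV_check, n, m)
pred <- sum((SV_check - c(SV_check[-1], 0)) * (n - 1:m + 1) * (n - 1:m)/(n * (n - 1)))
c(predicted = pred, observed = mean_cor_row(M_check))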
As we are not able to say anything elegant about this method, we need at least to show that it works in the experiments. We could expect the row and column correlations to be similar, but this is actually incorrect because the matrix is not square.
Let's finalize the method by providing only a CV for the SV and a final mean (we will check that the TMA, the CVSV and the correlations are not impacted by the final scaling).
generate_matrix_cvsv <- function(n, m, rdist, cvsv, mu) {
    # Draw the SV from rdist with unit mean and CV cvsv, then rescale the
    # generated matrix to the target mean mu.
    SV <- sort(rdist(min(n, m), 1, cvsv), decreasing = TRUE)
    M <- generate_matrix_sv(SV, n, m)
    M/mean(M) * mu
}
measures <- NULL
for (cvsv in c(0.01, 0.02, 0.03, 0.05, 0.1, 0.2, 0.3, 0.5, 1, 2, 3, 5, 10)) {
    M <- generate_matrix_cvsv(300, 100, rgamma_cost, cvsv, 10)
    measures <- rbind(measures, data.frame(cvsv, CV_SV(M), TMA2(M),
        mean_cor_row(M), mean_cor_col(M)))
}
Let's plot those values:
plot(measures$cvsv, measures$cvsv, log = "xy", type = "l", ylim = c(1e-04, 10))
lines(measures$cvsv, measures$CV_SV.M., type = "b")
lines(measures$cvsv, measures$TMA2.M., type = "b", col = 2)
lines(measures$cvsv, measures$mean_cor_row.M., type = "b", col = 3)
lines(measures$cvsv, measures$mean_cor_col.M., type = "b", col = 4)
The intuition is that a low CV of the singular values means that machines are specialized, whereas high values are closer to the uniform case. For high values, both correlations indeed increase. For low values, however, the column correlation is indeed low (as expected for specialized machines), but the row correlation is quite high. This needs to be investigated. Finally, the TMA2 does not follow the same tendency, except for the slow decrease. We will try to see whether we can control the value of this property.
First the row correlation:
(SV <- sort(rgamma_cost(3, 1, 0.001), decreasing = TRUE))
## [1] 1.0003 1.0003 0.9999
M <- generate_matrix_sv(SV, 10, 3)
mean_cor_row(M)
## [1] 0.4333
table(sapply(1:10, function(i) {
    sapply(1:10, function(j) {
        correlation(M[i, ], M[j, ])
    })
}))
##
## -0.500757871005642 -0.500194764602182 -0.500068868358112
## 2 14 2
## -0.499736329645035 -0.499172839591866 0.999999304072743
## 14 2 14
## 1
## 52
The row correlation is indeed high because several rows are equal. But why is this not the case when the CVSV is close to 1?
SV <- sort(rgamma_cost(30, 1, 1), decreasing = TRUE)
plot.ecdf(SV)
M <- generate_matrix_sv(SV, 100, 30)
mean_cor_row(M)
## [1] 0.04891
This time, we have distinct rows. The inflection in the previous curve is thus normal.
Let's see whether we can generate matrices with a given TMA:
for (TMA in c(0.01, 0.03, 0.1, 0.2, 0.3, 0.5, 0.9)) {
    SV <- c(1, rep(TMA, 29))
    M <- generate_matrix_sv(SV, 100, 30)
    print(c(TMA, TMA2(M), TMA2(1/M), TMA1(1/M), length(table(1/M))))
}
## [1] 0.010000 0.006562 0.009816 0.009928 3.000000
## [1] 0.03000 0.01187 0.02901 0.03001 3.00000
## [1] 0.10000 0.01714 0.09171 0.10269 3.00000
## [1] 0.20000 0.01971 0.17040 0.21244 3.00000
## [1] 0.30000 0.02125 0.23825 0.32724 3.00000
## [1] 0.50000 0.02338 0.34882 0.55745 3.00000
## [1] 0.90000 0.02619 0.50352 0.91295 3.00000
We can produce matrices with a given TMA by inverting the output of the current method. The value is not exactly the intended one due to the normalization done by the TMA measure (the first version, which alters the matrix less, gives closer results). Those matrices are however of little use because they have only three distinct values and I suspect they represent simple instances. Let's see if we can insert some noise:
M <- generate_matrix_TMA(100, 30, 0.2, 10)
c(TMA2(M), length(table(M)))
## [1] 6.578e-02 1.820e+03
This is better.
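For reference, generate_matrix_TMA is not defined in these notes. A hypothetical sketch of what it could look like, assuming it combines the SV construction above with the inversion trick, a multiplicative noise step, and a rescaling to a target mean mu (both the noise step and the parameter order are guesses):
generate_matrix_TMA_sketch <- function(n, m, TMA, mu, noise_cv = 0.1) {
    SV <- c(1, rep(TMA, min(n, m) - 1))
    M <- 1/generate_matrix_sv(SV, n, m)       # inversion targets the TMA
    M <- M * rgamma_cost(n * m, 1, noise_cv)  # hypothetical noise step
    M/mean(M) * mu                            # rescale to the requested mean
}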
Let's see whether the current method is promising for predicting the performance of a single heuristic:
best_cmax <- function(costs) {
    min(LPT_unrelated(costs, min), balance_sufferage(costs), balance_EFT(costs))
}
LPT_unrelated_perf <- NULL
for (i in 1:100) {
    cvsv <- 10^runif(1, -2, 1)
    Z <- generate_matrix_cvsv(200, 50, rgamma_cost, cvsv, 1)
    cmax <- LPT_unrelated(Z, func = min)
    LPT_unrelated_perf <- rbind(LPT_unrelated_perf, data.frame(cvsv = cvsv,
        mean_cor_row = mean_cor_row(Z), mean_cor_col = mean_cor_col(Z),
        TMA1 = TMA1(Z), TMA2 = TMA2(Z), CVSV = CV_SV(Z),
        ratio = max(1, cmax/best_cmax(Z))))
}
Before analyzing the data, let's visualize them:
library(ggplot2)
p <- ggplot(data = LPT_unrelated_perf, aes(x = cvsv, y = ratio))
p <- p + geom_point() + geom_smooth()
p <- p + scale_x_log10()
p
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
This seems to be a good predictor too. The most important variation occurs when the CVSV is between 0.2 and 2.
Let's see with the linear regressions:
summary(lm(ratio ~ cvsv, LPT_unrelated_perf))$adj.r.squared
## [1] 0.4155
summary(lm(ratio ~ mean_cor_row, LPT_unrelated_perf))$adj.r.squared
## [1] -0.009396
summary(lm(ratio ~ mean_cor_col, LPT_unrelated_perf))$adj.r.squared
## [1] 0.1915
summary(lm(ratio ~ TMA1, LPT_unrelated_perf))$adj.r.squared
## [1] 0.671
summary(lm(ratio ~ TMA2, LPT_unrelated_perf))$adj.r.squared
## [1] 0.6697
summary(lm(ratio ~ CVSV, LPT_unrelated_perf))$adj.r.squared
## [1] 0.524
It is strange that the TMA is so good. Is it because it puts emphasis on the maximum SV or because it considers the inverse of the costs?
Let's do the same for the TMA:
LPT_unrelated_perf <- NULL
for (i in 1:100) {
    TMA <- 10^runif(1, -2, 0)
    Z <- generate_matrix_TMA(200, 50, TMA, 1)
    cmax <- LPT_unrelated(Z, func = min)
    LPT_unrelated_perf <- rbind(LPT_unrelated_perf, data.frame(TMA = TMA,
        mean_cor_row = mean_cor_row(Z), mean_cor_col = mean_cor_col(Z),
        TMA1 = TMA1(Z), TMA2 = TMA2(Z), CVSV = CV_SV(Z),
        ratio = max(1, cmax/best_cmax(Z))))
}
And the visualization and the regressions:
library(ggplot2)
p <- ggplot(data = LPT_unrelated_perf, aes(x = TMA, y = ratio))
p <- p + geom_point() + geom_smooth()
p <- p + scale_x_log10()
p
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
summary(lm(ratio ~ TMA, LPT_unrelated_perf))$adj.r.squared
## [1] 0.0497
summary(lm(ratio ~ mean_cor_row, LPT_unrelated_perf))$adj.r.squared
## [1] 0.03695
summary(lm(ratio ~ mean_cor_col, LPT_unrelated_perf))$adj.r.squared
## [1] -0.01018
summary(lm(ratio ~ TMA1, LPT_unrelated_perf))$adj.r.squared
## [1] 0.02121
summary(lm(ratio ~ TMA2, LPT_unrelated_perf))$adj.r.squared
## [1] 0.03636
summary(lm(ratio ~ CVSV, LPT_unrelated_perf))$adj.r.squared
## [1] 0.2515
No measure is a good predictor here, even though there is a nice association for TMA lower than 0.1. For larger values, the relation is more chaotic. This behavior is not yet completely clear. Does this mean there is an issue with the generation method, or that the ratio between the maximum SV and the mean of the others is relevant only when it is significant? For instance, we could imagine a measure that considers only the first two SV when their difference is large, and otherwise integrates the next SV until the selected SV are large enough. An experiment to validate this: use a bimodal distribution for the SV (they would be either large or low); a sketch follows. We expect the number of SV in the large set to be a good predictor. But this number is too coarse: it cannot distinguish situations where the TMA is 0.01 from ones where it is 0.1.
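A minimal sketch of this experiment (the mixture parameters and dimensions are arbitrary choices): draw each SV from one of two well-separated values and use the size of the large group as the candidate predictor.
bimodal_sv <- function(k, p_large = 0.3, large = 10, low = 0.1) {
    sort(ifelse(runif(k) < p_large, large, low), decreasing = TRUE)
}
SV_bi <- bimodal_sv(50)
Z_bi <- generate_matrix_sv(SV_bi, 200, 50)
c(n_large = sum(SV_bi == 10),  # number of large SV, the candidate predictor
  ratio = max(1, LPT_unrelated(Z_bi, func = min)/best_cmax(Z_bi)))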
On the other hand, the current result may be sufficient. The LPT achieves its worst performance when the TMA is greater than 0.1. Is it necessary to be finer in this region? There are still ratios that are quite good while others are extremely bad. We are thus still missing something.
We have proposed a new method for generating cost matrices. It allows fixing the singular values. This is only a heuristic without any guarantee, but it is still usable and the best we have. We can either give a CV for the SV, or a TMA (results may be quite imprecise with this latter approach).
In terms of prediction, there is no denying that the SV are really useful. The relative success of the TMA is still not well understood, and there are still open questions.
The objective is to take the good parts of this method and incorporate them into a more elegant solution. The current problem is that many other measures vary at the same time. Let's take a break from this, review the properties of matrices used in the literature, and come back with fresher ideas. Moreover, studying how to integrate extreme values in measures could lead to interesting insights.