The previous study (March 14, 2013) was a failure. Let's evaluate Siegel's method before considering a new simpler aggregation method.
The final matrix has several characteristics:
The default method (following methods are mere adaptations that uses gamma distribution) has a mean that is muR * muC and it standard deviation is sqrt(sigmaC2 * sigmaR2 + sigmaC2 * muR2 + muC2 * sigmaR2).
The expected standard deviation of each row is muC * sigmaR and the expected standard deviation of each column is the same as the standard deviation of all values.
Finally, the correlation between row is 0 and the correlation between column is muR2 * sigmaC2 / (sigmaC2 * sigmaR2 + sigmaC2 * muR2 + muC2 * sigmaR2)
generate_matrix_siegel <- function(R, C, rdistr, rdistc) {
if (length(R) == 1)
R <- rdistr(R)
if (length(C) == 1)
C <- rdistc(C)
Z <- sapply(1:length(C), function(j) return(R))
for (i in 1:nrow(Z)) Z[i, ] <- Z[i, ] * rdistc(length(C))
return(Z)
}
set.seed(0)
rdistr <- function(n) {
return(runif(n, 1, 50))
}
rdistc <- function(n) {
return(runif(n, 1, 1000))
}
muR <- mean(rdistr(1000))
muC <- mean(rdistc(1000))
sigmaR <- sqrt(var(rdistr(1000)))
sigmaC <- sqrt(var(rdistc(1000)))
Z <- generate_matrix_siegel(1000, 1000, rdistr, rdistc)
c(muR * muC, mean(Z))
## [1] 12511 12972
c(sqrt(sigmaC^2 * sigmaR^2 + sigmaC^2 * muR^2 + muC^2 * sigmaR^2), sqrt(var(as.vector(Z))))
## [1] 11223 11151
c(muC * sigmaR, sqrt(var(Z[1, ])))
## [1] 6967 9817
c(sqrt(sigmaC^2 * sigmaR^2 + sigmaC^2 * muR^2 + muC^2 * sigmaR^2), sqrt(var(Z[,
1])))
## [1] 11223 11350
c(0, cor.test(Z[1, ], Z[2, ])$estimate)
## cor
## 0.00000 -0.02325
c(muR^2 * sigmaC^2/(sigmaC^2 * sigmaR^2 + sigmaC^2 * muR^2 + muC^2 * sigmaR^2),
cor.test(Z[, 1], Z[, 2])$estimate)
## cor
## 0.4691 0.3704
Given this, changing the input distributions will change only four parameters: muR, muC, sigmaR and sigmaC. This is insufficient to control all 6 characteristics (additionally, rhoR will always be zero).
This motivates a new symmetric approach that control for at least the two correlations and the global mean and standard deviation.