Let's improve upon Siegel method with an aggregation method that set the correlation between the rows and the columns. The rationale for having the same standard deviation for the rows and the columns is that it is the same for the uniform case.
generate_matrix_aggregation <- function(R, C, rdistr, rdistc, rho) {
if (length(R) == 1)
R <- rdistr(R)
if (length(C) == 1)
C <- rdistc(C)
Z <- sapply(1:length(R), function(j) return(C))
for (i in 1:nrow(Z)) Z[i, ] <- Z[i, ] * R
Zc <- sapply(1:length(R), function(j) return(rdistc(length(C))))
for (i in 1:nrow(Z)) Zc[i, ] <- Zc[i, ] * rdistr(length(R))
alpha <- sqrt(rho)/muR
beta <- sqrt(1 - alpha^2)
gamma <- mean(R) * mean(C) * (1 - alpha - beta)
return(alpha * Z + beta * Zc + gamma)
}
set.seed(0)
rdistr <- function(n) {
return(runif(n, 1, 10))
}
rdistc <- function(n) {
return(runif(n, 1, 10))
}
muR <- mean(rdistr(1000))
muC <- mean(rdistc(1000))
sigmaR <- sqrt(var(rdistr(1000)))
sigmaC <- sqrt(var(rdistc(1000)))
rho <- 0.9
Z <- generate_matrix_aggregation(100, 100, rdistr, rdistc, rho)
c(muR * muC, mean(Z))
## [1] 29.76 30.29
c(sqrt(sigmaC^2 * sigmaR^2 + sigmaC^2 * muR^2 + muC^2 * sigmaR^2), sqrt(var(as.vector(Z))),
sqrt(var(Z[1, ])), sqrt(var(Z[, 1])))
## [1] 21.74 21.37 19.72 19.89
summary(sapply(1:100, function(x) return(cor.test(Z[1, ], Z[x, ])$estimate)))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.2380 -0.0538 0.0037 0.0183 0.0809 1.0000
summary(sapply(1:100, function(x) return(cor.test(Z[, 1], Z[, x])$estimate)))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.2180 -0.0736 0.0083 0.0215 0.0802 1.0000
c(rho, cor.test(Z[1, ], Z[2, ])$estimate, cor.test(Z[, 1], Z[, 2])$estimate)
## cor cor
## 0.900000 -0.057971 0.006723
That does not work because the correlation between two columns or rows is harder to analyse than expected. It appears the formulas for the correlation between two sums are incorrect. This derivation would require more work.
The correlation is
alpha^2*r*r'*sigmaC /
sqrt((r^2*sigmaC^2+sigmaR^2*sigmaC^2+sigmaR^2*muC^2+sigmaC^2*muR^2) *
(r'^2*sigmaC^2+sigmaR^2*sigmaC^2+sigmaR^2*muC^2+sigmaC^2*muR^2))
where r and r' are random variable distributed according to the rdistr. The expected value for this quantity seems complicated to derive because the random variable is also present is the denominator.