We need to adjust the combination-based method to avoid negative costs when used with a large CV. Also, we will make sure that extreme values (0,0) and (1,1) are correctly handled by the noise-based and the combination-based method. We will finally see what can be done for the following cases: (0,.), (1,.), (.,0) and (.,1).
Let's generate the extreme values with the current method. We will artificially put one of the value in each generated vector to the minimum one (zero).
n <- 2
m <- 2
rdist <- rgamma_cost
mu <- 1
sigma <- 0.1
rhoR <- 0.3
rhoC <- 0.6
combine <- function(X, Y, rho, mean) {
X[1] <- min(X[1], 0)
Y[1] <- min(Y[1], 0)
sqrt(rho) * X + sqrt(1 - rho) * Y + (1 - sqrt(rho) - sqrt(1 - rho)) * mean
}
C <- rdist(n, mu, sigma)
Z <- sapply(1:m, function(x) return(combine(C, rdist(length(C), mu, sigma),
rhoC, mu)))
R <- rdist(m, mu, sqrt(1 - rhoC) * sigma)
for (i in 1:nrow(Z)) Z[i, ] <- combine(R, Z[i, ], rhoR, mu)
(Z - mu)/sqrt(1 - rhoR * rhoC) + mu
## [,1] [,2]
## [1,] -0.9049 -0.279
## [2,] -0.5288 1.020
Analytically, the minimum value is:
(sqrt(1 - rhoR * rhoC) - sqrt(1-rhoR) * (sqrt(rhoC)+sqrt(1-rhoC)) - sqrt(rhoR))
* mu / sqrt(1 - rhoR * rhoC)
(sqrt(1 - rhoR * rhoC) - sqrt(1 - rhoR) * (sqrt(rhoC) + sqrt(1 - rhoC)) - sqrt(rhoR)) *
mu/sqrt(1 - rhoR * rhoC)
## [1] -0.9049
Therefore, for some values of correlations, it is not possible to guarantee positive values. To remove them, we can insert an additive value in the result. However, the mean is increased and the CV is thus decreased.
Another approach is to change the parameters to fix only the CV of the resulting matrix.
large <- generate_matrix_corr_positive(1000, 1000, rgamma_cost, 1, 1, rhoR,
rhoC)
min(large)
## [1] 3.166e-05
mean(large)
## [1] 0.9977
sqrt(var(as.vector(large)))/mean(large)
## [1] 1.033
medium <- generate_matrix_corr_positive(100, 100, rgamma_cost, 1, 1, rhoR, rhoC)
plot.ecdf(medium)
The final method takes correlations, CV and mean as parameters. We must, however, update the heterogeneity analysis:
rhoR <- 0.2
rhoC <- 0.3
mu <- 1
CV <- 0.1
large <- generate_matrix_corr_positive(1000, 1000, rgamma_cost, mu, CV, rhoR,
rhoC)
c(mean_CV_row(large), CV * (sqrt(rhoC) + sqrt(1 - rhoC)) * (sqrt(rhoR) + sqrt(1 -
rhoR)) * sqrt(1 - rhoC)/(sqrt(rhoR) + sqrt(1 - rhoR) * (sqrt(rhoC) + sqrt(1 -
rhoC))))
## [1] 0.09191 0.09220
c(mean_CV_col(large), CV * (sqrt(rhoC) + sqrt(1 - rhoC)) * (sqrt(rhoR) + sqrt(1 -
rhoR)) * sqrt(1 - rhoR)/(sqrt(rhoR) + sqrt(1 - rhoR) * (sqrt(rhoC) + sqrt(1 -
rhoC))))
## [1] 0.09985 0.09857
c(CV_mean_row(large), CV * (sqrt(rhoC) + sqrt(1 - rhoC)) * (sqrt(rhoR) + sqrt(1 -
rhoR)) * sqrt(rhoC) * sqrt(1 - rhoR)/(sqrt(rhoR) + sqrt(1 - rhoR) * (sqrt(rhoC) +
sqrt(1 - rhoC))))
## [1] 0.05667 0.05399
c(CV_mean_col(large), CV * (sqrt(rhoC) + sqrt(1 - rhoC)) * (sqrt(rhoR) + sqrt(1 -
rhoR)) * sqrt(1 - rhoC) * sqrt(rhoR)/(sqrt(rhoR) + sqrt(1 - rhoR) * (sqrt(rhoC) +
sqrt(1 - rhoC))))
## [1] 0.04085 0.04123
This is a bit more complicated, but this is closer to what we want. Let's just check that the correlations are still OK.
medium <- generate_matrix_corr_positive(300, 300, rgamma_cost, 1, 0.1, rhoR,
rhoC)
mean_cor_row(medium)
## [1] 0.2012
mean_cor_col(medium)
## [1] 0.2983
Let's complete our two methods (generate_heterogeneous_matrix_noise_corr
and
generate_matrix_corr_positive
) for the extreme cases (0,0) and (1,1).
n <- 100
m <- 100
mu <- 1
CV <- 1
Vmax <- 1
rdist <- rgamma_cost
rhoR <- 0
rhoC <- 0
mat <- generate_matrix_corr_positive(n, m, rdist, mu, CV, rhoR, rhoC)
mean_cor_row(mat)
## [1] 0.01987
mean_cor_col(mat)
## [1] 0.02018
mat <- generate_heterogeneous_matrix_noise_corr(n, m, rdist, rhoR, rhoC, Vmax)
mean_cor_row(mat)
## [1] 0.01796
mean_cor_col(mat)
## [1] 0.01994
It seems to work fine. Now, for (1,1):
rhoR <- 1
rhoC <- 1
mat <- generate_matrix_corr_positive(n, m, rdist, mu, CV, rhoR, rhoC)
mean_cor_row(mat)
## [1] 1
mean_cor_col(mat)
## [1] 1
mat <- generate_heterogeneous_matrix_noise_corr(n, m, rdist, rhoR, rhoC, Vmax)
mean_cor_row(mat)
## [1] 1
mean_cor_col(mat)
## [1] 1
What remains are the following cases: (0,.), (1,.), (.,0) and (.,1). Let's start with (0,.):
rhoR <- 0
rhoC <- 0.5
mat <- generate_matrix_corr_positive(n, m, rdist, mu, CV, rhoR, rhoC)
mean_cor_row(mat)
## [1] 0.01861
mean_cor_col(mat)
## [1] 0.5516
mat <- generate_heterogeneous_matrix_noise_corr(n, m, rdist, rhoR, rhoC, Vmax)
mean_cor_row(mat)
## [1] 0.02197
mean_cor_col(mat)
## [1] 0.5117
It works without problem. For (.,0):
rhoR <- 0.5
rhoC <- 0
mat <- generate_matrix_corr_positive(n, m, rdist, mu, CV, rhoR, rhoC)
mean_cor_row(mat)
## [1] 0.5745
mean_cor_col(mat)
## [1] 0.02142
mat <- generate_heterogeneous_matrix_noise_corr(n, m, rdist, rhoR, rhoC, Vmax)
mean_cor_row(mat)
## [1] 0.5023
mean_cor_col(mat)
## [1] 0.01829
Like a charm. Now, what about (1,.) and more specifically (1,0). It is easy to have simple matrices with perfectly correlated rows and columns that have only a high correlation. However, it is difficult to see a solution for controling this last correaltion. We propose to fail gracefully in this case.
rhoR <- 1
rhoC <- 0
mat <- generate_matrix_corr_positive(n, m, rdist, mu, CV, rhoR, rhoC)
## Error: No method implemented for this case
mat <- generate_heterogeneous_matrix_noise_corr(n, m, rdist, rhoR, rhoC, Vmax)
## Error: No method implemented for this case
Both methods are now ready to be used in experiments and to be described in the research report.