Fixing noise-based method (July 9 and July 10, 2015)

In the previous report, we found an anomaly in the behavior of the noise-based method. Let's see whether this has an important impact and whether it can be fixed.

Anomaly reproduction

n <- 100
m <- 30
CV <- 0.1
distance_noise_corr <- NULL
for (i in 1:100) {
    corr <- runif(1, 0, 1)
    Z <- generate_heterogeneous_matrix_noise_corr(n, m, rgamma_cost, corr, 0, 
        CV)
    distance_noise_corr <- rbind(distance_noise_corr, data.frame(corr = corr, 
        dis = distance_uniform_identical(Z)))
}

library(ggplot2)
p <- ggplot(data = distance_noise_corr, aes(x = corr, y = dis))
p <- p + geom_point() + geom_smooth()
p

## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

plot of chunk unnamed-chunk-1

Noise-based property

Let's recall the input variables of the original and modified methods, and the properties of each in terms of heterogeneity and correlation.

The original method uses Vtask, Vmach and Vnoise. Its properties are:

Vmutask = Vtask
Vmumach = Vmach
muVtask = sqrt(Vtask^2*Vnoise^2+Vtask^2+Vnoise^2)
muVmach = sqrt(Vmach^2*Vnoise^2+Vmach^2+Vnoise^2)
rhotask = 1 / (Vnoise^2 + Vnoise^2 / Vmach^2 + 1)
rhomach = 1 / (Vnoise^2 + Vnoise^2 / Vtask^2 + 1)
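
As a minimal sketch, the six properties above can be wrapped in one function so that they can be evaluated for any inputs. The name noise_properties and the example values are assumptions made for illustration; they are not part of the original code.

```r
# Hypothetical helper (not from the original code): evaluates the
# properties of the original noise-based method listed above.
noise_properties <- function(Vtask, Vmach, Vnoise) {
  list(
    Vmutask = Vtask,
    Vmumach = Vmach,
    muVtask = sqrt(Vtask^2 * Vnoise^2 + Vtask^2 + Vnoise^2),
    muVmach = sqrt(Vmach^2 * Vnoise^2 + Vmach^2 + Vnoise^2),
    rhotask = 1 / (Vnoise^2 + Vnoise^2 / Vmach^2 + 1),
    rhomach = 1 / (Vnoise^2 + Vnoise^2 / Vtask^2 + 1)
  )
}

# Illustrative values only.
props <- noise_properties(Vtask = 0.5, Vmach = 0.5, Vnoise = 0.1)
no_noise <- noise_properties(Vtask = 0.5, Vmach = 0.5, Vnoise = 0)
```

Without noise (Vnoise = 0), both correlations equal 1, and they decrease as Vnoise grows relative to Vtask and Vmach.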

The input variables of the modified method are Vmax, rhotask and rhomach (written rhoR and rhoC in the code). It transforms them as follows, after which the original method can be used:

k <- Vmax / sqrt(Vmax^2 + 1)
Vnoise <- min(Vmax, sqrt(1 / max(rhoR, rhoC) - 1) * k)
Vtask <- Vnoise / sqrt(1 / rhoC - Vnoise^2 - 1)
Vmach <- Vnoise / sqrt(1 / rhoR - Vnoise^2 - 1)
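
A quick way to check this transformation is to plug its outputs back into the correlation formulas of the original method and see whether the requested correlations come back. The sketch below does this for one illustrative setting; the function name noise_params and the chosen values are assumptions, not part of the original code.

```r
# Hypothetical wrapper around the transformation given above.
noise_params <- function(Vmax, rhoR, rhoC) {
  k <- Vmax / sqrt(Vmax^2 + 1)
  Vnoise <- min(Vmax, sqrt(1 / max(rhoR, rhoC) - 1) * k)
  Vtask <- Vnoise / sqrt(1 / rhoC - Vnoise^2 - 1)
  Vmach <- Vnoise / sqrt(1 / rhoR - Vnoise^2 - 1)
  c(Vtask = Vtask, Vmach = Vmach, Vnoise = Vnoise)
}

# Illustrative inputs only.
p <- noise_params(Vmax = 1, rhoR = 0.3, rhoC = 0.6)

# Correlations of the original method, recomputed from the outputs.
rhotask_back <- 1 / (p[["Vnoise"]]^2 + p[["Vnoise"]]^2 / p[["Vmach"]]^2 + 1)
rhomach_back <- 1 / (p[["Vnoise"]]^2 + p[["Vnoise"]]^2 / p[["Vtask"]]^2 + 1)
print(c(rhotask_back, rhomach_back))
```

For these inputs the recomputed values match rhoR and rhoC, which is consistent with rhoR playing the role of rhotask and rhoC the role of rhomach.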

In the previous settings, rhoC = 0, and we are interested in comparing the properties of this method when rhoR is low and when it is high. In the former case (rhoR < 1 / (Vmax^2+2)), Vnoise = Vmax and we have:

Vmutask = 0
Vmumach = Vmax*sqrt(rhoR) / sqrt(1-Vmax^2*rhoR-rhoR)

In the latter case, Vnoise = sqrt(1 / rhoR - 1) * k and we have:

Vmutask = 0
Vmumach = Vmax
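
The two cases can be checked numerically by applying the transformation of the modified method with rhoC = 0 on each side of the threshold. This is only a sketch: the helper vmach_for and the two test values of rhoR (0.2 and 0.8) are assumptions chosen for illustration.

```r
Vmax <- 0.1

# Hypothetical helper: Vmach produced by the modified method for a given rhoR.
vmach_for <- function(rhoR, rhoC = 0) {
  k <- Vmax / sqrt(Vmax^2 + 1)
  Vnoise <- min(Vmax, sqrt(1 / max(rhoR, rhoC) - 1) * k)
  Vnoise / sqrt(1 / rhoR - Vnoise^2 - 1)
}

# Boundary between the two cases.
threshold <- 1 / (Vmax^2 + 2)

# One value of rhoR below the threshold and one above it.
low <- vmach_for(0.2)
high <- vmach_for(0.8)

# Closed form of the former case, evaluated at rhoR = 0.2.
closed_form_low <- Vmax * sqrt(0.2) / sqrt(1 - Vmax^2 * 0.2 - 0.2)
print(c(threshold = threshold, low = low, closed_form_low = closed_form_low, high = high))
```

Below the threshold the computed Vmach follows the closed form and depends on rhoR; above it, Vmach equals Vmax.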

Here we have it: in the former case, the machine heterogeneity varies with rhoR, while in the latter it does not. Let's plot this:

rhoR <- seq(0, 1, 0.01)
Vmax <- 0.1
plot(rhoR, pmin(Vmax * sqrt(rhoR)/sqrt(1 - Vmax^2 * rhoR - rhoR), Vmax), type = "l")

## Warning: NaNs produced

plot of chunk unnamed-chunk-2

Effect of the correlation on the heterogeneity

This is interesting: when the task correlation is low, the heuristic performance is stable while the heterogeneity increases. We already know that heterogeneity has an impact on heuristic performance, so ideally the heterogeneity should be held fixed when studying the effect of the correlation. It is even possible that the previous study showing the effect of heterogeneity was actually showing only the indirect effect of the correlation (it would have been necessary to fix the heterogeneity).

Let's measure the two types of heterogeneity with different correlation settings.

Vmax <- 1
heterogeneity <- NULL
for (rhoR in seq(0, 1 - 0.01, 0.01)) for (rhoC in seq(0, 1 - 0.01, 0.01)) {
    Vnoise <- min(Vmax, sqrt(1/max(rhoR, rhoC) - 1)/sqrt(1/Vmax^2 + 1))
    Vtask <- 1/sqrt((1/rhoC - 1)/Vnoise^2 - 1)
    Vmach <- 1/sqrt((1/rhoR - 1)/Vnoise^2 - 1)
    Vmutask = 1/sqrt((1/rhoC - 1)/Vnoise^2 - 1)
    Vmumach = 1/sqrt((1/rhoR - 1)/Vnoise^2 - 1)
    muVtask = sqrt(Vtask^2 * Vnoise^2 + Vtask^2 + Vnoise^2)
    muVmach = sqrt(Vmach^2 * Vnoise^2 + Vmach^2 + Vnoise^2)
    heterogeneity <- rbind(heterogeneity, data.frame(rhoR = rhoR, rhoC = rhoC, 
        hetero = Vmutask, method = "CV_mean_row"), data.frame(rhoR = rhoR, rhoC = rhoC, 
        hetero = Vmumach, method = "CV_mean_col"), data.frame(rhoR = rhoR, rhoC = rhoC, 
        hetero = muVmach, method = "mean_CV_row"), data.frame(rhoR = rhoR, rhoC = rhoC, 
        hetero = muVtask, method = "mean_CV_col"))
}

Let's plot the heterogeneities:

p <- ggplot(data = heterogeneity, aes(x = rhoR, y = rhoC, fill = hetero, z = hetero))
p <- p + geom_tile()
p <- p + stat_contour()
p <- p + facet_wrap(~method)
p

plot of chunk unnamed-chunk-3

We see that the heterogeneity is stable for both low and high task correlation.

Preserving the heterogeneity

Let's see whether the heterogeneity can be fixed at some level. This seems difficult: there are three parameters (Vtask, Vmach and Vnoise), and the two input correlations already constrain two of them. We cannot fix both Vtask and Vmach.

Another idea would be to avoid any min or max when setting the three parameters, which would prevent any inflexion point. Let's fix Vnoise to 0.1 (hence the constants 0.1 and 1.01 = 1 + 0.1^2 in the code):

for (rhoR in c(0.01, 0.99)) for (rhoC in c(0.01, 0.99)) {
    Vtask <- 0.1/sqrt(1/rhoC - 1.01)
    Vmach <- 0.1/sqrt(1/rhoR - 1.01)
    print(c(Vtask, Vmach))
}

## [1] 0.01005 0.01005
## [1] 9.94987 0.01005
## [1] 0.01005 9.94987
## [1] 9.95 9.95

This may actually be worse, for two reasons. First, an inflexion point may produce areas where the heterogeneity is constant, so that the effect we see there can really be attributed to the correlation. Second, the maximum coefficient of variation must be bounded, whereas here it reaches almost 10.

Analysing the heterogeneity

From the previous plots, we can conclude that there are three areas, depending on whether Vnoise is determined by Vmax, rhoR or rhoC. In the first area (Vmax related), with low rhoR and rhoC, one kind of heterogeneity varies a lot while the other does not. In the top-left area (rhoC related), the task heterogeneity remains constant and large while the machine heterogeneity varies. It is the opposite for the last area.
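
The three areas can also be identified directly from the expression of Vnoise, without looking at the plots. The sketch below labels a point (rhoR, rhoC) by the term that determines Vnoise; the function name vnoise_regime and the tie-breaking rule for rhoR = rhoC are assumptions made for illustration.

```r
# Hypothetical helper: which term determines Vnoise in the modified method.
vnoise_regime <- function(rhoR, rhoC, Vmax) {
  k <- Vmax / sqrt(Vmax^2 + 1)
  bound <- sqrt(1 / max(rhoR, rhoC) - 1) * k
  if (Vmax <= bound) {
    "Vmax"   # min() selects Vmax: both correlations are low
  } else if (rhoR >= rhoC) {
    "rhoR"   # the larger correlation is rhoR (ties are assigned here arbitrarily)
  } else {
    "rhoC"   # the larger correlation is rhoC (top-left area of the plots)
  }
}

# Illustrative points with Vmax = 1, as in the heterogeneity plots.
regimes <- c(
  vnoise_regime(0.1, 0.1, Vmax = 1),
  vnoise_regime(0.2, 0.9, Vmax = 1),
  vnoise_regime(0.9, 0.2, Vmax = 1)
)
print(regimes)
```

The Vmax area corresponds to max(rhoR, rhoC) <= 1 / (Vmax^2 + 2), the same threshold as in the analysis with rhoC = 0.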

Conclusion

The heterogeneity properties clearly must be assessed. No way of improving them was found for this method. The plots must be included in the report, and a similar analysis must be done for the combination-based method. It may or may not play a role in the final discussion.