Let’s develop a simple functions that predicts a coarse convergence given a set of measures applied to a simple MCMC. This will then serve as a basis for a traditional convergence measures for the coda package.

We start with a simple matrix.

n <- 30
m <- 10
M <- 10
col <- sample(1:M, n, replace = TRUE)
row <- sample(1:M, m, replace = TRUE)
mat <- matrix(col) %*% t(matrix(row))

Let’s generate the chain.

modif_mat <- function(mat) {
  rows <- sample(nrow(mat), 2)
  cols <- sample(ncol(mat), 2)
  submat <- mat[rows,cols]
  shift_min <- -min(submat[1,1], submat[2,2])
  shift_max <- min(submat[1,2], submat[2,1])
  shift <- sample(shift_min:shift_max, size = 1)
  submat[1,1] <- submat[1,1] + shift
  submat[2,2] <- submat[2,2] + shift
  submat[1,2] <- submat[1,2] - shift
  submat[2,1] <- submat[2,1] - shift
  mat[rows,cols] <- submat
  mat
}
MCMC <- 3e4
# Initialize the list of results to avoid costly concatenations
mats <- c(list(mat), replicate(MCMC, list()))
for (iterations in 1:MCMC)
  mats[[iterations + 1]] <- modif_mat(mats[[iterations]])

Let’s apply all measures on each step of the chain.

measures <- list(sd_meas, mean_CV_row, mean_CV_col,
                 mean_cor_row, mean_cor_col, chi2_meas)
values <- map(measures, ~ map_dbl(mats, .))
values %>%
  as.data.frame(col.names = 1:6) %>%
  cbind(x = 0:MCMC) %>%
  gather(measure, value, -x) %>%
  ggplot(aes(x = x, y = value)) +
  facet_wrap(~ measure, scales = "free") +
  geom_point()

OK, it seems clear that the convergence occurs before all iterations. Let’s design a simple approach to detect a correct number of iterations to work with: how much of the values of the first half must be in the range of the values of the second half?

test_convergence <- function(values) {
  n <- length(values)
  first_half <- head(values, n / 2)
  second_half <- tail(values, n / 2)
  sum(first_half >= min(second_half) & first_half <= max(second_half)) / (n / 2)
}
test_conv <- map(values, function(measures) {
  map_dbl(1:MCMC, ~ test_convergence(measures[1:.]))
})
test_conv %>%
  as.data.frame(col.names = 1:6) %>%
  cbind(x = 1:MCMC) %>%
  gather(measure, conv, -x) %>%
  ggplot(aes(x = x, y = conv)) +
  facet_wrap(~ measure) +
  geom_point()

Let’s consider the last time the convergence test returns a value below 10% (the convergence does not occur before) and the first time the test returns a value above 90% (convergence has occured).

map(test_conv, ~ c(max(which(. <= 0.1)), min(which(. >= 0.9))))
## [[1]]
## [1]  2687 20850
## 
## [[2]]
## [1] 1143 9820
## 
## [[3]]
## [1]  2597 20820
## 
## [[4]]
## [1]  1061 13000
## 
## [[5]]
## [1]   960 15980
## 
## [[6]]
## [1]  1133 18211

It seems convergence starts around 1000 iterations and is completly reached around 10000 iterations.

Let’s confirm this result with Geweke’s method. We take twice as much measures as the maximum number of iterations for the convergence.

library(coda)
par(mfrow = c(2, 3))
null <- map(values, ~ geweke.plot(mcmc(.[1:20000]), nbins = 41,
                                  auto.layout = FALSE))

The convergence occurs indeed around 1000/2000 iterations.

We have thus a general method to provide a rough estimate of the convergence backed up with a more specific method from Geweke.

## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/openblas-base/libblas.so.3
## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
## 
## locale:
##  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
##  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
##  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] coda_0.19-1   stringr_1.0.0 tidyr_0.2.0   ggplot2_2.2.1 dplyr_0.4.3  
## [6] purrr_0.2.0  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.10     knitr_1.15.1     magrittr_1.5     munsell_0.4     
##  [5] lattice_0.20-35  colorspace_1.2-2 R6_2.1.0         rlang_0.1       
##  [9] plyr_1.8.1       tools_3.4.0      parallel_3.4.0   grid_3.4.0      
## [13] gtable_0.1.2     pacman_0.4.6     DBI_0.3.1        htmltools_0.3.6 
## [17] yaml_2.1.13      lazyeval_0.2.0   rprojroot_1.2    digest_0.6.3    
## [21] assertthat_0.1   tibble_1.3.1     evaluate_0.10    rmarkdown_1.5   
## [25] stringi_0.5-5    compiler_3.4.0   scales_0.4.1     backports_1.0.5