In this document we will perform disease mapping and spatial regression using NIMBLE
(de Valpine et al. 2017) and CARBayes. We will fit the Leroux prior (Leroux, Lei, and Breslow 2000) in the famous Scottish lip cancer dataset and compare the performance of the different softwares.
This practical requires the following packages to be installed and attached: sf
, dplyr
, tidyverse
, nimble
, coda
, spdep
, patchwork
, GGally
, ggmcmc
, CARBayesdata
and CARBayes
.
install.packages(c("sf", "dplyr", "tidyverse", "nimble", "coda", "spdep", "patchwork",
"GGally", "ggmcmc", "CARBayesdata", "CARBayes"), dependencies = TRUE, repos = "http://cran.r-project.org")
library(sf) # Simple feature for R
library(dplyr) # A package for data manipulation
library(tidyverse) # A package for data manipulation and plotting
library(nimble) # A package for performing MCMC in R
library(coda) # A package for summarizing and plotting of MCMC output
# and diagnostic tests
library(spdep) # A package to calculate neighbors
library(patchwork) # A package to combine plots
library(GGally) # A package to do pairplots
library(ggmcmc) # MCMC diagnostics with ggplot
library(CARBayes) # A package for fitting CAR models
library(CARBayesdata) # A package with the Scottish lip cancer data
library(kableExtra) # packages for formatting tables in rmarkdown
CARBayes
package.data(lipdata)
data(lipdbf)
data(lipshp)
lipdbf$dbf <- lipdbf$dbf[ ,c(2,1)]
data.combined <- combine.data.shapefile(data=lipdata, shp=lipshp, dbf=lipdbf)
W <- poly2nb(data.combined)
W <- nb2mat(W, zero.policy = TRUE, style = "B")
The SIRs will be smoothed using the Poisson-logNormal model. The inference is done with NIMBLE
called through R. In particular, let each ward \(i\) be indexed by the integers \(1, 2,...,N\). The model is as follows: \[
\begin{eqnarray}
O_{i}|\lambda_{i} & \sim & \text{Poisson}(\lambda_{i}E_{i} ) \\
\log(\lambda_{i}) & = & \alpha + \phi_{i} \\
\phi & \sim & N\Big(0,\tau_{\phi}^{-1}\big((1-\rho){\bf I} + \rho{\bf Q}^{-}\big)\Big)
\end{eqnarray}
\]
where \(O_i\) is the observed number of scottish lip cancer cases, \(E_{i}\) the expected, \(\alpha\) is an intercept term denoting the average log SIR, \(\phi_{i}\) is a random intercept and \(\tau_{\theta}\) a precision (reciprocal of the variance) term that controls the magnitude of \(\theta_{i}\).
# Chain 1
t_0 <- Sys.time()
chain1 <- S.CARleroux(formula= observed ~ 1 + offset(log(expected)),
data=data.combined,
family="poisson",
W=W,
burnin=10000,
n.sample=50000,
thin=2,
prior.mean.beta=0, # mean of the normal prior of the intercept
prior.var.beta=1, # var of the normal prior of the intercept
prior.tau2 = c(1, 0.01), # the shape and scale of Inv-Gamma
verbose = FALSE)
t_1 <- Sys.time()
t_chain1CARBayes <- t_1 - t_0
# Chain 2
t_0 <- Sys.time()
chain2 <- S.CARleroux(formula= observed ~ 1 + offset(log(expected)),
data=data.combined,
family="poisson",
W=W,
burnin=10000,
n.sample=50000,
thin=2,
prior.mean.beta=0,
prior.var.beta=1,
prior.tau2 = c(1, 0.01),
verbose = FALSE)
t_1 <- Sys.time()
t_chain2CARBayes <- t_1 - t_0
t_CARBayes <- t_chain1CARBayes + t_chain2CARBayes
Q <- matrix(0, nrow = nrow(W), ncol = ncol(W))
diag(Q) <- apply(W, 1, sum)
Q <- Q - W
LerouxCode <- nimbleCode(
{
for (i in 1:N){
O[i] ~ dpois(mu[i])
log(mu[i]) <- log(E[i]) + alpha + theta.centred[i]
SIR[i] <- exp(alpha + theta.centred[i])
theta.centred[i] <- theta[i] - theta.mean
}
PrecMat[1:N, 1:N] <- tau.theta*(rho*Q[1:N, 1:N] + (1 - rho)*diag(N))
theta[1:N] ~ dmnorm(mean = mu_ler[1:N], prec = PrecMat[1:N,1:N])
theta.mean <- sum(theta[1:N])/N
# intercept
alpha ~ dnorm(0, tau = 1)
# precision
tau.theta ~ dgamma(1, 0.01)
sigma2.theta <- 1/tau.theta
# mixing
rho ~ dunif(0,1)
}
)
NIMBLE
N = nrow(data.combined)
ScotLipdata = list(
O = data.combined$observed
)
ScotLipConsts <-list(
N = N,
E = data.combined$expected,
Q = Q,
mu_ler=rep(0, N)
)
inits <- list(
list(alpha=0.01, tau.theta=10, theta = rep(0.01,times=N), rho = 0.5), # chain 1
list(alpha=0.5, tau.theta=1, theta = rep(-0.01,times=N), rho = 0.7) # chain 2
)
params <- c("sigma2.theta", "SIR", "alpha", "rho", "theta", "theta.centred")
Note that the parameters that are not set, will NOT be monitored!
ni <- 100000 # nb iterations
nt <- 5 # thinning interval
nb <- 50000 # nb iterations as burn-in
nc <- 2 # nb chains
nimbleMCMC()
.t_0 <- Sys.time()
Lerouxsamples <- nimbleMCMC(code = LerouxCode,
data = ScotLipdata,
constants = ScotLipConsts,
inits = inits,
monitors = params,
niter = ni,
nburnin = nb,
thin = nt,
nchains = nc,
setSeed = 9,
progressBar = TRUE,
samplesAsCodaMCMC = TRUE,
summary = TRUE
)
t_1 <- Sys.time()
t_smallsetting <- t_1 - t_0
traceplot(chain1$samples$beta, las = 1, main = "intercept", col = "#ff9999")
traceplot(chain2$samples$beta, add = TRUE, col = "#33cccc")
grid()
traceplot(chain1$samples$tau2, las = 1, main = "variance", col = "#ff9999")
traceplot(chain2$samples$tau2, add = TRUE, col = "#33cccc")
grid()
traceplot(chain1$samples$rho, las = 1, main = "mixing", col = "#ff9999")
traceplot(chain2$samples$rho, add = TRUE, col = "#33cccc")
grid()
NIMBLE
.ggLerouxNIMBLE <- ggs(Lerouxsamples$samples)
ggLerouxNIMBLE %>% filter(Parameter == "alpha") %>%
ggs_traceplot + theme_bw() + theme(text = element_text(size = 12))
ggLerouxNIMBLE %>% filter(Parameter == "sigma2.theta") %>%
ggs_traceplot + theme_bw() + theme(text = element_text(size = 12))
ggLerouxNIMBLE %>% filter(Parameter == "rho") %>%
ggs_traceplot + theme_bw() + theme(text = element_text(size = 12))
From the first look, CARBayes needs less time (14 sec) and samples to converge compared to the NIMBLE specification shown in this tutorial (1.8 minutes). We should consider the current differences in the samplers, for instance CARBayes uses block sampling for the random effects, whereas in our NIMBLE specification we do not have block sampling.
data.frame(
rbind(
c("", "CARBayes", ""),
c("Median", "LL", "UL"),
round(
rbind(
round(chain1$summary.results[,1:3], digits= 2)
), digits = 2
),
c("", "NIMBLE", ""),
round(
Lerouxsamples$summary$all.chains["alpha", c("Median", "95%CI_low", "95%CI_upp")],
digits = 2),
round(
Lerouxsamples$summary$all.chains["sigma2.theta", c("Median", "95%CI_low", "95%CI_upp")],
digits = 2),
round(
Lerouxsamples$summary$all.chains["rho", c("Median", "95%CI_low", "95%CI_upp")],
digits = 2)
)
) -> tab
tab <- cbind(c("", "", "intercept", "variance", "mixing", "", "intercept", "variance", "mixing"),
tab)
colnames(tab) <- NULL
row.names(tab) <- NULL
knitr::kable(tab, caption = "Median and 95%CrI for the intercept and hyperparameters of the models.") %>%
kable_styling(bootstrap_options = "striped", full_width = F, position = "center")
CARBayes | |||
Median | LL | UL | |
intercept | 0.09 | -0.02 | 0.19 |
variance | 0.67 | 0.39 | 1.16 |
mixing | 0.89 | 0.62 | 0.99 |
NIMBLE | |||
intercept | 0.09 | -0.01 | 0.19 |
variance | 0.66 | 0.39 | 1.17 |
mixing | 0.92 | 0.66 | 0.99 |
par(mfrow = c(1,2), mar = c(2, 2.5, 3.5, 0))
densplot(chain1$samples$beta, las = 1, main = "Chain 1",
xlim = c(-0.5,0.5), col = "#ff9999")
densplot(chain2$samples$beta, col = "#33cccc",
las = 1, main = "Chain 2", xlim = c(-0.5,0.5))
title("alpha", outer = TRUE, line = -.9)
densplot(chain1$samples$tau2, las = 1, main = "Chain 1",
col = "#ff9999")
densplot(chain2$samples$tau2, col = "#33cccc", las = 1, main = "Chain 2")
title("variance", outer = TRUE, line = -.9)
densplot(chain1$samples$rho, las = 1, main = "Chain 1",
col = "#ff9999")
densplot(chain2$samples$rho, col = "#33cccc", las = 1, main = "Chain 2")
title("mixing", outer = TRUE, line = -.9)
ggLerouxNIMBLE %>% filter(Parameter == "alpha") %>%
ggs_density() + theme_bw() + theme(text = element_text(size = 12)) +
xlim(c(-0.5,0.5))
ggLerouxNIMBLE %>% filter(Parameter == "sigma2.theta") %>%
ggs_density() + theme_bw() + theme(text = element_text(size = 12))
ggLerouxNIMBLE %>% filter(Parameter == "rho") %>%
ggs_density() + theme_bw() + theme(text = element_text(size = 12))
dat.points <- data.frame(CARBayes = apply(chain1$samples$phi, 2, median),
NIMBLE =
Lerouxsamples$summary$all.chains[paste0("theta.centred[",
1:nrow(data.combined),"]"), "Median"])
ggplot() + geom_point(data = dat.points, aes(x = CARBayes, y = NIMBLE)) +
theme_bw() + ylim(c(-1.5, 1.5)) + xlim(c(-1.5, 1.5)) +
ggtitle("Medians of the latent fields") + geom_abline(a = 0, b = 1, col = "red")
2. Standard deviations
dat.box <- data.frame(sd = c(Lerouxsamples$summary$all.chains[paste0("theta.centred[",
1:nrow(data.combined), "]"), "St.Dev."],
apply(chain1$samples$phi, 2, sd)),
model = c(rep("NIMBLE", times = nrow(data.combined)),
rep("CARBayes", times = nrow(data.combined))))
ggplot() + geom_boxplot(data = dat.box, aes(x = model, y = sd)) + theme_bw() +
ggtitle("SD of the latent fields")
mat <- matrix(NA, nrow = 20000, ncol = nrow(data.combined))
for(i in 1:nrow(data.combined)){
mat[,i] <- exp(chain1$samples$beta + chain1$samples$phi[,i])
}
cor(Lerouxsamples$summary$all.chains[paste0("SIR[",
1:nrow(data.combined), "]"), "Median"],
apply(mat, 2, median))
## [1] 0.9997017
dat.box <- data.frame(sd = c(Lerouxsamples$summary$all.chains[paste0("SIR[",
1:nrow(data.combined), "]"), "St.Dev."],
apply(mat, 2, sd)),
model = c(rep("NIMBLE", times = nrow(data.combined)),
rep("CARBayes", times = nrow(data.combined))))
ggplot() + geom_boxplot(data = dat.box, aes(x = model, y = sd)) + theme_bw() +
ggtitle("SD of the SMRs")
The results are very similar (almost identical) with respect to the SMRs.
This tutorial is a first attempt to fit the Leroux model in NIMBLE
. We compared the results with the CARBayes
package which is the standard when it comes to fitting the Leroux model. The results are almost identical, nevertheless nimble needs slightly more time to converge. Future work includes a block sampling implementation (as is currently in CARBayes
) in NIMBLE
.
de Valpine, P., D. Turek, C. J. Paciorek, C. Anderson-Bergman, D. Temple Lang, and R. Bodik. 2017. “Programming with Models: Writing Statistical Algorithms for General Model Structures with NIMBLE.” Journal of Computational and Graphical Statistics 26: 403–17. https://doi.org/10.1080/10618600.2016.1172487.
Leroux, Brian G, Xingye Lei, and Norman Breslow. 2000. “Estimation of Disease Rates in Small Areas: A New Mixed Model for Spatial Dependence.” In Statistical Models in Epidemiology, the Environment, and Clinical Trials, 179–91. Springer.