In this document we will perform disease mapping and spatial regression using NIMBLE (de Valpine et al. 2017). We will fit a series of disease mapping models with unstructured and structured random effects (ICAR, BYM and BYM2). The data can be downloaded by geodacenter and includes lip cancer cases (CANCER) per ward in Scotland during 1975-1980 (Clayton and Kaldor 1987). The dataset also includes the district number (DISTRICT), name (NAME), code (CODE), the population per ward (POP) and the expected number of cases (CEXP) (Lawson 1999).

Install and load packages

This practical requires the following packages to be installed and attached: sf, dplyr, tidyverse, nimble, coda, spdep, patchwork, GGally, ggmcmc and INLA.

  • To install the entire suite of packages, we can use:
install.packages(c("sf", "dplyr", "tidyverse", "nimble", "coda", "spdep", "patchwork", 
                   "GGally", "ggmcmc"), dependencies = TRUE, repos = "http://cran.r-project.org")
  • For INLA, you can use
install.packages("INLA",repos=c(getOption("repos"),
                                INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE)
  • Then, load the needed packages:
library(sf)           # Simple feature for R
library(dplyr)        # A package for data manipulation
library(tidyverse)    # A package for data manipulation and plotting


library(nimble)       # A package for performing MCMC in R

library(coda)         # A package for summarizing and plotting of MCMC output 
                      # and diagnostic tests
library(spdep)        # A package to calculate neighbors

library(patchwork)    # A package to combine plots
library(GGally)       # A package to do pairplots
library(ggmcmc)       # MCMC diagnostics with ggplot

library(INLA)         # A package for quick Bayesian inference (here used to calculate
                      # the scale for BYM2)

library(kableExtra)   # packages for formatting tables in rmarkdown

Import and explore the Scottish lip cancer data

  • Import the health data
ScotLip <- read_sf("ScotLip.shp")
ScotLip
## Simple feature collection with 56 features and 7 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 95631 ymin: 530297 xmax: 454570 ymax: 1203008
## CRS:            NA
## # A tibble: 56 x 8
##    CODENO DISTRICT NAME   CODE  CANCER    POP  CEXP                     geometry
##     <int>    <int> <chr>  <chr>  <int>  <int> <dbl>                    <POLYGON>
##  1   6126        1 Skye-~ w6126      9  28324  1.38 ((214091.9 841215.2, 218829~
##  2   6016        2 Banff~ w6016     39 231337  8.66 ((383866 865862, 398721 867~
##  3   6121        3 Caith~ w6121     11  83190  3.04 ((311487 968650, 320989 968~
##  4   5601        4 Berwi~ w5601      9  51710  2.53 ((377180 672603, 386871.7 6~
##  5   6125        5 Ross-~ w6125     15 129271  4.26 ((278680.1 882371.8, 294960~
##  6   6554        6 Okney  w6554      8  53199  2.4  ((328040 1011192, 321498 10~
##  7   6019        7 Moray  w6019     26 245513  8.11 ((348968 868724, 352324 867~
##  8   6655        8 Shetl~ w6655      7  62603  2.3  ((450593 1203008, 454570 12~
##  9   6123        9 Locha~ w6123      6  59183  1.98 ((192646 807178, 216426 805~
## 10   6017       10 Gordon w6017     20 165554  6.63 ((397156 840640, 399497 838~
## # ... with 46 more rows
  • Calculate the SIRs (standarized incidence ratios) and add them to the data frame ScotLip
ScotLip$crudeSIR <- ScotLip$CANCER/ScotLip$CEXP
  • Some plots for the crude SIRs
ggplot() + geom_boxplot(data = ScotLip, aes(y = crudeSIR)) + theme_bw() + 
           theme(text = element_text(size = 12))-> p1

ggplot() + geom_sf(data = ScotLip, aes(fill = crudeSIR), col = "NA") + 
           scale_fill_viridis_c() + theme_bw() + 
           theme(axis.text = element_blank(), text = element_text(size = 12)) -> p2
(p1|p2)

Obtaining the posterior relative risks (SIRs)

Unstructured random effect

The SIRs will be smoothed using the Poisson-logNormal model. The inference is done with NIMBLE called through R. In particular, let each ward \(i\) be indexed by the integers \(1, 2,...,N\). The model is as follows: \[ \begin{eqnarray} O_{i}|\lambda_{i} & \sim & \text{Poisson}(\lambda_{i}E_{i} ) \\ \log(\lambda_{i}) & = & \alpha + \theta_{i} \\ \theta_{i} & \sim & N(0,1/\tau_{\theta}) \end{eqnarray} \]

where \(O_i\) is the observed number of scottish lip cancer cases, \(E_{i}\) the expected, \(\alpha\) is an intercept term denoting the average log SIR, \(\theta_{i}\) is a random intercept and \(\tau_{\theta}\) a precision (reciprocal of the variance) term that controls the magnitude of \(\theta_{i}\).

Code

  • We will first write the model in nimble:
UnstrCode <- nimbleCode(
  {
    for (i in 1:N){
      
      O[i] ~ dpois(mu[i])                                # Poisson for observed counts 
      log(mu[i]) <- log(E[i]) + alpha + theta[i] 
      
      theta[i] ~ dnorm(0, tau = tau.theta)               # area-specific unstr random effects
      SIR[i] <- exp(alpha + theta[i])                    # area-specific SIR
      resSIR[i] <- exp(theta[i])                         # area-specific residual SIR
      e[i] <- (O[i]-mu[i])/sqrt(mu[i])                   # residuals      
    } 
    
    # Priors:
    alpha ~ dflat()                                      # Unif(-inf, +inf)
    overallSIR <- exp(alpha)                             # overall SIR across study region
    
    tau.theta ~ dgamma(1, 0.01)                          # prior for the precision
    sigma2.theta <- 1/tau.theta                          # variance of theta
  }
)
  • Create data object as required for NIMBLE. Here we define the data and its constants. In our case the data is the observed number of lip cancer cases, whereas the constants the total number of wards and the expected number of cases per ward:
# Obtain the number of wards
N <- dim(ScotLip)[1] 

# Format the data for NIMBLE in a list
ScotLipdata = list(
                 O = ScotLip$CANCER         # observed nb of cases
)

ScotLipConsts <-list(
                 N = N,                     # nb of wards   
                 E = ScotLip$CEXP           # expected number of cases
                   
)
  • What are the parameters to be initialised? Create a list with two elements (each a list) with different initial values for the parameters:
# Initialise the unknown parameters, 2 chains
inits <- list(
  list(alpha=0.01, tau.theta=10, theta = rep(0.01,times=N)),  # chain 1
  list(alpha=0.5, tau.theta=1, theta = rep(-0.01,times=N)))   # chain 2
  • Set the parameters that will be monitored:
# Monitored parameters
params <- c("sigma2.theta", "overallSIR", "resSIR", "SIR", "e", "mu", "alpha", "theta")

Note that the parameters that are not set, will NOT be monitored!

  • Specify the MCMC setting:
# MCMC setting
ni <- 50000  # nb iterations 
nt <- 10     # thinning interval
nb <- 10000   # nb iterations as burn-in 
nc <- 2      # nb chains

The burn-in should be long enough to discard the initial part of the Markov chains that have not yet converged to the stationary distribution.

  • Run the MCMC simulations calling Nimble from R using the function nimbleMCMC()
UnstrCodesamples <- nimbleMCMC(code = UnstrCode,
                      data = ScotLipdata,
                      constants = ScotLipConsts, 
                      inits = inits,
                      monitors = params,
                      niter = ni,
                      nburnin = nb,
                      thin = nt, 
                      nchains = nc, 
                      setSeed = 9, 
                      progressBar = FALSE,      # if true then you can monitor the progress
                      samplesAsCodaMCMC = TRUE, 
                      summary = TRUE, 
                      WAIC = TRUE
                      )

Check convergence

Note that specifying samplesAsCodaMCMC = TRUE, we can use the functionality of coda package to run diagnostics.

  • The Gelman-Rubin diagnostic (Rhat) The Gelman-Rubin diagnostic evaluates MCMC convergence by analyzing the difference between multiple Markov chains (Gelman, Rubin, and others 1992). The convergence is assessed by comparing the estimated between-chains and within-chain variances for each model parameter. Large differences between these variances indicate nonconvergence. Indeed, from the summary statistics, we saw displayed the potential scale reduction factor (psrf) or Rhat.
    When the scale reduction factor is high (perhaps greater than 1.1), then we should run our chains out longer to improve convergence to the stationary distribution.

We will use the function gelman.diag() from the package coda to calculate the Rhat and get an idea of which random variables have already converged:

GR.diag <- gelman.diag(UnstrCodesamples$samples, multivariate = FALSE)
all(GR.diag$psrf[,"Point est."] < 1.1) 
## [1] TRUE
which(GR.diag$psrf[,"Point est."] > 1.1) 
## named integer(0)

There is some evidence that most random variables of interest have converged. Lets assess it more. We can do traceplots and autocorrelation plots using the coda package:

  • Check the overall SIR:
unstr_ggmcmc <- ggs(UnstrCodesamples$samples) 

unstr_ggmcmc %>% filter(Parameter == "overallSIR") %>% 
  ggs_traceplot + theme_bw() + theme(text = element_text(size = 12)) -> p1

unstr_ggmcmc %>% filter(Parameter == "overallSIR") %>% 
  ggs_autocorrelation + theme_bw() + theme(text = element_text(size = 12)) -> p2

p1/p2

  • Check the \(1/\tau_{\theta}\):
unstr_ggmcmc %>% filter(Parameter == "sigma2.theta") %>% 
  ggs_traceplot + theme_bw() + theme(text = element_text(size = 12)) -> p1

unstr_ggmcmc %>% filter(Parameter == "sigma2.theta") %>% 
  ggs_autocorrelation + theme_bw() + theme(text = element_text(size = 12)) -> p2

p1/p2

  • Check the SIR of a random area:
set.seed(9)
ra <- sample(1:N, size = 1)

unstr_ggmcmc %>% filter(Parameter == paste0("SIR[", ra, "]")) %>% 
  ggs_traceplot + theme_bw() + theme(text = element_text(size = 12)) -> p1

unstr_ggmcmc %>% filter(Parameter == paste0("SIR[", ra, "]")) %>% 
  ggs_autocorrelation + theme_bw() + theme(text = element_text(size = 12)) -> p2

p1/p2

Extract results

Once we are more or less sure that the chain has reached convergence we can extract the results

  • Summarize posteriors from UnstrCodesamples:
head(UnstrCodesamples$summary$chain1, digits = 3)
head(UnstrCodesamples$summary$chain2, digits = 3)
head(UnstrCodesamples$summary$all.chains, digits = 3)
# also
UnstrCodesamples$summary$chain2[c(1, 2, 3, 7),]
# or
UnstrCodesamples$summary$chain2["sigma2.theta",]
  • We can obtain the posterior distribution of the random variable of interest, say sigma2.theta:
unstr_ggmcmc %>% filter(Parameter == "sigma2.theta") %>% 
                 ggs_histogram() + theme_bw() + theme(text = element_text(size = 12)) -> p1

unstr_ggmcmc %>% filter(Parameter == "sigma2.theta") %>% 
                 ggs_density() + theme_bw() + theme(text = element_text(size = 12)) -> p2

p1/p2

  • To map the smoothed SIRs in R we extract the posterior mean of the SIRs and add it on the shapefile.
ScotLip$unstr_SIR <- UnstrCodesamples$summary$all.chains[paste0("SIR[", 1:N, "]"), "Median"]
  • Using ggplot2 to produce a map of the smoothed SIRs
ggplot() + geom_sf(data = ScotLip, aes(fill = unstr_SIR), col = NA) + theme_bw() + 
                   scale_fill_viridis_c(limits = c(0,5)) + 
                   theme(axis.text = element_blank(), 
                         text = element_text(size = 12))

  • We can also do a boxplot to assess the level of global smoothing
dat.box = data.frame(SMR = c(ScotLip$crudeSIR, ScotLip$unstr_SIR), 
                     type = c(rep("Crude", times = N), rep("SIR_unstr", times = N)))
ggplot() + geom_boxplot(data = dat.box, aes(x = type, y = SMR, fill = type), alpha = .4) + 
           theme_bw() + theme(text = element_text(size = 14))

* We can also get the posterior probability that the spatial SIR per area is larger than 1:

postProb <- sapply(paste0("SIR[", 1:N, "]"), 
                   function(X) mean(UnstrCodesamples$samples$chain1[,X]>1))
ScotLip$unstr_postProb <- postProb

ggplot() + geom_sf(data = ScotLip, aes(fill = postProb), col = NA) + theme_bw() + 
                   scale_fill_viridis_c(limits = c(0,1)) + 
                   theme(axis.text = element_blank(), 
                         text = element_text(size = 12)) -> p1

ggplot() + geom_boxplot(data = ScotLip, aes(y = postProb)) + 
                        scale_fill_viridis_d(alpha = .5) + theme_bw() + xlim(c(-1,1)) + 
                        theme(axis.text.x = element_blank(), 
                              text = element_text(size = 12)) -> p2

(p1|p2) + plot_annotation(title = 'Posterior probability that Pr(SIR>1)')

The ICAR model

Model specification

We will now fit an ICAR model (Besag, York, and Mollié 1991). Note that an ICAR prior models strong spatial dependence. The model is as follows: \[ \begin{eqnarray} O_{i}|\lambda_{i} & \sim & \text{Poisson}(\lambda_{i}E_{i} ) \\ \log(\lambda_{i}) & = & \alpha + \phi_{i} \\ \phi_{i} & \sim & \text{ICAR}({\bf W}, 1/\tau_{\phi}) \end{eqnarray} \]

where in this case the random effects are denoted with \(\phi\) and model strong spatial dependence. The elements of the adjacency matrix \({\bf W}\) are defined as: \[\begin{equation} w_{ij} = \begin{cases} 1 & \text{if } j \in \partial_i \\ 0 & \text{otherwise} \end{cases} \end{equation}\]

where \(\partial_i\) is the set of neighbors of \(j\).

  • Create the list of neighbors based on the shapefile
Wards_nb <- poly2nb(pl = ScotLip)
nbWB_A <- nb2WB(nb = Wards_nb)

Code

  • The priors of the model are kept the same. This is not a good practice and in general priors should be motivated by the research question in hand. Nevertheless, the purpose of this tutorial is show how these models are coded in NIMBLE and not to answer any research question specific for the scottish lip cancer data.
ICARCode <- nimbleCode(
  {
    for (i in 1:N){
      
      O[i] ~ dpois(mu[i])                                # Poisson for observed counts 
      log(mu[i]) <- log(E[i]) + alpha + phi[i]
      
      SIR[i] <- exp(alpha + phi[i])                      # area-specific SIR
      resSIR[i] <- exp(phi[i])                           # area-specific residual SIR
      e[i] <- (O[i]-mu[i])/sqrt(mu[i])                   # residuals      
    } 
    
    # ICAR
    phi[1:N] ~ dcar_normal(adj[1:L], weights[1:L], num[1:N], tau.phi, zero_mean = 1) 
    # the zero_mean is to impose sum to zero constrains in case you have an intercept in the model
    
    # Priors:
    alpha ~ dflat()                                      # vague uniform prior
    overallSIR <- exp(alpha)                             # overall SIR across study region
    
    tau.phi ~ dgamma(1, 0.01)                            # precision of the ICAR component
    sigma2.phi <- 1/tau.phi                              # variance of the ICAR component
    
  }
  
)
  • Format the data as previously, now we need to specify the characteristics of the adjacency matrix in the constants. Note that L is the total number of neighbors, N as before the total number of wards, adj is an index showing the neighboring areas, num shows how many neighbors each area has, weights is a vector of 1s with length L. Notice the the zero_mean = 1 imposes sum to zero constraints. This is important in case you include an intercept in the model to bypass the identifiability issues. If you dont have an intercept in the model then you should remove the sum to zero constraints. In theory, given a flat prior on the intercept both approaches should be identical.
# Obtain the number of wards
N <- dim(ScotLip)[1] 

# Format the data for NIMBLE in a list
ScotLipdata = list(
                 O = ScotLip$CANCER                 # observed nb of cases
)        
        
ScotLipConsts <-list(        
                 N = N,                             # nb of wards   
                 E = ScotLip$CEXP,                  # expected number of cases
                 
                 # and the elements of the neighboring matrix:
                 L = length(nbWB_A$weights),        
                 adj = nbWB_A$adj,                             
                 num = nbWB_A$num,
                 weights = nbWB_A$weights
                   
)
  • Initial values (as before):
# Initialise the unknown parameters, 2 chains
inits <- list(
  list(alpha=0.01, 
       tau.phi=10, phi = rep(0.01,times=N)),  # chain 1
  list(alpha=0.5, 
       tau.phi=1, phi = rep(-0.01,times=N))   # chain 2
  )
  • Set the parameters that will be monitored:
# Monitored parameters
params <- c("sigma2.phi", "overallSIR", "resSIR", "SIR", "e", "mu", "alpha", "phi")
  • Specify the MCMC setting (identical as before):
# MCMC setting
ni <- 50000  # nb iterations 
nt <- 10     # thinning interval
nb <- 10000   # nb iterations as burn-in 
nc <- 2      # nb chains
  • and run the sampler
ICARCodesamples <- nimbleMCMC(code = ICARCode,
                               data = ScotLipdata,
                               constants = ScotLipConsts, 
                               inits = inits,
                               monitors = params,
                               niter = ni,
                               nburnin = nb,
                               thin = nt, 
                               nchains = nc, 
                               setSeed = 9, 
                               progressBar = FALSE,     
                               samplesAsCodaMCMC = TRUE, 
                               summary = TRUE, 
                               WAIC = TRUE
)

Check convergence

We can check convergence similar as before:

GR.diag <- gelman.diag(ICARCodesamples$samples, multivariate = FALSE)
all(GR.diag$psrf[,"Point est."] < 1.1) 
## [1] TRUE
which(GR.diag$psrf[,"Point est."] > 1.1) 
## named integer(0)

Note that you can get an overall plot for the G-R diagnostic using the ggmcmc package. Lets select specific variables because it will look messy, say overallSIR, sigma2.phi, alpha and resSIR[1:3]

ICAR_ggmcmc <- ggs(ICARCodesamples$samples) 

ICAR_ggmcmc %>% filter(Parameter == c("overallSIR", "sigma2.phi", 
                                      "alpha", paste0("resSIR[", 1:3, "]"))) %>% 
                ggs_Rhat() + xlab("R_hat") + theme_bw() + theme(text = element_text(size = 16))

  • Check the overall SIR:
ICAR_ggmcmc %>% filter(Parameter == "overallSIR") %>% 
  ggs_traceplot + theme_bw() + theme(text = element_text(size = 12)) -> p1

ICAR_ggmcmc %>% filter(Parameter == "overallSIR") %>% 
  ggs_autocorrelation + theme_bw() + theme(text = element_text(size = 12)) -> p2

p1/p2

Of course more checks are needed to make sure that the random variables of interest have converged, but the initial impression is positive.

For comparison purposes, I will attached the ICAR SIR on the initial shp

ScotLip$ICAR_SIR <- ICARCodesamples$summary$all.chains[paste0("SIR[", 1:N, "]"), "Median"]

The BYM model

Mdeol specification

The BYM model is probably the most popular model for disease mapping. It is basically a combination of the ICAR model and a model with an unstructured component, see (Besag, York, and Mollié 1991) for more details: \[ \begin{eqnarray} O_{i}|\lambda_{i} & \sim & \text{Poisson}(\lambda_{i}E_{i} ) \\ \log(\lambda_{i}) & = & \alpha + \theta_{i} + \phi_{i} \\ \theta_{i} & \sim & N(0,1/\tau_{\theta}) \\ \phi_{i} & \sim & \text{ICAR}({\bf W}, 1/\tau_{\phi}) \end{eqnarray} \]

Code

The code is as follows:

BYMCode <- nimbleCode(
  {
    for (i in 1:N){
      
      O[i] ~ dpois(mu[i])                                  # Poisson for observed counts 
      log(mu[i]) <- log(E[i]) + alpha + theta[i] + phi[i]
      
      theta[i] ~ dnorm(0, tau = tau.theta)                 # area-specific RE
      SIR[i] <- exp(alpha + theta[i] + phi[i])             # area-specific SIR
      resSIR[i] <- exp(theta[i] + phi[i])                  # area-specific residual SIR
      e[i] <- (O[i]-mu[i])/sqrt(mu[i])                     # residuals      
    } 
    
    # ICAR
    phi[1:N] ~ dcar_normal(adj[1:L], weights[1:L], num[1:N], tau.phi, zero_mean = 1)
    
    # Priors:
    alpha ~ dflat()                                       # vague uniform prior
    overallSIR <- exp(alpha)                              # overall SIR across study region
    
    tau.theta ~ dgamma(1, 0.01)                           # prior for the precision of theta
    sigma2.theta <- 1/tau.theta                           # variance of theta
    
    tau.phi ~ dgamma(1, 0.01)                             # prior for the precision of phi
    sigma2.phi <- 1/tau.phi                               # variance of phi
    
  }
  
)

In a similar way as done before you need to specify the data. The data that we are using is identical as for the ICAR so we will skip this.

  • Initial values:
inits <- list(
  list(alpha=0.01, 
       tau.theta=10, theta = rep(0.01,times=N), 
       tau.phi=10, phi = rep(0.01,times=N)),  # chain 1
  list(alpha=0.5, 
       tau.theta=1, theta = rep(-0.01,times=N), 
       tau.phi=1, phi = rep(-0.01,times=N))   # chain 2
)
  • Parameters to monitor:
# Monitored parameters
params <- c("sigma2.phi", "sigma2.theta", "overallSIR", "resSIR", "SIR", "e", "mu", "alpha", "theta", "phi")
  • Run the code:
BYMCodesamples <- nimbleMCMC(code = BYMCode,
                              data = ScotLipdata,
                              constants = ScotLipConsts, 
                              inits = inits,
                              monitors = params,
                              niter = ni,
                              nburnin = nb,
                              thin = nt, 
                              nchains = nc, 
                              setSeed = 9, 
                              progressBar = FALSE,     
                              samplesAsCodaMCMC = TRUE, 
                              summary = TRUE, 
                              WAIC = TRUE
)

Check convergence

  • Check the G-R diagnostic
GR.diag <- gelman.diag(BYMCodesamples$samples, multivariate = FALSE)
all(GR.diag$psrf[,"Point est."] < 1.1) 
## [1] TRUE
which(GR.diag$psrf[,"Point est."] > 1.1) 
## named integer(0)
  • Check sigma2.theta
BYM_ggmcmc <- ggs(BYMCodesamples$samples) 
BYM_ggmcmc %>% filter(Parameter == "sigma2.theta") %>% 
  ggs_traceplot + theme_bw() + theme(text = element_text(size = 12)) -> p1

BYM_ggmcmc %>% filter(Parameter == "sigma2.theta") %>% 
  ggs_autocorrelation + theme_bw() + theme(text = element_text(size = 12)) -> p2

p1/p2

Seems that we need to tune values given for the MCMC setting to get convergence for sigma2.theta (not done here).

and attach the BYM SIR on the initial shp

ScotLip$BYM_SIR <- BYMCodesamples$summary$all.chains[paste0("SIR[", 1:N, "]"), "Median"]

The BYM2 model

Model specification

BYM2 is a reparametrization of BYM model which has the following features: 1) It resolves the identifiability issues of the BYM model 2) The hyperparameters of the field are almost orthogonal, facilitating clear interpretation and making prior assignment intuitive, 3) The spatially structured random effect is scaled and thus the hyperprior used in one graph have the same interpretation in any other graph (for instance the hyperpriors for the graph of Scotland will have same interpretation as the hyperpriors for the graph of Greece only if the spatially structured random effect is scaled). For more information see Riebler et al. (2016) and Freni-Sterrantino, Ventrucci, and Rue (2018). \[ \begin{eqnarray} O_{i}|\lambda_{i} & \sim & \text{Poisson}(\lambda_{i}E_{i} ) \\ \log(\lambda_{i}) & = & \alpha + b_{i} \\ b_{i} & = & \frac{1}{\tau_b}\big(\theta\sqrt{1-\rho} + \phi\sqrt{\rho}\big)\\ \theta_{i} & \sim & N(0,1/\tau_{\theta}) \\ \phi_{i} & \sim & \text{ICAR}({\bf W^*}, 1) \end{eqnarray} \]

Code

The code is as follows:

BYM2Code <- nimbleCode(
  {
    for (i in 1:N){
      
      O[i] ~ dpois(mu[i])                                # Poisson likelihood for observed counts 
      log(mu[i]) <- log(E[i]) + alpha + b[i]
      
      b[i] <- (1/sqrt(tau.b))*(sqrt((1-rho))*theta[i] + sqrt(rho/scale)*phi[i])
      
      theta[i] ~ dnorm(0, tau = tau.theta)               # area-specific RE
      SIR[i] <- exp(alpha + b[i])                        # area-specific SIR
      resSIR[i] <- exp(b[i])                             # area-specific residual SIR
      e[i] <- (O[i]-mu[i])/sqrt(mu[i])                   # residuals      
    } 
    
    # ICAR
    phi[1:N] ~ dcar_normal(adj[1:L], weights[1:L], num[1:N], tau = 1, zero_mean = 1) # its scaled so tau = 1
    
    # Priors:
    # intercept
    alpha ~ dflat()                                      # vague uniform prior
    overallSIR <- exp(alpha)                             # overall SIR across study region
    
    # precision parameter of the reparametrization
    tau.b ~ dgamma(1, 0.01)                              # prior for the precision of b
    sigma2.b <- 1/tau.b                                  # the variance of b
    
    # precision parameter of theta
    tau.theta ~ dgamma(1, 0.01)                          # prior for the precision of theta
    sigma2.theta <- 1/tau.theta                          # the variance of theta
    
    # mixing parameter
    rho ~ dbeta(1, 1)                                    # prior for the mixing parameter
  }
  
)
  • We need to calculate the scale:
# Wards_nb is the spatial object created using the shapefile and the spdep package
W.scale <- nb2mat(Wards_nb, zero.policy = TRUE, style = "B")           
W.scale <- -W.scale
diag(W.scale) <- abs(apply(W.scale, 1, sum))
# solve(W.scale) # this should not work since by definition the matrix is singular

Q = inla.scale.model(W.scale, constr=list(A=matrix(1, nrow=1, ncol=nrow(W.scale)), e=0))
scale = exp((1/nrow(W.scale))*sum(log(1/diag(Q))))
  • Add the scale in the constants (the data file remains the same):
LipCancerConsts <-list(
  N = N,                    
  E = ScotLip$CEXP,                      
  
  # elements of neighboring matrix
  adj = nbWB_A$adj, 
  weights = nbWB_A$weights, 
  num = nbWB_A$num, 
  L = length(nbWB_A$weights), 
  
  # scale
  scale = scale
)
  • Set the initials:
inits <- list(
  list(alpha = 0.01, 
       tau.theta = 10, 
       theta = rep(0.01, times = N), 
       phi = rep(0.01, times = N), 
       tau.b = 10, 
       rho = .2),                        # chain 1
  list(alpha = 0.5, 
       tau.theta = 1, 
       theta = rep(-0.01 ,times = N), 
       phi = rep(-0.01, times = N), 
       tau.b = 1, 
       rho = .7)                         # chain 2
)
  • Set the parameters to monitor:
params <- c("sigma2.b", "sigma2.theta", "rho", "overallSIR", "resSIR", "SIR", "e", "mu", "alpha", "b", "theta")
  • Run the BYM2
BYM2Codesamples <- nimbleMCMC(code = BYM2Code,
                              data = ScotLipdata,
                              constants = LipCancerConsts, 
                              inits = inits,
                              monitors = params,
                              niter = ni,
                              nburnin = nb,
                              thin = nt, 
                              nchains = nc, 
                              setSeed = 9, 
                              progressBar = FALSE,     
                              samplesAsCodaMCMC = TRUE, 
                              summary = TRUE, 
                              WAIC = TRUE
)

Check convergence

  • Check the G-R diagnostic
GR.diag <- gelman.diag(BYM2Codesamples$samples, multivariate = FALSE)
all(GR.diag$psrf[,"Point est."] < 1.1) 
## [1] TRUE
which(GR.diag$psrf[,"Point est."] > 1.1) 
## named integer(0)

attached the SIR on the initial shp

ScotLip$BYM2_SIR <- BYM2Codesamples$summary$all.chains[paste0("SIR[", 1:N, "]"), "Median"]

Comparing results from the different models

In this section we will compare the output of the three different models. In this comparison we will focus on SIRs, exceedance probabilities and the different intercepts.

Smoothed SIRs

First, I will take some random samples of the SIRs and check if we are fine with convergence (keep in mind the the G-R test provided evidence towards convergence):

set.seed(9)
samples <- sample(x = 1:N, size = 5)
unstr_ggmcmc %>% filter(Parameter == paste0("SIR[", samples, "]")) %>% 
  ggs_traceplot + theme_bw() + theme(text = element_text(size = 12))

samples <- sample(x = 1:N, size = 5)
ICAR_ggmcmc %>% filter(Parameter == paste0("SIR[", samples, "]")) %>% 
  ggs_traceplot + theme_bw() + theme(text = element_text(size = 12))

samples <- sample(x = 1:N, size = 5)
BYM_ggmcmc %>% filter(Parameter == paste0("SIR[", samples, "]")) %>% 
  ggs_traceplot + theme_bw() + theme(text = element_text(size = 12))

BYM2_ggmcmc <- ggs(BYM2Codesamples$samples)
samples <- sample(x = 1:N, size = 5)
BYM2_ggmcmc %>% filter(Parameter == paste0("SIR[", samples, "]")) %>% 
  ggs_traceplot + theme_bw() + theme(text = element_text(size = 12))

Now we can start comparing

ggplot() + geom_sf(data = ScotLip, aes(fill = crudeSIR), col = "NA") + 
           scale_fill_viridis_c(limits = c(0, 6.6)) +  theme_bw() + 
           theme(axis.text = element_blank(), text = element_text(size = 10)) -> p1

ggplot() + geom_sf(data = ScotLip, aes(fill = unstr_SIR), col = "NA") + 
           scale_fill_viridis_c(limits = c(0, 6.6)) + theme_bw() + 
           theme(axis.text = element_blank(), text = element_text(size = 10)) -> p2

ggplot() + geom_sf(data = ScotLip, aes(fill = ICAR_SIR), col = "NA") + 
           scale_fill_viridis_c(limits = c(0, 6.6)) + theme_bw() + 
           theme(axis.text = element_blank(), text = element_text(size = 10)) -> p3

ggplot() + geom_sf(data = ScotLip, aes(fill = BYM_SIR), col = "NA") + 
           scale_fill_viridis_c(limits = c(0, 6.6)) + theme_bw() + 
           theme(axis.text = element_blank(), text = element_text(size = 10)) -> p4

ggplot() + geom_sf(data = ScotLip, aes(fill = BYM2_SIR), col = "NA") + 
           scale_fill_viridis_c(limits = c(0, 6.6)) + theme_bw() + 
           theme(axis.text = element_blank(), text = element_text(size = 10)) -> p5

(p1|p2|p3)/(p4|p5)

It is clear that there is some level of smoothness induced in the SIR using the different hierarchical Bayesian models. For the model with just the unstructured term, as expected we have a global but not local smoothing, whereas for the ICAR and the rest of the models, it is clear that the smoothing was also performed locally, using the adjacency structure. We can also assess the boxplots of the SIRs:

data.frame(model = c(rep("Crude", N), rep("Unstr", N), 
                     rep("ICAR", N), rep("BYM", N), rep("BYM2", N)),
                     SIR =c(ScotLip$crudeSIR, ScotLip$unstr_SIR, ScotLip$ICAR_SIR, 
                            ScotLip$BYM_SIR, ScotLip$BYM2_SIR)) %>% 
  
  ggplot() + geom_boxplot(aes(x = model, y = SIR)) + theme_bw() + 
             theme(text = element_text(size = 14))

  • Lets do some pairplots too:
suppressMessages(
  print(
ScotLip %>% select(crudeSIR, unstr_SIR, ICAR_SIR, BYM_SIR, BYM2_SIR) %>% 
            as.data.frame() %>% 
            select(-geometry) %>% 

  ggpairs(lower = list(continuous = wrap("points", size=.8))) + 
  theme_bw() + theme(text = element_text(size = 14))
  )
)

Posterior probabilities

  • We can also compare the posterior probabilities (\(\text{Pr}(\text{SIR}>1)\)):
ScotLip$ICAR_postProb <- sapply(paste0("SIR[", 1:N, "]"), 
                                function(X) mean(ICARCodesamples$samples$chain1[,X]>1))
ScotLip$BYM_postProb <- sapply(paste0("SIR[", 1:N, "]"), 
                               function(X) mean(BYMCodesamples$samples$chain1[,X]>1))
ScotLip$BYM2_postProb <- sapply(paste0("SIR[", 1:N, "]"), 
                                function(X) mean(BYM2Codesamples$samples$chain1[,X]>1))

ggplot() + geom_sf(data = ScotLip, aes(fill = unstr_postProb), col = "NA") + 
           scale_fill_viridis_c(limits = c(0, 1)) + theme_bw() + 
           theme(axis.text = element_blank(), text = element_text(size = 10)) -> p1

ggplot() + geom_sf(data = ScotLip, aes(fill = ICAR_postProb), col = "NA") + 
           scale_fill_viridis_c(limits = c(0, 1)) + theme_bw() + 
           theme(axis.text = element_blank(), text = element_text(size = 10)) -> p2

ggplot() + geom_sf(data = ScotLip, aes(fill = BYM_postProb), col = "NA") + 
           scale_fill_viridis_c(limits = c(0, 1)) + theme_bw() + 
           theme(axis.text = element_blank(), text = element_text(size = 10)) -> p3

ggplot() + geom_sf(data = ScotLip, aes(fill = BYM2_postProb), col = "NA") + 
           scale_fill_viridis_c(limits = c(0, 1)) + theme_bw() + 
           theme(axis.text = element_blank(), text = element_text(size = 10)) -> p4

(p1|p2)/(p3|p4)

Note that we can also calculate exceedance probabilities in the sampler using the step() function. For example in the BYM2, inside the for loop under the calculation of e, we can add this: proba.resSIR[i] <- step(resSIR[i]-1). Make sure you monitor proba.resSIR by specifying it in the params. After the sampler works we can take its posterior mean and get the posterior probability.

Intercept, hyperparameters and WAIC

  • For completeness I am also showing a table of the intercept and hyperparameters. Note that any comparison is conditional on convergence. More effort should be put to examine convergence and to tune the MCMC setting. The table presents the median posterior of the random variables together with the 95% credibility regions.
Median and 95%CrI for the intercept and hyperparameters of the models. The last row shows the WAIC for model comparison.
Unstr ICAR BYM BYM2
\(\alpha\) 0.08 (-0.16, 0.31) 0.09 (-0.01, 0.19) 0.09 (-0.02, 0.2) 0.09 (-0.01, 0.19)
\(\sigma^2_{\theta}\) 0.57 (0.35, 0.94)
0.01 (0, 0.07) 0.01 (0, 0.16)
\(\sigma^2_{\phi}\)
0.64 (0.35, 1.15) 0.61 (0.31, 1.13) 1
\(\sigma^2_{b}\)
0.37 (0.18, 1.19)
\(\rho\)
0.71 (0.23, 0.99)
WAIC 307.5 293.61 293.69 295.24

Some interesting features for the table above:

  1. The uncertainty of the intercept is larger for the model with the unstructured random effect.
  2. \(\sigma^2_{\theta}\) is almost negligible for BYM and BYM2.
  3. \(\rho\) is 0.71 for the BYM2 implying that large part of the variation is due to the structured spatial effect (but of course the uncertainty is large).
  4. The Watanabe–Akaike information criterion or WAIC (Watanabe 2013) suggests that the best fit model is the ICAR (the smaller the WAIC the better the fit).

Comparison of the latent fields

Conclusion

The purpose of this tutorial is to show how to fit spatial models using NIMBLE (de Valpine et al. 2017). Nevertheless, the tutorial does not provide an exhaustive list of spatial model that can be fit in NIMBLE. Examples of other models include the proper model (Cressie and Chan 1989), the Leroux model (Leroux, Lei, and Breslow 2000), see also Best, Richardson, and Thomson (2005) and Lee (2011) for simulation studies and relevant comparisons. Additional sources for code include the tutorial by Lawson (Lawson 2020) and also chapter 9 of the NIMBLE manual.

References

Besag, Julian, Jeremy York, and Annie Mollié. 1991. “Bayesian Image Restoration, with Two Applications in Spatial Statistics.” Annals of the Institute of Statistical Mathematics 43 (1): 1–20.

Best, Nicky, Sylvia Richardson, and Andrew Thomson. 2005. “A Comparison of Bayesian Spatial Models for Disease Mapping.” Statistical Methods in Medical Research 14 (1): 35–59.

Clayton, David, and John Kaldor. 1987. “Empirical Bayes Estimates of Age-Standardized Relative Risks for Use in Disease Mapping.” Biometrics, 671–81.

Cressie, Noel, and Ngai H Chan. 1989. “Spatial Modeling of Regional Variables.” Journal of the American Statistical Association 84 (406): 393–401.

de Valpine, P., D. Turek, C. J. Paciorek, C. Anderson-Bergman, D. Temple Lang, and R. Bodik. 2017. “Programming with Models: Writing Statistical Algorithms for General Model Structures with NIMBLE.” Journal of Computational and Graphical Statistics 26: 403–17. https://doi.org/10.1080/10618600.2016.1172487.

Freni-Sterrantino, Anna, Massimo Ventrucci, and Håvard Rue. 2018. “A Note on Intrinsic Conditional Autoregressive Models for Disconnected Graphs.” Spatial and Spatio-Temporal Epidemiology 26: 25–34.

Gelman, Andrew, Donald B Rubin, and others. 1992. “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science 7 (4): 457–72.

Lawson, Andrew. 1999. Disease Mapping and Risk Assessment for Public Health. Wiley.

Lawson, Andrew B. 2020. “NIMBLE for Bayesian Disease Mapping.” Spatial and Spatio-Temporal Epidemiology, 100323.

Lee, Duncan. 2011. “A Comparison of Conditional Autoregressive Models Used in Bayesian Disease Mapping.” Spatial and Spatio-Temporal Epidemiology 2 (2): 79–89.

Leroux, Brian G, Xingye Lei, and Norman Breslow. 2000. “Estimation of Disease Rates in Small Areas: A New Mixed Model for Spatial Dependence.” In Statistical Models in Epidemiology, the Environment, and Clinical Trials, 179–91. Springer.

Riebler, Andrea, Sigrunn H Sørbye, Daniel Simpson, and Håvard Rue. 2016. “An Intuitive Bayesian Spatial Model for Disease Mapping That Accounts for Scaling.” Statistical Methods in Medical Research 25 (4): 1145–65.

Watanabe, Sumio. 2013. “A Widely Applicable Bayesian Information Criterion.” Journal of Machine Learning Research 14 (Mar): 867–97.