This practical requires the following packages to be installed and attached: sf, dplyr, tidyverse, nimble, coda, spdep, patchwork, fastDummies, ggmcmc, gplots, kableExtra and INLA.
To check where the packages are saved, type .libPaths(). To check which packages are already installed, type .packages(all.available = TRUE). As seen in the previous practical, you can type is.element("PACKAGE_NAME", installed.packages()) to check whether the individual package "PACKAGE_NAME" is installed.
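For example, to run these checks for one of the packages used in this practical:
.libPaths() # where the packages are installed
is.element("nimble", installed.packages()) # TRUE if nimble is installed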
If the packages are not installed, you can install them with:
install.packages(c("sf", "dplyr", "tidyverse", "nimble", "coda",
                   "spdep", "patchwork", "fastDummies", "ggmcmc", "gplots", "kableExtra"),
                 type = "both", dependencies = TRUE,
                 repos = c(CRAN = "http://cran.r-project.org"))
INLA is not on CRAN; you can install it with:
install.packages("INLA",
                 repos = c(getOption("repos"),
                           INLA = "https://inla.r-inla-download.org/R/stable"),
                 dep = TRUE)
Then you need to load these through:
library(sf) # Simple feature for R
library(dplyr) # A package for data manipulation
library(tidyverse) # A package for data manipulation and plotting
library(nimble) # A package for performing MCMC in R
library(coda) # A package for summarizing and plotting of MCMC output
# and diagnostic tests
library(spdep) # A package to calculate neighbors
library(patchwork) # A package to combine plots
library(ggmcmc) # MCMC diagnostics with ggplot
library(INLA) # A package for quick Bayesian inference (here used to calculate
# the scale for BYM2)
library(fastDummies) # A package to create dummies from a factor
library(gplots) # A package to create CrI graphs
library(kableExtra) # A package for formatting tables in R Markdown
In this document we will perform ecological regression using NIMBLE (de Valpine et al. 2017). We will fit the BYM2 model (Riebler et al. 2016) and examine the effect of long-term exposure to air-pollution on COVID-19 mortality in England. Long-term exposure to air-pollution has been hypothesised to worsen COVID-19 prognosis: either directly, as it can suppress early immune responses to the infection, or indirectly, as it can increase the risk of stroke, hypertension and other pre-existing conditions (Konstantinoudis et al. 2020). We will use openly available data on COVID-19 mortality that can be downloaded from ONS. The data cover England between March and July 2020. Long-term exposure to air-pollution was defined as the average NO\(_2\) concentration during 2014-2018, as retrieved from the PCM model. We will consider the total number of intensive care unit (ICU) beds as of February 2020 (NHS) and socio-economic deprivation in 2019 (IMD) as potential confounders. The data have been cleaned and stored in .shp format.
We load the data with the read_sf() function:
COVID19eco <- read_sf("COVIDecoregression.shp")
head(COVID19eco)
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 344666.1 ymin: 378867 xmax: 478448.5 ymax: 537152
## Projected CRS: OSGB36 / British National Grid
## # A tibble: 6 × 7
## LTLA deaths expectd TtlICUB NO2 IMD geometry
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <MULTIPOLYGON [m]>
## 1 E06000001 110 78.8 0.00995 12.9 1 (((447213.9 537036.1, 447228.8 5…
## 2 E06000002 206 103. 0.0392 16.4 1 (((448609.9 521982.6, 448616.8 5…
## 3 E06000003 134 131. 0.0412 11.8 1 (((455932.3 527880.7, 455919.4 5…
## 4 E06000004 155 157. 0.00974 13.7 2 (((444157 527956.3, 444165.9 527…
## 5 E06000005 90 97.7 0.0128 11.6 2 (((423496.6 524724.3, 423497.2 5…
## 6 E06000006 128 93.0 0 17.1 1 (((358374.7 384376.8, 358367.7 3…
par(mar = c(0,0,0,0))
plot(COVID19eco$geometry)
LTLA_nb <- poly2nb(COVID19eco)
summary(LTLA_nb)
## Neighbour list object:
## Number of regions: 316
## Number of nonzero links: 1610
## Percentage nonzero weights: 1.612322
## Average number of links: 5.094937
## 1 region with no links:
## 44
## 2 disjoint connected subgraphs
## Link number distribution:
##
## 0 1 2 3 4 5 6 7 8 9 10 11
## 1 8 25 39 54 58 46 49 21 10 3 2
## 8 least connected regions:
## 10 26 60 66 87 88 114 262 with 1 link
## 2 most connected regions:
## 50 226 with 11 links
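Note that one region has no neighbours (index 44 in the list). Since poly2nb keeps the row order of COVID19eco, we can check which LTLA this is:
COVID19eco$LTLA[44] # the code of the LTLA with no neighbours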
# convert to the format NIMBLE wants
nbWB_B <- nb2WB(LTLA_nb)
names(nbWB_B)
## [1] "adj" "weights" "num"
summary(nbWB_B)
## Length Class Mode
## adj 1610 -none- numeric
## weights 1610 -none- numeric
## num 316 -none- numeric
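A quick consistency check of this structure: num holds the number of neighbours of each area and must sum to the length of adj (the vector of neighbour indices):
sum(nbWB_B$num) == length(nbWB_B$adj) # TRUE: 1610 neighbour links in total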
The variable IMD is categorical. To make it compatible with the format NIMBLE requires for categorical variables, we will use the function dummy_cols in the fastDummies package.
fastDummies::dummy_cols(COVID19eco$IMD) %>% as_tibble() %>%
select(.data_1:.data_5) %>%
rename(IMD_1 = .data_1, IMD_2 = .data_2, IMD_3 = .data_3,
IMD_4 = .data_4, IMD_5 = .data_5) -> dummies
COVID19eco <- cbind(COVID19eco, dummies)
summary(COVID19eco)
## LTLA deaths expectd TtlICUB
## Length:316 Min. : 4.0 Min. : 8.287 Min. :0.000e+00
## Class :character 1st Qu.: 77.0 1st Qu.: 99.339 1st Qu.:8.548e-05
## Mode :character Median : 117.5 Median :129.067 Median :8.092e-03
## Mean : 155.6 Mean :155.630 Mean :1.399e-02
## 3rd Qu.: 191.2 3rd Qu.:177.896 3rd Qu.:1.460e-02
## Max. :1226.0 Max. :903.219 Max. :1.911e-01
## NO2 IMD IMD_1 IMD_2
## Min. : 4.48 Min. :1.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:11.09 1st Qu.:2.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :13.96 Median :3.000 Median :0.0000 Median :0.0000
## Mean :15.05 Mean :2.997 Mean :0.2025 Mean :0.1994
## 3rd Qu.:17.26 3rd Qu.:4.000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :46.45 Max. :5.000 Max. :1.0000 Max. :1.0000
## IMD_3 IMD_4 IMD_5 geometry
## Min. :0.0000 Min. :0.0000 Min. :0.0000 MULTIPOLYGON :316
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 epsg:27700 : 0
## Median :0.0000 Median :0.0000 Median :0.0000 +proj=tmer...: 0
## Mean :0.1994 Mean :0.1962 Mean :0.2025
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000
ggplot() + geom_boxplot(data = COVID19eco, aes(y = deaths)) +
theme_bw() + ylab("") + ggtitle("Deaths") -> p1
ggplot() + geom_boxplot(data = COVID19eco, aes(y = expectd)) +
theme_bw() + ylab("") + ggtitle("Expected") -> p2
ggplot() + geom_boxplot(data = COVID19eco, aes(y = TtlICUB)) +
theme_bw() + ylab("") + ggtitle("ICU beds") -> p3
ggplot() + geom_boxplot(data = COVID19eco, aes(y = NO2)) +
theme_bw() + ylab("") + ggtitle(expression(NO[2])) -> p4
ggplot() + geom_boxplot(data = COVID19eco, aes(y = IMD)) +
theme_bw() + ylab("") + ggtitle("IMD") -> p5
(p1|p2)/(p3|p4|p5)
ggplot() + geom_sf(data = COVID19eco, aes(fill = deaths), col = NA) +
theme_bw() + ggtitle("Deaths") + scale_fill_viridis_c() -> p1
ggplot() + geom_sf(data = COVID19eco, aes(fill = expectd), col = NA) +
theme_bw() + ggtitle("Expected") + scale_fill_viridis_c() -> p2
ggplot() + geom_sf(data = COVID19eco, aes(fill = TtlICUB), col = NA) +
theme_bw() + ggtitle("ICU beds") + scale_fill_viridis_c() -> p3
ggplot() + geom_sf(data = COVID19eco, aes(fill = NO2), col = NA) +
theme_bw() + ggtitle(expression(NO[2])) + scale_fill_viridis_c() -> p4
ggplot() + geom_sf(data = COVID19eco, aes(fill = IMD), col = NA) +
theme_bw() + ggtitle("IMD") + scale_fill_viridis_c() -> p5
(p1|p2)/(p3|p4|p5)
ggplot() + geom_point(data = COVID19eco, aes(x = NO2, y = TtlICUB)) +
theme_bw() + annotate("text", x = 10, y = 0.17, size = 6,
label = paste0("cor = ", round(cor(COVID19eco$NO2,
COVID19eco$TtlICUB),
digits = 2))) -> p1
p1
COVID19eco$crudeSMR <- COVID19eco$deaths/COVID19eco$expectd
ggplot() + geom_boxplot(data = COVID19eco, aes(y = crudeSMR)) +
theme_bw() + ggtitle("") + ylab("") -> p1
ggplot() + geom_sf(data = COVID19eco, aes(fill = crudeSMR), col = NA) +
theme_bw() + ggtitle("") + scale_fill_viridis_c() -> p2
(p1|p2) + plot_annotation(title = "Crude SMR")
We will fit a Poisson log-linear model in the Bayesian framework to quantify the effect of NO\(_2\), while adjusting for ICU beds and deprivation. To model the spatial autocorrelation we will use the BYM2 prior (Riebler et al. 2016), which is an attractive extension of the BYM model (Besag, York, and Mollié 1991). Let \(i=1,...,N\) be the index of the LTLAs in England, \(O_i\) the number of COVID-19 deaths, \(E_i\) the expected number of COVID-19 deaths, \(X_1\) the NO\(_2\) concentration, \(X_2\) the number of ICU beds per population, and \(X_{3ji}\) the dummy variables for deprivation with \(j=2,...,5\):
\[\begin{equation} \begin{aligned} O_i & \sim \hbox{Poisson}(E_i \lambda_i); \;\;\; i=1,...,N\\ \log \lambda_i & = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \sum_{j=2}^5\beta_{3j} X_{3ji} + b_{i} \\ b_{i} & = \frac{1}{\sqrt{\tau_b}}\big(\theta_i\sqrt{1-\rho} + \phi_i\sqrt{\rho}\big)\\ \theta_{i} & \sim N(0,1) \\ \phi_{i} & \sim \text{ICAR}({\bf W^*}, 1)\\ \end{aligned} \end{equation}\]
where \(\alpha\) is an intercept term, \(\mathbf{\beta}\) a vector of coefficients, \(\theta_i\) the spatially unstructured random effect, \(\phi_i\) the scaled spatially structured random effect, \(\rho\) a mixing parameter and \(\tau_b\) the precision of the field.
BYMecoCode <- nimbleCode(
{
for (i in 1:N){
O[i] ~ dpois(mu[i]) # Poisson likelihood
log(mu[i]) <- log(E[i]) + inprod(beta[], X[i,]) + b[i]
# of course you can write this explicitly too:
# log(mu[i]) <- log(E[i]) + alpha + b[i] +
# beta1*X1[i] + beta2*X2[i] +
# beta32*X32[i] + beta33*X33[i] +
# beta34*X34[i] + beta35*X35[i]
b[i] <- (1/sqrt(tau.b))*(sqrt((1-rho))*theta[i] +
sqrt(rho/scale)*phi[i])
theta[i] ~ dnorm(0, tau = 1) # area-specific RE
SMR[i] <- exp(beta[1] + b[i]) # area-specific SMR (intercept plus random effect)
resSMR[i] <- exp(b[i]) # area-specific residual SMR
e[i] <- (O[i] - mu[i])/sqrt(mu[i]) # Pearson residuals
proba.resSMR[i] <- step(resSMR[i] - 1) # posterior probability that resSMR > 1
}
# ICAR prior
phi[1:N] ~ dcar_normal(adj[1:L], weights[1:L],
num[1:N], tau = 1, zero_mean = 1) # it is scaled, so tau = 1
# Priors
beta[1] ~ dflat() # vague prior (Unif(-inf, +inf)) for the intercept
overallRR <- exp(beta[1])
# precision parameter of the reparametrisation
tau.b ~ dgamma(1, 0.01) # prior for the precision of b
sigma2.b <- 1/tau.b # the variance of b
# mixing parameter
rho ~ dbeta(1, 1) # prior for the mixing parameter
# priors for the fixed effects
for(j in 1:(K-1)){
beta[j+1] ~ dnorm(0, tau = 1)
RR.beta[j] <- exp(beta[j+1])
}
if(quin){
}else{
beta.no2.unscaled <- beta[2]/sd.no2
rr.no2.unscaled <- exp(beta.no2.unscaled)
RR.beta1_5NO2 <- exp(beta.no2.unscaled * 5) # this will be explained later
}
}
)
W.scale <- nb2mat(LTLA_nb, zero.policy = TRUE, style = "B") # binary adjacency matrix
W.scale <- -W.scale
diag(W.scale) <- abs(apply(W.scale, 1, sum)) # diagonal = number of neighbours, giving the unscaled ICAR precision
# solve(W.scale) # this should not work since by definition the matrix is singular
Q = inla.scale.model(W.scale, constr=list(A=matrix(1, nrow=1, ncol=nrow(W.scale)), e=0))
scale = exp((1/nrow(W.scale))*sum(log(1/diag(Q))))
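As the comment above notes, the unscaled ICAR precision matrix is singular: each row sums to zero by construction. A quick optional check:
all(rowSums(W.scale) == 0) # TRUE: rows sum to zero, so W.scale has no inverse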
n.LTLA <- dim(COVID19eco)[1]
# Format the data for NIMBLE in a list
COVIDdata = list(
O = COVID19eco$deaths, # observed nb of deaths
# covariates
X = cbind(1, # for the intercept
scale(COVID19eco$NO2), # NO2
scale(COVID19eco$TtlICUB)[,1], # ICU beds
COVID19eco$IMD_2, # IMD 2
COVID19eco$IMD_3, # IMD 3
COVID19eco$IMD_4, # IMD 4
COVID19eco$IMD_5) # IMD 5
)
COVIDConsts <-list(
N = n.LTLA, # nb of LTLAs
# adjacency matrix
L = length(nbWB_B$weights), # the total number of neighbour links (length of adj and weights)
E = COVID19eco$expectd, # expected number of deaths
adj = nbWB_B$adj, # the elements of the neigh. matrix
num = nbWB_B$num,
weights = nbWB_B$weights,
scale = scale, # the scale for the covariance
K = 7, # the number of columns of X (intercept plus 6 covariates)
sd.no2 = sd(COVID19eco$NO2),
quin = FALSE
)
inits <- list(
list(
# first chain
tau.b = 1,
theta = rep(0.01, times = n.LTLA),
phi = rep(0.01, times = n.LTLA),
rho = 0.6,
beta = rep(0, 7)
),
#chain 2
list(
tau.b = 3,
theta = rep(0.1, times = n.LTLA),
phi = rep(0.1, times = n.LTLA),
rho = 0.9,
beta = c(-2,-2, rep(1, 5))
)
)
parameters = c("resSMR", "proba.resSMR", "SMR", "RR.beta[1]",
"RR.beta[2]", "RR.beta[3]", "RR.beta[4]", "RR.beta[5]", "RR.beta[6]",
"overallRR", "sigma2.b", "mu", "rho", "beta.no2.unscaled", "rr.no2.unscaled", "RR.beta1_5NO2")
ni <- 500000 # nb iterations
nt <- 250 # thinning interval
nb <- 250000 # nb iterations as burn-in
nc <- 2 # nb chains
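With these settings each chain retains (500000 - 250000)/250 = 1000 posterior draws, which we can verify quickly:
(ni - nb)/nt # number of retained draws per chain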
We now run the model with nimbleMCMC:
t0 <- Sys.time()
modelBYM.eco <- nimbleMCMC(code = BYMecoCode,
data = COVIDdata,
constants = COVIDConsts,
inits = inits,
monitors = parameters,
niter = ni,
nburnin = nb,
thin = nt,
nchains = nc,
setSeed = 9,
progressBar = TRUE,
samplesAsCodaMCMC = TRUE,
summary = TRUE,
WAIC = TRUE
)
t1<- Sys.time()
t1 - t0
GR.diag <- gelman.diag(modelBYM.eco$samples, multivariate = FALSE)
all(GR.diag$psrf[,"Point est."] < 1.1)
## [1] FALSE
which(GR.diag$psrf[,"Point est."] > 1.1)
## proba.resSMR[1] proba.resSMR[6] proba.resSMR[13] proba.resSMR[31]
## 642 647 654 672
## proba.resSMR[74] proba.resSMR[104] proba.resSMR[118] proba.resSMR[134]
## 715 745 759 775
## proba.resSMR[173] proba.resSMR[174] proba.resSMR[179] proba.resSMR[182]
## 814 815 820 823
## proba.resSMR[187] proba.resSMR[196] proba.resSMR[221] proba.resSMR[256]
## 828 837 862 897
## proba.resSMR[284] proba.resSMR[289] proba.resSMR[290]
## 925 930 931
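As an additional check, we can look at the effective sample size of the key parameters with effectiveSize() from coda, for instance for the NO\(_2\) relative risk and the hyperparameters:
effectiveSize(modelBYM.eco$samples[, c("rr.no2.unscaled", "rho", "sigma2.b")])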
ggs_BYMeco <- ggs(modelBYM.eco$samples)
ggs_BYMeco %>% filter(Parameter %in% c("RR.beta[1]",
"RR.beta[2]", "RR.beta[3]", "RR.beta[4]",
"RR.beta[5]", "RR.beta[6]")) %>%
ggs_traceplot() + theme_bw()
ggs_BYMeco %>% filter(Parameter %in% c("beta.no2.unscaled",
"rr.no2.unscaled", "RR.beta1_5NO2")) %>%
ggs_traceplot() + theme_bw()
ggs_BYMeco %>% filter(Parameter %in% c("RR.beta[1]",
"RR.beta[2]", "RR.beta[3]", "RR.beta[4]",
"RR.beta[5]", "RR.beta[6]")) %>%
ggs_running() + theme_bw()
modelBYM.eco$summary$all.chains[c("rr.no2.unscaled",
"RR.beta[2]", "RR.beta[3]", "RR.beta[4]",
"RR.beta[5]", "RR.beta[6]"),]
modelBYM.eco$summary$all.chains["rr.no2.unscaled",]
ggs_BYMeco %>% filter(Parameter %in% c("RR.beta[1]",
"RR.beta[2]", "RR.beta[3]", "RR.beta[4]",
"RR.beta[5]", "RR.beta[6]")) %>%
ggs_compare_partial() + theme_bw()
COVID19eco$resRR <-
modelBYM.eco$summary$all.chains[paste0("resSMR[", 1:n.LTLA, "]"), "Mean"]
COVID19eco$PostProb <-
modelBYM.eco$summary$all.chains[paste0("proba.resSMR[", 1:n.LTLA, "]"), "Mean"]
ggplot() + geom_sf(data = COVID19eco, aes(fill=resRR), col = NA) + ggtitle("Residual SMR") +
scale_fill_viridis_c() + theme_bw() -> p1
ggplot() + geom_sf(data = COVID19eco, aes(fill=PostProb), col = NA) + ggtitle("Pr(resSMR>1)") +
scale_fill_viridis_c() + theme_bw() -> p2
p1|p2
tab <-
round(
modelBYM.eco$summary$all.chains[c("rr.no2.unscaled",
"RR.beta[2]", "RR.beta[3]", "RR.beta[4]",
"RR.beta[5]", "RR.beta[6]",
"sigma2.b", "rho"), c("Median", "95%CI_low", "95%CI_upp")],
digits = 2
)
rownames(tab) <- c("NO$_2$", "ICUBeds", "IMD2", "IMD3", "IMD4", "IMD5",
"$\\sigma^2_{b}$", "$\\rho$")
knitr::kable(tab, caption = "Median and 95%CrI for the covariate coefficients and hyperparameters of the models.") %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "center")
|                  | Median | 95%CI_low | 95%CI_upp |
|------------------|--------|-----------|-----------|
| NO\(_2\)         | 1.03   | 1.02      | 1.04      |
| ICUBeds          | 0.96   | 0.93      | 0.99      |
| IMD2             | 0.92   | 0.83      | 1.02      |
| IMD3             | 0.85   | 0.77      | 0.94      |
| IMD4             | 0.90   | 0.81      | 1.00      |
| IMD5             | 0.85   | 0.76      | 0.95      |
| \(\sigma^2_{b}\) | 0.08   | 0.06      | 0.09      |
| \(\rho\)         | 0.93   | 0.79      | 1.00      |
Based on the table above, we see that for every unit increase in long-term exposure to NO\(_2\), COVID-19 mortality increases by 3% (95% CrI: 2% to 4%). The number of ICU beds per population seems to be protective, i.e. the more beds in a particular LTLA, the less likely someone living in that LTLA is to die from COVID-19. The coefficient refers to a 1 sd increase in the number of beds per population (sd because we scale the ICU beds before putting them in the model; here 1 sd = 2 beds per 100 people). To quantify this, for every additional 2 beds per 100 people there is a 4% (95% CrI: 1% to 7%) decrease in COVID-19 mortality. The IMD levels whose credible intervals exclude 1 are the third and the fifth; for the fifth, the COVID-19 mortality rate for people living in the least deprived areas is 0.85 (95% CrI: 0.76 to 0.95) times that for people living in the most deprived areas.
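As a quick check of the percentage-change arithmetic, we can convert the relative risks stored in tab to percent changes (an optional sketch using the tab object created above):
# percent change in mortality per unit of NO2 and per 1 sd of ICU beds
round((tab[c("NO$_2$", "ICUBeds"), "Median"] - 1) * 100, digits = 1)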
With respect to the hyperparameters, the mixing parameter is estimated at 0.93, so, although its uncertainty is considerable, around 93% of the residual variation is attributed to the spatially structured term; the unstructured term contributes very little, and we could consider excluding it from the model.
It would be of interest for some to get the effect of air pollution on COVID-19 mortality for every \(k\) units of increase in NO\(_2\). That could be the case when we want results directly comparable with a specific analysis in the literature. To do so we add a line of the form RR.beta1_kNO2 <- exp(beta.no2.unscaled * k) in the model (we use beta.no2.unscaled = beta[2]/sd.no2 rather than beta[2], because NO\(_2\) enters the model standardised), and we remember to add it to the parameters to be monitored. To show this explicitly, say we want the effect for an increase of 5\(\mu g/m^3\) in the long-term exposure to NO\(_2\). We then add RR.beta1_5NO2 <- exp(beta.no2.unscaled * 5) in the model and monitor it:
round(
modelBYM.eco$summary$all.chains[c("RR.beta1_5NO2"), c("Median", "95%CI_low", "95%CI_upp")],
digits = 2
)
## Median 95%CI_low 95%CI_upp
## 1.15 1.09 1.22
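Equivalently, since beta.no2.unscaled is among the monitored parameters, the same quantity can be computed post hoc from the stored posterior draws (a quick sketch using coda):
no2.draws <- as.matrix(modelBYM.eco$samples)[, "beta.no2.unscaled"] # stack the two chains
round(quantile(exp(5 * no2.draws), probs = c(0.5, 0.025, 0.975)), digits = 2)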
We will finish this tutorial by noting that the modelling approach we have selected so far for the effect of NO\(_2\) on COVID-19-related mortality is a bit restrictive, since we assumed that the effect is linear. To relax this assumption we could consider several options. In this tutorial, we will make the continuous NO\(_2\) variable categorical and examine how valid the linearity assumption is. Although this approach has some advantages, i.e. it is straightforward to implement, the effect of each category is easy to communicate, and it might be of interest in a particular application, there are other ways of allowing flexible fits (without categorising), such as Gaussian processes and/or splines. These are the focus of another tutorial.
COVID19eco$NO2_quintiles <- cut(COVID19eco$NO2,
breaks = quantile(COVID19eco$NO2,
probs = seq(from = 0, to = 1, length.out = 6)),
labels = 1:5, include.lowest = T)
fastDummies::dummy_cols(COVID19eco$NO2_quintiles) %>%
select(.data_1:.data_5) %>%
rename(NO2_1 = .data_1, NO2_2 = .data_2, NO2_3 = .data_3, NO2_4 = .data_4, NO2_5 = .data_5) ->
dummies
COVID19eco <- cbind(COVID19eco, dummies)
# Format the data for NIMBLE in a list
COVIDdata = list(
O = COVID19eco$deaths, # observed nb of deaths
# covariates
X = cbind(1, # for the intercept
COVID19eco$NO2_2, # NO2 2
COVID19eco$NO2_3, # NO2 3
COVID19eco$NO2_4, # NO2 4
COVID19eco$NO2_5, # NO2 5
scale(COVID19eco$TtlICUB)[,1], # ICU beds
COVID19eco$IMD_2, # IMD 2
COVID19eco$IMD_3, # IMD 3
COVID19eco$IMD_4, # IMD 4
COVID19eco$IMD_5) # IMD 5
)
COVIDConsts <-list(
N = n.LTLA, # nb of LTLAs
# adjacency matrix
L = length(nbWB_B$weights), # the total number of neighbour links (length of adj and weights)
E = COVID19eco$expectd, # expected number of deaths
adj = nbWB_B$adj, # the elements of the neigh. matrix
num = nbWB_B$num,
weights = nbWB_B$weights,
scale = scale, # the scale for the covariance matrix
K = 10, # the number of columns of X (intercept plus 9 covariates)
quin = TRUE
)
inits <- list(
list(
# first chain
tau.b = 1,
theta = rep(0.01, times = n.LTLA),
phi = rep(0.01, times = n.LTLA),
rho = 0.5,
beta = rep(0, 10)
),
#chain 2 (different values from chain 1)
list(
tau.b = 5,
theta = rep(0.1, times = n.LTLA),
phi = rep(0.1, times = n.LTLA),
rho = 0.8,
beta = c(rep(-2, 5), rep(1, 5))
)
)
parameters = paste0("RR.beta[", 1:9, "]")
t0<- Sys.time()
modelBYM.eco_q <- nimbleMCMC(code = BYMecoCode,
data = COVIDdata,
constants = COVIDConsts,
inits = inits,
monitors = parameters,
niter = ni,
nburnin = nb,
thin = nt,
nchains = nc,
setSeed = 9,
progressBar = TRUE,
samplesAsCodaMCMC = TRUE,
summary = TRUE
)
t1<- Sys.time()
t1 - t0
# saveRDS(modelBYM.eco_q, file = "modelBYM.eco_q")
modelBYM.eco_q <- readRDS("modelBYM.eco_q")
ggs(modelBYM.eco_q$samples) %>% filter(Parameter %in% paste0("RR.beta[", 1:4, "]")) %>%
ggs_traceplot() + theme_bw()
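Before interpreting the quintile coefficients, it is worth repeating the Gelman-Rubin check used for the first model:
GR.diag_q <- gelman.diag(modelBYM.eco_q$samples, multivariate = FALSE)
all(GR.diag_q$psrf[, "Point est."] < 1.1)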
rbind(
c(1, NA, NA),
round(
modelBYM.eco_q$summary$all.chains[paste0("RR.beta[", 1:4, "]"),
c("Median", "95%CI_low", "95%CI_upp")],
digits = 2
)
) -> tab
rownames(tab) <- levels(cut(COVID19eco$NO2,
breaks = quantile(COVID19eco$NO2,
probs = seq(from = 0, to = 1, length.out = 6)),
include.lowest = T))
options(knitr.kable.NA = "-")
knitr::kable(tab,
caption = "Median and 95%CrI for the NO$_2$ coefficients based on quintiles.") %>%
kable_styling(bootstrap_options = "striped", full_width = F, position = "center")
|             | Median | 95%CI_low | 95%CI_upp |
|-------------|--------|-----------|-----------|
| [4.48,10.7] | 1.00   | -         | -         |
| (10.7,13.1] | 1.19   | 1.09      | 1.31      |
| (13.1,15.4] | 1.20   | 1.08      | 1.33      |
| (15.4,18.4] | 1.37   | 1.22      | 1.53      |
| (18.4,46.4] | 1.53   | 1.32      | 1.79      |
plotCI(y = tab[,1],
x = seq(from = 1, to = 3, length.out = 5),
ui = tab[,3],
li = tab[,2],
ylim = c(.5, 2), xlim = c(0.4, 3.5),
lwd = 2, type = "p", pch = 15, cex = 1, las = 1,
ylab = "", main = "", xaxt = "n", xlab = "", cex.axis = 1, bty = "n"
)
axis(side = 1, at = seq(from = 1, to = 3, length.out = 5),
labels = levels(cut(COVID19eco$NO2,
breaks = quantile(COVID19eco$NO2,
probs = seq(from = 0, to = 1, length.out = 6)),
include.lowest = T)),
cex.axis = .8, las = 2)
grid()
abline(h = 1, col = "red", lty = "dashed", lwd = 1.5)
mtext(expression(bold("NO"[2])), side = 3, cex = 1.1, line = 0, font = 2)
mtext("Posterior Relative Risk", side = 2, cex = 1.1, line = 3,font = 1)
The estimated relative risks increase roughly monotonically across the NO\(_2\) quintiles, so the linearity assumption looks reasonable.
In this practical we saw an example of ecological regression with NIMBLE. We found a positive association between long-term average NO\(_2\) exposure and COVID-19 mortality after adjusting for the total number of ICU beds per population and deprivation. The interpretation of the results is conditional on the convergence of the coefficients, and for some of them it seems that we should have put more effort into tuning the MCMC settings to achieve convergence (for instance, there seems to be insufficient mixing for the linear NO\(_2\) coefficient).