In this document we will show how to deal with missing values in a Bayesian framework using NIMBLE
(de Valpine et al. 2017). We will use the Chicago dataset as it can be found in the dlnm
package. Using typing help(chicagoNMMAPS)
you will find the following information about the dataset:
The data set contains daily mortality (all causes, CVD, respiratory), weather (temperature, dew point temperature, relative humidity) and pollution data (PM10 and ozone) for Chicago in the period 1987-2000 from the National Morbidity, Mortality and Air Pollution Study (NMMAPS).
The purpose of the tutorial is to assess the effect of temperature on the number of deaths of all causes, using the Chicago dataset. At this analysis we are not interested in modelling the lag effect.
This practical requires the following packages to be installed and attached: sf
, dplyr
, tidyverse
, nimble
, coda
, spdep
, patchwork
, GGally
, ggmcmc
, kableExtra
, splines2
and dlnm
.
install.packages(c("sf", "dplyr", "tidyverse", "nimble", "coda", "spdep", "patchwork",
"GGally", "ggmcmc", "dlnm", "splines2", "kableExtra"), dependencies = TRUE, repos = "http://cran.r-project.org")
library(sf) # Simple feature for R
library(dplyr) # A package for data manipulation
library(tidyverse) # A package for data manipulation and plotting
library(nimble) # A package for performing MCMC in R
library(coda) # A package for summarizing and plotting of MCMC output
# and diagnostic tests
library(spdep) # A package to calculate neighbors
library(patchwork) # A package to combine plots
library(GGally) # A package to do pairplots
library(ggmcmc) # MCMC diagnostics with ggplot
library(dlnm) # A package for performing dlnms in R
library(splines2) # A package to fit splines in R
library(kableExtra) # packages for formatting tables in rmarkdown
library(fastDummies) # Library to create dummies from a string
head(chicagoNMMAPS, 10)
## date time year month doy dow death cvd resp temp dptp
## 1 1987-01-01 1 1987 1 1 Thursday 130 65 13 -0.2777778 31.500
## 2 1987-01-02 2 1987 1 2 Friday 150 73 14 0.5555556 29.875
## 3 1987-01-03 3 1987 1 3 Saturday 101 43 11 0.5555556 27.375
## 4 1987-01-04 4 1987 1 4 Sunday 135 72 7 -1.6666667 28.625
## 5 1987-01-05 5 1987 1 5 Monday 126 64 12 0.0000000 28.875
## 6 1987-01-06 6 1987 1 6 Tuesday 130 63 12 4.4444444 35.125
## 7 1987-01-07 7 1987 1 7 Wednesday 129 72 12 1.3888889 26.750
## 8 1987-01-08 8 1987 1 8 Thursday 109 51 13 -1.6666667 22.000
## 9 1987-01-09 9 1987 1 9 Friday 125 62 7 -3.0555556 29.000
## 10 1987-01-10 10 1987 1 10 Saturday 153 90 11 0.2777778 27.750
## rhum pm10 o3
## 1 95.500 26.95607 4.376079
## 2 88.250 NA 4.929803
## 3 89.500 32.83869 3.751079
## 4 84.500 39.95607 4.292746
## 5 74.500 NA 4.751079
## 6 77.375 40.95607 6.334412
## 7 74.500 33.95607 8.594019
## 8 77.875 28.95607 11.797890
## 9 95.125 32.34877 3.876079
## 10 81.875 NA 5.388137
Please notice the missing values.
summary(chicagoNMMAPS)
## date time year month
## Min. :1987-01-01 Min. : 1 Min. :1987 Min. : 1.000
## 1st Qu.:1990-07-02 1st Qu.:1279 1st Qu.:1990 1st Qu.: 4.000
## Median :1993-12-31 Median :2558 Median :1994 Median : 7.000
## Mean :1993-12-31 Mean :2558 Mean :1994 Mean : 6.522
## 3rd Qu.:1997-07-01 3rd Qu.:3836 3rd Qu.:1997 3rd Qu.:10.000
## Max. :2000-12-31 Max. :5114 Max. :2000 Max. :12.000
##
## doy dow death cvd
## Min. : 1.0 Sunday :731 Min. : 69.0 Min. : 25.0
## 1st Qu.: 92.0 Monday :730 1st Qu.:105.0 1st Qu.: 44.0
## Median :183.0 Tuesday :730 Median :114.0 Median : 50.0
## Mean :183.1 Wednesday:730 Mean :115.4 Mean : 50.9
## 3rd Qu.:274.0 Thursday :731 3rd Qu.:124.0 3rd Qu.: 57.0
## Max. :366.0 Friday :731 Max. :411.0 Max. :312.0
## Saturday :731
## resp temp dptp rhum
## Min. : 0.000 Min. :-26.667 Min. :-25.62 Min. : 25.62
## 1st Qu.: 7.000 1st Qu.: 1.667 1st Qu.: 27.25 1st Qu.: 58.12
## Median : 9.000 Median : 10.556 Median : 39.75 Median : 69.75
## Mean : 9.178 Mean : 10.107 Mean : 40.41 Mean : 69.17
## 3rd Qu.:11.000 3rd Qu.: 19.444 3rd Qu.: 55.75 3rd Qu.: 80.38
## Max. :34.000 Max. : 33.333 Max. : 78.25 Max. :100.00
## NA's :1096
## pm10 o3
## Min. : -3.05 Min. :-0.832
## 1st Qu.: 20.77 1st Qu.:11.960
## Median : 30.25 Median :19.045
## Mean : 33.74 Mean :20.143
## 3rd Qu.: 42.42 3rd Qu.:26.624
## Max. :356.18 Max. :65.808
## NA's :251
There are 1096 missing values for the mean daily relative humidity and 251 for PM\(_10\).
We are fitting a model with a Poisson as the likelihood defined as: \[ \begin{eqnarray} O_{t} & \sim & \text{Poisson}(\lambda_{t}) \\ \log(\lambda_{t}) & = & \alpha + \sum_j \beta_j\text{dow}_j + \gamma\text{PM}_t + \xi_t + w_t \\ \xi_t, w_t & \sim & RW1(\tau^{-1}) \\ \alpha, {\bf \beta}, \gamma & \sim & N(0, 10) \\ 1/\tau & \sim & \text{Gamma}(1,1) \end{eqnarray} \]
where \(\alpha\) is an intercept term, \(\beta_j\) covariates for the effect of the days, \(\gamma\) a linear term for PM, \(\xi_t\) a temporal trend, and \(w_t\) a smooth function for the effect of temperature.
CompleteCase.code <- nimbleCode(
{
for (t in 1:Total){
O[t] ~ dpois(mu[t])
log(mu[t]) <- alpha + inprod(beta[1:6], X[t, 1:6]) + gamma*PM[t] + xi[time_cat[t]] +
w[tmp.ord[t]]
}
# intrinsic CAR prior on the effect of time
xi[1:N] ~ dcar_normal(adj2[1:M], weights2[1:M], num2[1:N], tau.xi, c = 2, zero_mean = 1)
# intrinsic CAR prior on the effect of temperature
w[1:K] ~ dcar_normal(adj[1:L], weights[1:L], num[1:K], tau.w, c = 2, zero_mean = 1)
# Priors:
alpha ~ dflat() # Unif(-inf, +inf)
exp.alpha <- exp(alpha) # overall counts across time period
for(j in 1:6){
beta[j] ~ dnorm(0, tau = 5)
exp.beta[j] <- exp(beta[j]) # the coefficients of the splines
}
gamma ~ dnorm(0, tau = 5)
exp.gamma <- exp(gamma)
# the priors are informative to help smoothing
tau.xi <- 1/sigma2.xi
sigma2.xi ~ dgamma(1,0.5)
tau.w <- 1/sigma2.w
sigma2.w ~ dgamma(1,1)
}
)
dat.complete.case <- chicagoNMMAPS[!is.na(chicagoNMMAPS$pm10),]
dat.complete.case <- dat.complete.case[order(dat.complete.case$time),]
dat.complete.case$time <- 1:nrow(dat.complete.case)
dat.complete.case$time_cat <- cut(
dat.complete.case$time,
breaks = seq(from = min(dat.complete.case$time),
to = max(dat.complete.case$time),
length.out = 700), # a weekly term
labels = 1:699,
include.lowest = TRUE
)
dat.complete.case$time_cat <-
as.numeric(droplevels(dat.complete.case$time_cat))
dat.complete.case <- dat.complete.case[order(dat.complete.case$time_cat),]
dat.complete.case$temp_cat <- cut(
dat.complete.case$temp,
breaks = seq(from = min(dat.complete.case$temp),
to = max(dat.complete.case$temp),
length.out = 100),
labels = 1:99,
include.lowest = TRUE
)
dat.complete.case$temp_cat_order <-
as.numeric(droplevels(dat.complete.case$temp_cat))
dat.complete.case <- dat.complete.case[order(dat.complete.case$temp_cat_order),]
RW2 <- function(k){
rest.comp <- list()
for(i in 3:(k-2)){
rest.comp[[i]] <- c(i-2, i-1, i+1, i+2)
}
rest.comp <- unlist(rest.comp)
adj = c(2, 3, 1, 3, 4,
rest.comp,
c(k-3, k-2, k, k-2, k-1)
)
num = c(2, 3, rep(4, times = c(k-4)), 3, 2)
weights = c(c(2, -1, 2, 4, -1),
rep(c(-1, 4, 4, -1), times = c(k-4)),
c(-1, 4, 2, -1, 2))
retlist <- list()
retlist$adj <- adj
retlist$num <- num
retlist$weights <- weights
return(retlist)
}
# temperature
K <- max(dat.complete.case$temp_cat_order)
Wnb <- RW2(K)
# time
N <- max(dat.complete.case$time_cat)
Wnb2 <- RW2(N)
# Format the data for NIMBLE in a list
ChicagoData = list(
O = dat.complete.case$death,
X = as.matrix(fastDummies::dummy_cols(dat.complete.case$dow)[,-c(1:2)]),
PM = scale(dat.complete.case$pm10)[,1]
)
ChicagoConsts <-list(
N = N,
M = length(Wnb2$adj),
K = K,
L = length(Wnb$adj),
adj = Wnb$adj,
num = Wnb$num,
weights = Wnb$weights,
adj2 = Wnb2$adj,
num2 = Wnb2$num,
weights2 = Wnb2$weights,
tmp.ord = dat.complete.case$temp_cat_order,
time_cat = dat.complete.case$time_cat,
Total = nrow(dat.complete.case)
)
inits <- list(
list(alpha=0.01,
sigma2.w=0.1,
sigma2.xi=0.1,
xi=rep(0.1, times = N),
w=rep(0.1, times = K),
beta=rep(0, times = 6),
gamma=0),
list(alpha=1,
sigma2.w=1,
sigma2.xi=1,
xi=rep(0.5, times = N),
w=rep(0.5, times = K),
beta=rep(-1, times = 6),
gamma=2)
)
params <- c("mu", paste0("exp.beta[", 1:6, "]"), "alpha",
"exp.gamma", "xi", "w", "sigma2.xi", "sigma2.w")
# MCMC setting
ni <- 150000 # nb iterations
nt <- 10 # thinning interval
nb <- 50000 # nb iterations as burn-in
nc <- 2 # nb chains
nimbleMCMC()
. (1.7 hours)t_0 <- Sys.time()
CompleteCase.model <- nimbleMCMC(code = CompleteCase.code,
data = ChicagoData,
constants = ChicagoConsts,
inits = inits,
monitors = params,
niter = ni,
nburnin = nb,
thin = nt,
nchains = nc,
setSeed = 9,
progressBar = TRUE,
samplesAsCodaMCMC = TRUE,
summary = TRUE,
WAIC = TRUE
)
t_1 <- Sys.time()
time_lcc <- t_1 - t_0
ggs_mod <- ggs(CompleteCase.model$samples)
ggs_mod %>% filter(Parameter == c("w[10]", "w[9]", "w[24]")) %>%
ggs_traceplot() + theme_bw()
ggs_mod %>% filter(Parameter == c("xi[10]", "xi[90]", "xi[55]")) %>%
ggs_traceplot() + theme_bw()
ggs_mod %>% filter(Parameter == c("sigma2.xi", "sigma2.w")) %>%
ggs_traceplot() + theme_bw()
ggs_mod %>% filter(Parameter == c(paste0("exp.beta[", 1:6, "]"))) %>%
ggs_traceplot() + theme_bw()
ggs_mod %>% filter(Parameter == c("alpha","exp.gamma")) %>%
ggs_traceplot() + theme_bw()
tmp_cat = cut(
dat.complete.case$temp,
breaks = seq(from = min(dat.complete.case$temp),
to = max(dat.complete.case$temp),
length.out = 100),
labels = seq(from = min(dat.complete.case$temp),
to = max(dat.complete.case$temp),
length.out = 100)[-1],
include.lowest = TRUE
)
tmp_cat <- droplevels(tmp_cat)
tmp_cat <-
round(as.numeric(as.character(levels(tmp_cat))), digits = 3)
datRW2 = data.frame(temp = tmp_cat,
median = CompleteCase.model$summary$all.chains[paste0("w[", 1:K, "]"), "Median"],
LL = CompleteCase.model$summary$all.chains[paste0("w[", 1:K, "]"), "95%CI_low"],
UL = CompleteCase.model$summary$all.chains[paste0("w[", 1:K, "]"), "95%CI_upp"])
ggplot() + geom_line(data = datRW2, aes(x = temp, y = exp(median))) +
geom_line(data = datRW2, aes(x = temp, y = exp(LL)), linetype = 2) +
geom_line(data = datRW2, aes(x = temp, y = exp(UL)), linetype = 2) +
theme_bw() + xlab("Temperature") + ylab("Relative risk") +
geom_hline(yintercept = 1, col = "red", linetype = 2) +
ggtitle("Random walk 2")
time_cat = cut(
dat.complete.case$time,
breaks = seq(from = min(dat.complete.case$time),
to = max(dat.complete.case$time),
length.out = 700),
labels = seq(from = min(dat.complete.case$time),
to = max(dat.complete.case$time),
length.out = 700)[-1],
include.lowest = TRUE
)
time_cat <- droplevels(time_cat)
time_cat <-
round(as.numeric(as.character(levels(time_cat))), digits = 3)
datRW2 = data.frame(time_cat = time_cat,
median = CompleteCase.model$summary$all.chains[paste0("xi[", 1:N, "]"), "Median"],
LL = CompleteCase.model$summary$all.chains[paste0("xi[", 1:N, "]"), "95%CI_low"],
UL = CompleteCase.model$summary$all.chains[paste0("xi[", 1:N, "]"), "95%CI_upp"])
ggplot() + geom_line(data = datRW2, aes(x = time_cat, y = exp(median))) +
geom_line(data = datRW2, aes(x = time_cat, y = exp(LL)), linetype = 2) +
geom_line(data = datRW2, aes(x = time_cat, y = exp(UL)), linetype = 2) +
theme_bw() + xlab("Time") + ylab("Relative risk") +
geom_hline(yintercept = 1, col = "red", linetype = 2) +
ggtitle("Random walk 2")
The model that we will use for imputation is: \[ \begin{eqnarray} O_{t} & \sim & \text{Poisson}(\lambda_{t}) \\ \log(\lambda_{t}) & = & \alpha + \sum_j \beta_j\text{dow}_j + \gamma\text{PM}_t + \xi_t + w_t \\ log(\text{PM}_{t}) &\sim& N(\mu_t, \tau_1^{-1}) \\ \mu_t &=& \alpha_1 + u_t\\ \xi_t, w_t, u_t & \sim & RW1(\tau^{-1}) \\ \alpha, {\bf \beta}, \gamma & \sim & N(0, \tau = 5) \\ \alpha_1 &\sim & N(0, \tau = 0.0001)\\ \tau^{-1} & \sim & \text{Gamma}(1,1) \\ \tau_1^{-1} & \sim & \text{Uniform}(0,50) \end{eqnarray} \]
We will investigate PM a bit more and impute the missings
summary(chicagoNMMAPS$pm10)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -3.05 20.77 30.25 33.74 42.42 356.18 251
Here we observe that there are several negatives. These are obviously illegal values and thus I would treat as NAs.
chicagoNMMAPS$pm10[chicagoNMMAPS$pm10<0] <- NA
The idea now is to model the log PM and impute it. This can be done in many different ways. Here we will impute based on its temporal trend. One could also think of using the given covariates to inform imputation.
chicagoNMMAPS$time_cat <- cut(
chicagoNMMAPS$time,
breaks = seq(from = min(chicagoNMMAPS$time),
to = max(chicagoNMMAPS$time),
length.out = 700), # a weekly term
labels = 1:699,
include.lowest = TRUE
)
chicagoNMMAPS$time_cat <-
as.numeric(droplevels(chicagoNMMAPS$time_cat))
chicagoNMMAPS$temp_cat <- cut(
chicagoNMMAPS$temp,
breaks = seq(from = min(chicagoNMMAPS$temp),
to = max(chicagoNMMAPS$temp),
length.out = 100),
labels = 1:99,
include.lowest = TRUE
)
chicagoNMMAPS$temp_cat_order <-
as.numeric(droplevels(chicagoNMMAPS$temp_cat))
chicagoNMMAPS <- chicagoNMMAPS[order(chicagoNMMAPS$temp_cat_order),]
# temperature
K <- max(chicagoNMMAPS$temp_cat_order)
Wnb <- RW2(K)
# time
N <- max(chicagoNMMAPS$time_cat)
Wnb2 <- RW2(N)
Imputation.code <- nimbleCode(
{
for (t in 1:R){ # R is the total number of rows missings included
O[t] ~ dpois(mu[t])
log(mu[t]) <- alpha + inprod(beta[1:6], X[t,1:6]) + gamma*pm.scaled[t] + xi[time_cat[t]] +
w[tmp.ord[t]]
log.PM[t] ~ dnorm(log.pm.mean[t], log.pm.prec)
log.pm.mean[t] <- alpha.pm + u[time_cat[t]]
pm.natscale[t] <- exp(log.PM[t])
}
pm.scaled[1:R] <- (pm.natscale[1:R] - mean(pm.natscale[1:R]))/sd(pm.natscale[1:R])
# intrinsic CAR prior on the effect of time
xi[1:N] ~ dcar_normal(adj2[1:M], weights2[1:M], num2[1:N], tau.xi, zero_mean = 1)
u[1:N] ~ dcar_normal(adj2[1:M], weights2[1:M], num2[1:N], tau.u, zero_mean = 1)
# intrinsic CAR prior on the effect of temperature
w[1:K] ~ dcar_normal(adj[1:L], weights[1:L], num[1:K], tau.w, zero_mean = 1)
# Priors:
# imputation
alpha.pm ~ dnorm(0, 0.0001)
log.pm.prec <- 1/(sd.pm*sd.pm)
sd.pm ~ dunif(0.00001, 50)
tau.u <- 1/sigma2.u
sigma2.u ~ dgamma(1,0.5)
# main model
alpha ~ dflat()
exp.alpha <- exp(alpha)
for(k in 1:6){
beta[k] ~ dnorm(0, tau = 5)
exp.beta[k] <- exp(beta[k])
}
gamma ~ dnorm(0, tau = 5)
exp.gamma <- exp(gamma)
# the priors are informative to help smoothing
tau.xi <- 1/sigma2.xi
sigma2.xi ~ dgamma(1,0.5)
tau.w <- 1/sigma2.w
sigma2.w ~ dgamma(1,0.5)
}
)
chicagoNMMAPS <- chicagoNMMAPS[order(chicagoNMMAPS$pm10),]
# Format the data for NIMBLE in a list
ChicagoData = list(
O = chicagoNMMAPS$death,
X = as.matrix(fastDummies::dummy_cols(chicagoNMMAPS$dow)[,-c(1:2)]),
log.PM = log(chicagoNMMAPS$pm10)
)
ChicagoConsts <-list(
N = N,
M = length(Wnb2$adj),
K = K,
L = length(Wnb$adj),
adj = Wnb$adj,
num = Wnb$num,
weights = Wnb$weights,
adj2 = Wnb2$adj,
num2 = Wnb2$num,
weights2 = Wnb2$weights,
tmp.ord = chicagoNMMAPS$temp_cat_order,
time_cat = chicagoNMMAPS$time_cat,
R = nrow(chicagoNMMAPS),
R1 = sum(!is.na(chicagoNMMAPS$pm10))
)
inits <- list(
list(alpha=0.01,
alpha.pm=log(50),
sd.pm=1,
sigma2.u=0.1,
u=rep(0.1, times = N),
sigma2.w=0.1,
sigma2.xi=0.1,
xi=rep(0.1, times = N),
w=rep(0.1, times = K),
beta=rep(0, times = 6),
gamma=0
),
list(alpha=1,
alpha.pm=log(100),
sd.pm=2,
sigma2.u=1,
u=rep(0.5, times = N),
sigma2.w=1,
sigma2.xi=1,
xi=rep(0.5, times = N),
w=rep(0.5, times = K),
beta=rep(-1, times = 6),
gamma=2
)
)
params <- c("mu", paste0("exp.beta[", 1:6, "]"), "alpha",
"exp.gamma", "xi", "w", "sigma2.xi", "sigma2.w",
"u", "alpha.pm", "sd.pm", "sigma2.u", "pm.natscale")
t_0 <- Sys.time()
Imputation.model <- nimbleMCMC(code = Imputation.code,
data = ChicagoData,
constants = ChicagoConsts,
inits = inits,
monitors = params,
niter = ni,
nburnin = nb,
thin = nt,
nchains = nc,
setSeed = 9,
progressBar = TRUE,
samplesAsCodaMCMC = TRUE,
summary = TRUE,
WAIC = TRUE
)
t_1 <- Sys.time()
time_lim <- t_1 - t_0
Imputation.model_ggs <- ggs(Imputation.model$samples)
Imputation.model_ggs %>% filter(Parameter == c("u[1]", "u[2]", "u[3]")) %>%
ggs_traceplot() + theme_bw()
Imputation.model_ggs %>% filter(Parameter == c("xi[1]", "xi[2]", "xi[3]")) %>%
ggs_traceplot() + theme_bw()
Imputation.model_ggs %>% filter(Parameter == c("w[1]", "w[2]", "w[3]")) %>%
ggs_traceplot() + theme_bw()
Imputation.model_ggs %>% filter(Parameter == c("alpha", "exp.gamma")) %>%
ggs_traceplot() + theme_bw()
Imputation.model_ggs %>% filter(Parameter == c(paste0("exp.beta[", 1:6, "]"))) %>%
ggs_traceplot() + theme_bw()
Imputation.model_ggs %>% filter(Parameter == c("sigma2.xi", "sigma2.w", "sigma2.u")) %>%
ggs_traceplot() + theme_bw()
Imputation.model_ggs %>% filter(Parameter == c("pm.natscale[4859]", "pm.natscale[5000]",
"pm.natscale[5100]")) %>%
ggs_traceplot() + theme_bw()
tmp_cat = cut(
chicagoNMMAPS$temp,
breaks = seq(from = min(chicagoNMMAPS$temp),
to = max(chicagoNMMAPS$temp),
length.out = 100),
labels = seq(from = min(chicagoNMMAPS$temp),
to = max(chicagoNMMAPS$temp),
length.out = 100)[-1],
include.lowest = TRUE
)
tmp_cat <- droplevels(tmp_cat)
tmp_cat <-
round(as.numeric(as.character(levels(tmp_cat))), digits = 3)
datRW2 = data.frame(temp = tmp_cat,
median = Imputation.model$summary$all.chains[paste0("w[", 1:K, "]"), "Median"],
LL = Imputation.model$summary$all.chains[paste0("w[", 1:K, "]"), "95%CI_low"],
UL = Imputation.model$summary$all.chains[paste0("w[", 1:K, "]"), "95%CI_upp"])
ggplot() + geom_line(data = datRW2, aes(x = temp, y = exp(median))) +
geom_line(data = datRW2, aes(x = temp, y = exp(LL)), linetype = 2) +
geom_line(data = datRW2, aes(x = temp, y = exp(UL)), linetype = 2) +
theme_bw() + xlab("Temperature") + ylab("Relative risk") +
geom_hline(yintercept = 1, col = "red", linetype = 2) +
ggtitle("Random walk 2")
time_cat = cut(
chicagoNMMAPS$time,
breaks = seq(from = min(chicagoNMMAPS$time),
to = max(chicagoNMMAPS$time),
length.out = 700),
labels = seq(from = min(chicagoNMMAPS$time),
to = max(chicagoNMMAPS$time),
length.out = 700)[-1],
include.lowest = TRUE
)
time_cat <- droplevels(time_cat)
time_cat <-
round(as.numeric(as.character(levels(time_cat))), digits = 3)
datRW2 = data.frame(time_cat = time_cat,
median = Imputation.model$summary$all.chains[paste0("xi[", 1:N, "]"), "Median"],
LL = Imputation.model$summary$all.chains[paste0("xi[", 1:N, "]"), "95%CI_low"],
UL = Imputation.model$summary$all.chains[paste0("xi[", 1:N, "]"), "95%CI_upp"])
ggplot() + geom_line(data = datRW2, aes(x = time_cat, y = exp(median))) +
geom_line(data = datRW2, aes(x = time_cat, y = exp(LL)), linetype = 2) +
geom_line(data = datRW2, aes(x = time_cat, y = exp(UL)), linetype = 2) +
theme_bw() + xlab("Time") + ylab("Relative risk") +
geom_hline(yintercept = 1, col = "red", linetype = 2) +
ggtitle("Random walk 2")
tab <-
data.frame(
round(
(rbind(
CompleteCase.model$summary$all.chains["exp.gamma", c("Median", "95%CI_low", "95%CI_upp")],
Imputation.model$summary$all.chains["exp.gamma", c("Median", "95%CI_low", "95%CI_upp")]
) - 1)*100, digits = 2)
)
rownames(tab) <- c("Complete cases", "Imputation")
knitr::kable(tab, caption = "Effect of PM") %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "center")
Median | X95.CI_low | X95.CI_upp | |
---|---|---|---|
Complete cases | 0.59 | 0.26 | 0.93 |
Imputation | 0.60 | 0.28 | 0.92 |
In this tutorial, we showed an example on how to fit multiple imputation models with nimble. The effect of the imputation in this particular example is negligible in the precision of the results. That is because less that 5% of the observations are missing. In addition, the computational time increases, when we fitted the joined model selected for the imputation (1.5 days).
de Valpine, P., D. Turek, C. J. Paciorek, C. Anderson-Bergman, D. Temple Lang, and R. Bodik. 2017. “Programming with Models: Writing Statistical Algorithms for General Model Structures with NIMBLE.” Journal of Computational and Graphical Statistics 26: 403–17. https://doi.org/10.1080/10618600.2016.1172487.