In this document we will show how to fit piecewise linear models in a Bayesian framework using NIMBLE
(de Valpine et al. 2017). We will use the Chicago dataset as it can be found in the dlnm
package. Using typing help(chicagoNMMAPS)
you will find the following information about the dataset:
The data set contains daily mortality (all causes, CVD, respiratory), weather (temperature, dew point temperature, relative humidity) and pollution data (PM10 and ozone) for Chicago in the period 1987-2000 from the National Morbidity, Mortality and Air Pollution Study (NMMAPS).
The purpose of the tutorial is to assess the effect of temperature on the number of deaths of all causes, using the Chicago dataset. At this analysis we are not interested in modelling the lag effect.
This practical requires the following packages to be installed and attached: sf
, dplyr
, tidyverse
, nimble
, coda
, spdep
, patchwork
, GGally
, ggmcmc
, kableExtra
, splines2
and dlnm
.
install.packages(c("sf", "dplyr", "tidyverse", "nimble", "coda", "spdep", "patchwork",
"GGally", "ggmcmc", "dlnm", "splines2", "kableExtra"), dependencies = TRUE, repos = "http://cran.r-project.org")
library(sf) # Simple feature for R
library(dplyr) # A package for data manipulation
library(tidyverse) # A package for data manipulation and plotting
library(nimble) # A package for performing MCMC in R
library(coda) # A package for summarizing and plotting of MCMC output
# and diagnostic tests
library(spdep) # A package to calculate neighbors
library(patchwork) # A package to combine plots
library(GGally) # A package to do pairplots
library(ggmcmc) # MCMC diagnostics with ggplot
library(dlnm) # A package for performing dlnms in R
library(splines2) # A package to fit splines in R
library(kableExtra) # packages for formatting tables in rmarkdown
library(fastDummies) # Library to create dummies from a string
library(parallel) # Library for parallel computing
head(chicagoNMMAPS, 10)
## date time year month doy dow death cvd resp temp dptp
## 1 1987-01-01 1 1987 1 1 Thursday 130 65 13 -0.2777778 31.500
## 2 1987-01-02 2 1987 1 2 Friday 150 73 14 0.5555556 29.875
## 3 1987-01-03 3 1987 1 3 Saturday 101 43 11 0.5555556 27.375
## 4 1987-01-04 4 1987 1 4 Sunday 135 72 7 -1.6666667 28.625
## 5 1987-01-05 5 1987 1 5 Monday 126 64 12 0.0000000 28.875
## 6 1987-01-06 6 1987 1 6 Tuesday 130 63 12 4.4444444 35.125
## 7 1987-01-07 7 1987 1 7 Wednesday 129 72 12 1.3888889 26.750
## 8 1987-01-08 8 1987 1 8 Thursday 109 51 13 -1.6666667 22.000
## 9 1987-01-09 9 1987 1 9 Friday 125 62 7 -3.0555556 29.000
## 10 1987-01-10 10 1987 1 10 Saturday 153 90 11 0.2777778 27.750
## rhum pm10 o3
## 1 95.500 26.95607 4.376079
## 2 88.250 NA 4.929803
## 3 89.500 32.83869 3.751079
## 4 84.500 39.95607 4.292746
## 5 74.500 NA 4.751079
## 6 77.375 40.95607 6.334412
## 7 74.500 33.95607 8.594019
## 8 77.875 28.95607 11.797890
## 9 95.125 32.34877 3.876079
## 10 81.875 NA 5.388137
We are fitting a model with a Poisson as the likelihood defined as: \[ \begin{eqnarray} O_{t} & \sim & \text{Poisson}(\lambda_{t}) \\ \log(\lambda_{t}) & = & \alpha + \sum_j \beta_j\text{dow}_{j} + \beta_{\text{temp}1}x_t(x_t - c)I(x_t\leq c) + \beta_{\text{temp2}}(x_t - c)I(x_t>c) + \gamma\text{PM}_t + \xi_t \\ \xi_t & \sim & RW2(\tau^{-1}) \\ \alpha, {\pmb \beta},{\pmb \beta_{\text{temp}}}, \gamma & \sim & N(0, \kappa) \\ 1/\tau & \sim & \text{Gamma}(1,1) \end{eqnarray} \]
where \(\alpha\) is an intercept term, \(\beta_j\) covariates for the effect of the days, \(\gamma\) a linear term for PM and \(\xi_t\) a temporal trend. In this parametrization, \(\alpha - c\beta_{\text{temp}1}\) and \(\beta_{\text{temp}1}\) are the intercept and slope of the first line, whereas \(\alpha - c\beta_{\text{temp}2}\) and \(\beta_{\text{temp}2}\) the intercept and slope of the second line. The challenging part of such analyses is to specify this threshold.
In this tutorial, we will select thresholds in two ways and compare results. The first is by selecting the model with the threshold that minimises the WAIC, whereas the second is to treat the threshold as a random variable and calculate it accordingly.
WAIC.code <- nimbleCode(
{
for (t in 1:Total){
O[t] ~ dpois(mu[t])
log(mu[t]) <- alpha + inprod(beta[1:6], X[t, 1:6]) + gamma*PM[t] + xi[time_cat[t]] +
beta_tmp[J[t]] * (Temperature[t] - x.change)
J[t] <- 1 + step(Temperature[t] - x.change)
}
# intrinsic CAR prior on the effect of time
xi[1:N] ~ dcar_normal(adj[1:M], weights[1:M], num[1:N], tau.xi, c = 2, zero_mean = 1)
# Priors:
alpha ~ dnorm(0, tau = 0.001)
alpha_1 <- alpha - beta_tmp[1]*x.change # the intercept of the first segment
alpha_2 <- alpha - beta_tmp[2]*x.change # the intercept of the second segment
for(j in 1:6){
beta[j] ~ dnorm(0, tau = 5)
}
for (k in 1:2) {
beta_tmp[k] ~ dnorm(0, tau = 5)
}
gamma ~ dnorm(0, tau = 5)
# the priors are informative to help smoothing
tau.xi <- 1/sigma2.xi
sigma2.xi ~ dgamma(1, 0.5)
}
)
dat.complete.case <- chicagoNMMAPS[!is.na(chicagoNMMAPS$pm10),]
dat.complete.case <- dat.complete.case[order(dat.complete.case$time),]
dat.complete.case$time <- 1:nrow(dat.complete.case)
dat.complete.case$time_cat <- cut(
dat.complete.case$time,
breaks = seq(from = min(dat.complete.case$time),
to = max(dat.complete.case$time),
length.out = 700), # a weekly term
labels = 1:699,
include.lowest = TRUE
)
dat.complete.case$time_cat <-
as.numeric(droplevels(dat.complete.case$time_cat))
dat.complete.case <- dat.complete.case[order(dat.complete.case$time_cat),]
RW2 <- function(k){
rest.comp <- list()
for(i in 3:(k-2)){
rest.comp[[i]] <- c(i-2, i-1, i+1, i+2)
}
rest.comp <- unlist(rest.comp)
adj = c(2, 3, 1, 3, 4,
rest.comp,
c(k-3, k-2, k, k-2, k-1)
)
num = c(2, 3, rep(4, times = c(k-4)), 3, 2)
weights = c(c(2, -1, 2, 4, -1),
rep(c(-1, 4, 4, -1), times = c(k-4)),
c(-1, 4, 2, -1, 2))
retlist <- list()
retlist$adj <- adj
retlist$num <- num
retlist$weights <- weights
return(retlist)
}
# time
N <- max(dat.complete.case$time_cat)
Wnb <- RW2(N)
# Format the data for NIMBLE in a list
ChicagoData = list(
O = dat.complete.case$death,
X = as.matrix(fastDummies::dummy_cols(dat.complete.case$dow)[,-c(1:2)]),
PM = scale(dat.complete.case$pm10)[,1],
Temperature = dat.complete.case$temp
)
ChicagoConsts <-list(
N = N,
M = length(Wnb$adj),
adj = Wnb$adj,
num = Wnb$num,
weights = Wnb$weights,
time_cat = dat.complete.case$time_cat,
Total = nrow(dat.complete.case)
)
inits <-
list(alpha=rep(0.01, times = 1),
sigma2.xi=0.1,
xi=rep(0.1, times = N),
beta=rep(0, times = 6),
gamma=0,
beta_tmp=rep(0, times = 2))
params <- c("mu", paste0("beta[", 1:6, "]"), "alpha", "alpha_1", "alpha_2",
"gamma", "xi", "sigma2.xi", paste0("beta_tmp[", 1:2, "]"))
# MCMC setting
ni <- 150000 # nb iterations
nt <- 10 # thinning interval
nb <- 50000 # nb iterations as burn-in
nc <- 1 # nb chains
nimbleMCMC()
. If implemented on the parallel environment takes ~1h.thresholds <- quantile(dat.complete.case$temp, probs = seq(from = 0.05, to = 0.95, by = 0.05))
parfun <- function(m){
ChicagoConsts$x.change <- as.numeric(thresholds[m])
WAIC.model <- nimbleMCMC(code = WAIC.code,
data = ChicagoData,
constants = ChicagoConsts,
inits = inits,
monitors = params,
niter = ni,
nburnin = nb,
thin = nt,
nchains = nc,
setSeed = 9,
progressBar = TRUE,
samplesAsCodaMCMC = TRUE,
summary = TRUE,
WAIC = TRUE
)
WAIC_vals <- WAIC.model$WAIC
beta_1 <- WAIC.model$summary["beta_tmp[1]", c("Median", "95%CI_low", "95%CI_upp")]
beta_2 <- WAIC.model$summary["beta_tmp[2]", c("Median", "95%CI_low", "95%CI_upp")]
list2ret <- list(WAIC_vals = WAIC_vals, beta_1 = beta_1, beta_2 = beta_2)
return(list2ret)
}
# Set up parallel environment
m <- 1:19
ncores <- 19
cl <- makeCluster(ncores, methods=FALSE)
clusterEvalQ(cl, {
library(nimble)
})
clusterExport(cl, c("parfun", "thresholds", "WAIC.code", "ChicagoData", "ChicagoConsts",
"inits", "params", "ni", "nb", "nc", "nt", "m"))
# For 1 parameter use parSapply
t_0 <- Sys.time()
outpar <- parLapply(cl = cl, m, parfun)
t_1 <- Sys.time()
t_1 - t_0
# close parallel environment
stopCluster(cl)
dat_WAIC <- data.frame(quant = seq(from = 0.05, to = 0.95, by = 0.05),
thresholds = thresholds,
WAIC = sapply(outpar, function(X) X$WAIC_vals))
dat_WAIC %>% slice(which.min(WAIC)) -> minWAIC
ggplot() + geom_point(data = dat_WAIC, aes(x=thresholds, y = WAIC)) +
geom_line(data = dat_WAIC, aes(x=thresholds, y = WAIC)) + ylim(c(38000, 38200)) + xlim(c(-10, 30)) +
theme_bw() + xlab("Temperature") + geom_vline(data = minWAIC, aes(xintercept = thresholds), linetype = "dashed") +
annotate("text", x = 24, y = 38060, label = paste0("Quantile = ", minWAIC$quant)) +
annotate("text", x = 25, y = 38040, label = paste0("Temperature = ", round(minWAIC$thresholds), "oC"))
# beta_1
dat_beta1 <- data.frame(quant = seq(from = 0.05, to = 0.95, by = 0.05),
thresholds = thresholds)
dat_beta1 <- cbind(dat_beta1, do.call(rbind, lapply(outpar, function(X) X$beta_1)))
colnames(dat_beta1)[c(4:5)] <- c("low", "up")
ggplot() +
geom_point(data = dat_beta1, aes(x = thresholds, y = Median)) +
geom_errorbar(data = dat_beta1, aes(x = thresholds, y = Median, ymin = low, ymax = up),
width = .5, position = "dodge") + ylim(c(-0.004, 0.004)) + xlim(c(-10, 30)) +
xlab("Temperature") + ylab("log beta_1") + theme_bw() +
geom_vline(data = minWAIC, aes(xintercept = thresholds), linetype = "dashed", col = "red") -> p1
# beta_2
dat_beta2 <- data.frame(quant = seq(from = 0.05, to = 0.95, by = 0.05),
thresholds = thresholds)
dat_beta2 <- cbind(dat_beta2, do.call(rbind, lapply(outpar, function(X) X$beta_2)))
colnames(dat_beta2)[c(4:5)] <- c("low", "up")
ggplot() +
geom_point(data = dat_beta2, aes(x = thresholds, y = Median)) +
geom_errorbar(data = dat_beta2, aes(x = thresholds, y = Median, ymin = low, ymax = up),
width = .5, position = "dodge") + ylim(c(0, 0.035)) + xlim(c(-10, 30)) +
xlab("Temperature") + ylab("log beta_2") + theme_bw() +
geom_vline(data = minWAIC, aes(xintercept = thresholds), linetype = "dashed", col = "red") -> p2
p1|p2
options(scipen = 9999999)
dat <- cbind(round(dat_WAIC$WAIC),
signif(cbind(dat_WAIC[,-3], dat_beta1[,-c(1,2)], dat_beta2[,-c(1,2)]), 2))
colnames(dat)[1] <- "WAIC"
dat <- rbind(
c("", "", "", "", "beta_1", "", "", "beta_2", ""),
colnames(dat),
dat
)
colnames(dat) <- NULL
knitr::kable(dat) %>%
kable_styling(bootstrap_options = "striped", full_width = F, position = "center")
1 | beta_1 | beta_2 | |||||||
2 | WAIC | quant | thresholds | Median | low | up | Median | low | up |
5% | 38131 | 0.05 | -7.8 | -0.00049 | -0.0029 | 0.0018 | 0.0019 | 0.0012 | 0.0025 |
10% | 38134 | 0.1 | -3.9 | -0.00051 | -0.002 | 0.001 | 0.0022 | 0.0015 | 0.003 |
15% | 38122 | 0.15 | -1.7 | -0.00067 | -0.0019 | 0.00061 | 0.0026 | 0.0018 | 0.0034 |
20% | 38115 | 0.2 | 0.28 | -0.00079 | -0.0019 | 0.00032 | 0.003 | 0.0022 | 0.0039 |
25% | 38102 | 0.25 | 1.7 | -0.00079 | -0.0018 | 0.00024 | 0.0033 | 0.0024 | 0.0042 |
30% | 38094 | 0.3 | 3.3 | -0.00077 | -0.0017 | 0.00017 | 0.0038 | 0.003 | 0.0046 |
35% | 38087 | 0.35 | 5 | -0.00069 | -0.0015 | 0.00019 | 0.0042 | 0.0033 | 0.0052 |
40% | 38083 | 0.4 | 6.7 | -0.00058 | -0.0014 | 0.00028 | 0.0046 | 0.0036 | 0.0056 |
45% | 38070 | 0.45 | 8.6 | -0.00037 | -0.0012 | 0.00039 | 0.0051 | 0.004 | 0.0062 |
50% | 38084 | 0.5 | 10 | -0.00023 | -0.00099 | 0.00051 | 0.0053 | 0.0041 | 0.0064 |
55% | 38080 | 0.55 | 12 | -0.000012 | -0.00077 | 0.00073 | 0.0058 | 0.0046 | 0.0072 |
60% | 38070 | 0.6 | 14 | 0.00021 | -0.00055 | 0.00095 | 0.0066 | 0.0051 | 0.0079 |
65% | 38068 | 0.65 | 16 | 0.0003 | -0.00043 | 0.00097 | 0.007 | 0.0055 | 0.0085 |
70% | 38074 | 0.7 | 18 | 0.00055 | -0.00013 | 0.0013 | 0.0078 | 0.0062 | 0.0095 |
75% | 38088 | 0.75 | 19 | 0.00069 | -0.0000023 | 0.0013 | 0.0084 | 0.0066 | 0.01 |
80% | 38089 | 0.8 | 21 | 0.00075 | 0.000098 | 0.0014 | 0.0097 | 0.0074 | 0.012 |
85% | 38086 | 0.85 | 22 | 0.00085 | 0.00023 | 0.0015 | 0.012 | 0.0092 | 0.015 |
90% | 38087 | 0.9 | 24 | 0.00096 | 0.00033 | 0.0016 | 0.016 | 0.012 | 0.02 |
95% | 38095 | 0.95 | 26 | 0.001 | 0.00046 | 0.0017 | 0.026 | 0.021 | 0.032 |
inits <- list(
list(alpha=0.01,
sigma2.xi=0.1,
xi=rep(0.1, times = N),
beta=rep(0, times = 6),
gamma=0,
beta_tmp=rep(0, times = 2)),
list(alpha=1,
sigma2.xi=1,
xi=rep(0.5, times = N),
beta=rep(-1, times = 6),
gamma=2,
beta_tmp=rep(1, times = 2))
)
nc <- 2
ChicagoConsts$x.change <- minWAIC$thresholds
WAIC.model <- nimbleMCMC(code = WAIC.code,
data = ChicagoData,
constants = ChicagoConsts,
inits = inits,
monitors = params,
niter = ni,
nburnin = nb,
thin = nt,
nchains = nc,
setSeed = 9,
progressBar = TRUE,
samplesAsCodaMCMC = TRUE,
summary = TRUE,
WAIC = TRUE
)
ggs_mod_WAIC <- ggs(WAIC.model$samples)
ggs_mod_WAIC %>% filter(Parameter %in% c("alpha_1", "alpha_2")) %>%
ggs_traceplot() + theme_bw()
ggs_mod_WAIC %>% filter(Parameter %in% c(paste0("beta_tmp[", 1:2, "]"))) %>%
ggs_traceplot() + theme_bw()
Changepoint.code <- nimbleCode(
{
for (t in 1:Total){
O[t] ~ dpois(mu[t])
log(mu[t]) <- alpha + inprod(beta[1:6], X[t, 1:6]) + gamma*PM[t] + xi[time_cat[t]] +
beta_tmp[J[t]] * (Temperature[t] - x.change)
J[t] <- 1 + step(Temperature[t] - x.change)
}
# intrinsic CAR prior on the effect of time
xi[1:N] ~ dcar_normal(adj[1:M], weights[1:M], num[1:N], tau.xi, c = 2, zero_mean = 1)
# Priors:
alpha ~ dnorm(0, tau = 0.001)
alpha_1 <- alpha - beta_tmp[1]*x.change # the intercept of the first segment
alpha_2 <- alpha - beta_tmp[2]*x.change # the intercept of the second segment
for(j in 1:6){
beta[j] ~ dnorm(0, tau = 5)
}
for (k in 1:2) {
beta_tmp[k] ~ dnorm(0, tau = 5)
}
gamma ~ dnorm(0, tau = 5)
# the priors are informative to help smoothing
tau.xi <- 1/sigma2.xi
sigma2.xi ~ dgamma(1, 0.5)
x.change ~ dunif(-26,33) # tuned based on the max and min
}
)
inits <- list(
list(alpha=0.01,
sigma2.xi=0.1,
xi=rep(0.1, times = N),
beta=rep(0, times = 6),
gamma=0,
x.change=10,
beta_tmp=rep(0, times = 2)),
list(alpha=1,
sigma2.xi=1,
xi=rep(0.5, times = N),
beta=rep(-1, times = 6),
gamma=2,
x.change=20,
beta_tmp=rep(1, times = 2))
)
params <- c("mu", paste0("beta[", 1:6, "]"), "alpha", "alpha_1","alpha_2",
"gamma", "xi", "sigma2.xi", paste0("beta_tmp[", 1:2, "]"),
"x.change")
# MCMC setting
ni <- 150000 # nb iterations
nt <- 10 # thinning interval
nb <- 50000 # nb iterations as burn-in
nc <- 2 # nb chains
nimbleMCMC()
. (1.7 hours)ChicagoConsts$x.change <- NULL
t_0 <- Sys.time()
Changepoint.model <- nimbleMCMC(code = Changepoint.code,
data = ChicagoData,
constants = ChicagoConsts,
inits = inits,
monitors = params,
niter = ni,
nburnin = nb,
thin = nt,
nchains = nc,
setSeed = 9,
progressBar = TRUE,
samplesAsCodaMCMC = TRUE,
summary = TRUE,
WAIC = TRUE
)
t_1 <- Sys.time()
time_cp <- t_1 - t_0
ggs_mod_cp <- ggs(Changepoint.model$samples)
ggs_mod_cp %>% filter(Parameter %in% c( "alpha_1", "alpha_2")) %>%
ggs_traceplot() + theme_bw()
ggs_mod_cp %>% filter(Parameter %in% c(paste0("beta_tmp[", 1:2, "]"))) %>%
ggs_traceplot() + theme_bw()
ggs_mod_cp %>% filter(Parameter %in% c("x.change")) %>%
ggs_traceplot() + theme_bw()
The model with threshold as a random variable picks up a higher threshold (see histogram below) compared to the one picked up by the WAIC procedure (red dashed line below). I should also note that I rerun the analysis for the changepoint model with different initial values for the x.change, namely 0 and -10, and the results are identical.
ggs_mod_cp %>% filter(Parameter %in% c("x.change")) %>%
ggs_histogram() + theme_bw() + geom_vline(data = minWAIC, aes(xintercept = thresholds),
linetype = "dashed", col = "red") + xlim(c(0,30))
Coef95CrI <- function(mod){
a_1 <- signif(quantile(mod$samples$chain2[,"alpha_1"], probs = c(0.5, 0.025, 0.975)), digits = 4)
a_2 <- signif(quantile(mod$samples$chain2[,"alpha_2"], probs = c(0.5, 0.025, 0.975)), digits = 4)
a.bind <-
rbind(
cbind(
paste0(a_1[1]),
paste0("(", a_1[2], ", " , a_1[3], ")")
),
cbind(
paste0(a_2[1]),
paste0("(", a_2[2], ", " , a_2[3], ")")
)
)
b_1 <- signif(quantile(mod$samples$chain2[,"beta_tmp[1]"],
probs = c(0.5, 0.025, 0.975))*100, digits = 2)
b_2 <- signif(quantile(mod$samples$chain2[,"beta_tmp[2]"],
probs = c(0.5, 0.025, 0.975))*100, digits = 2)
b.bind <-
rbind(
cbind(
paste0(b_1[1], "%"),
paste0("(", b_1[2], "%", ", " , b_1[3], "%)")
),
cbind(
paste0(b_2[1], "%"),
paste0("(", b_2[2], "%", ", " , b_1[3], "%)")
)
)
dat <- data.frame(rbind(a.bind, b.bind))
rownames(dat) <- c("alpha_1", "alpha_2", "beta_1*100", "beta_2*100")
colnames(dat) <- c("Median", "CrI95%")
return(dat)
}
tab <- cbind(Coef95CrI(WAIC.model), Coef95CrI(Changepoint.model))
tab <- rbind(
c("WAIC", "", "Changepoint", ""),
colnames(tab),
tab
)
colnames(tab) <- NULL
rownames(tab)[c(1,2)] <- c("", " ")
knitr::kable(tab) %>%
kable_styling(bootstrap_options = "striped",
full_width = F, position = "center")
WAIC | Changepoint | |||
Median | CrI95% | Median | CrI95% | |
alpha_1 | 4.712 | (4.703, 4.722) | 4.717 | (4.708, 4.727) |
alpha_2 | 4.603 | (4.572, 4.632) | 4.044 | (3.555, 4.312) |
beta_1*100 | 0.036% | (-0.03%, 0.11%) | 0.11% | (0.052%, 0.17%) |
beta_2*100 | 0.72% | (0.57%, 0.11%) | 2.7% | (1.7%, 0.17%) |
tmp_cat = cut(
dat.complete.case$temp,
breaks = seq(from = min(dat.complete.case$temp),
to = max(dat.complete.case$temp),
length.out = 100),
labels = seq(from = min(dat.complete.case$temp),
to = max(dat.complete.case$temp),
length.out = 100)[-1],
include.lowest = TRUE
)
tmp_cat <- droplevels(tmp_cat)
tmp_cat <-
round(as.numeric(as.character(levels(tmp_cat))), digits = 3)
K <- length(tmp_cat)
datRW2 = data.frame(temp = tmp_cat,
median = RW2.model$summary$all.chains[paste0("w[", 1:K, "]"), "Median"] +
RW2.model$summary$all.chains["alpha","Median"],
LL = RW2.model$summary$all.chains[paste0("w[", 1:K, "]"), "95%CI_low"] +
RW2.model$summary$all.chains["alpha","Median"],
UL = RW2.model$summary$all.chains[paste0("w[", 1:K, "]"), "95%CI_upp"] +
RW2.model$summary$all.chains["alpha","Median"])
# function to get the lines to plot them
PlotLinearSegm <- function(mod, thr){
a <- median(mod$samples$chain2[,"alpha"])
b_1 <- median(mod$samples$chain2[,"beta_tmp[1]"])
b_2 <- median(mod$samples$chain2[,"beta_tmp[2]"])
a_1 <- median(mod$samples$chain2[,"alpha_1"])
a_2 <- median(mod$samples$chain2[,"alpha_2"])
x_1 <- seq(from=min(tmp_cat), to=thr, length.out = 20)
x_2 <- seq(from=thr, to=max(tmp_cat), length.out = 20)
return(data.frame(xvalues = c(x_1, x_2), beta = c(c(a_1 + b_1*x_1), c(a_2 + b_2*x_2))))
}
# WAIC model
dat.WAIC.plot <- PlotLinearSegm(mod=WAIC.model, thr=minWAIC$thresholds)
dat.WAIC.plot$model <- "WAIC"
# CP model
dat.cp.plot <- PlotLinearSegm(mod=Changepoint.model,
thr=median(Changepoint.model$samples$chain2[,paste0("x.change")]))
dat.cp.plot$model <- "CP"
dat.mod.plot <- rbind(dat.WAIC.plot, dat.cp.plot)
ggplot() + geom_line(data = datRW2, aes(x = temp, y = median), size = 1) +
geom_line(data = datRW2, aes(x = temp, y = LL), linetype = 2) +
geom_line(data = datRW2, aes(x = temp, y = UL), linetype = 2) +
theme_bw() + xlab("Temperature") + ylab("log Number of deaths") + xlim(c(-30, 35)) +
geom_line(data = dat.mod.plot, aes(x = xvalues, y = beta, col = model), linetype = 1, size = 1)
In this tutorial, I show how to fit linear threshold models using NIMBLE. The two approaches produce similar results, nevertheless using a random variable for the threshold is a more flexible and natural approach. The Bayesian framework naturally provides a scheme to propagate any prior knowledge about this threshold, for instance when we are interested in higher temperatures. Of notice is that the parametrisation used influences convergence and the results. We used a RW2 based approach as the truth and we showed that even in such extreme cases a linear threshold model with one threshold would suffice. Of course, splines or RW2 are more flexible, but certain reasons such as: a) when the interest lies in higher temperatures, b) more intuitive communication of the results (say by 1oC increase in temperature) and c) when such a threshold is of interest, call for the use of these models. I should also note that it is common for such analyses to implement spatially varying coefficients which could account for steep increases in the higher quantiles, implying that a line would be enough. Finally, the flexibility of such models can be increased by adding more thresholds, but if you aim at flexibility splines and RW2 are better alternatives.
de Valpine, P., D. Turek, C. J. Paciorek, C. Anderson-Bergman, D. Temple Lang, and R. Bodik. 2017. “Programming with Models: Writing Statistical Algorithms for General Model Structures with NIMBLE.” Journal of Computational and Graphical Statistics 26: 403–17. https://doi.org/10.1080/10618600.2016.1172487.