In this document we will show how to use splines and random walks for flexible fits using NIMBLE
(de Valpine et al. 2017). We will use the Chicago dataset as it can be found in the dlnm
package. Using typing help(chicagoNMMAPS)
you will find the following information about the dataset:
The data set contains daily mortality (all causes, CVD, respiratory), weather (temperature, dew point temperature, relative humidity) and pollution data (PM10 and ozone) for Chicago in the period 1987-2000 from the National Morbidity, Mortality and Air Pollution Study (NMMAPS).
The purpose of the tutorial is to assess the effect of temperature on the number of deaths of all causes, using the Chicago dataset. We will start fitting a linear effect and then to allow more flexible fits, we will consider splines, RW1 and RW2.
This practical requires the following packages to be installed and attached: sf
, dplyr
, tidyverse
, nimble
, coda
, spdep
, patchwork
, GGally
, ggmcmc
, kableExtra
, splines2
and dlnm
.
install.packages(c("sf", "dplyr", "tidyverse", "nimble", "coda", "spdep", "patchwork",
"GGally", "ggmcmc", "dlnm", "splines2", "kableExtra"), dependencies = TRUE, repos = "http://cran.r-project.org")
library(sf) # Simple feature for R
library(dplyr) # A package for data manipulation
library(tidyverse) # A package for data manipulation and plotting
library(nimble) # A package for performing MCMC in R
library(coda) # A package for summarizing and plotting of MCMC output
# and diagnostic tests
library(spdep) # A package to calculate neighbors
library(patchwork) # A package to combine plots
library(GGally) # A package to do pairplots
library(ggmcmc) # MCMC diagnostics with ggplot
library(dlnm) # A package for performing dlnms in R
library(splines2) # A package to fit splines in R
library(kableExtra) # packages for formatting tables in rmarkdown
head(chicagoNMMAPS, 10)
## date time year month doy dow death cvd resp temp dptp
## 1 1987-01-01 1 1987 1 1 Thursday 130 65 13 -0.2777778 31.500
## 2 1987-01-02 2 1987 1 2 Friday 150 73 14 0.5555556 29.875
## 3 1987-01-03 3 1987 1 3 Saturday 101 43 11 0.5555556 27.375
## 4 1987-01-04 4 1987 1 4 Sunday 135 72 7 -1.6666667 28.625
## 5 1987-01-05 5 1987 1 5 Monday 126 64 12 0.0000000 28.875
## 6 1987-01-06 6 1987 1 6 Tuesday 130 63 12 4.4444444 35.125
## 7 1987-01-07 7 1987 1 7 Wednesday 129 72 12 1.3888889 26.750
## 8 1987-01-08 8 1987 1 8 Thursday 109 51 13 -1.6666667 22.000
## 9 1987-01-09 9 1987 1 9 Friday 125 62 7 -3.0555556 29.000
## 10 1987-01-10 10 1987 1 10 Saturday 153 90 11 0.2777778 27.750
## rhum pm10 o3
## 1 95.500 26.95607 4.376079
## 2 88.250 NA 4.929803
## 3 89.500 32.83869 3.751079
## 4 84.500 39.95607 4.292746
## 5 74.500 NA 4.751079
## 6 77.375 40.95607 6.334412
## 7 74.500 33.95607 8.594019
## 8 77.875 28.95607 11.797890
## 9 95.125 32.34877 3.876079
## 10 81.875 NA 5.388137
Please notice that there are also missing values.
summary(chicagoNMMAPS)
## date time year month
## Min. :1987-01-01 Min. : 1 Min. :1987 Min. : 1.000
## 1st Qu.:1990-07-02 1st Qu.:1279 1st Qu.:1990 1st Qu.: 4.000
## Median :1993-12-31 Median :2558 Median :1994 Median : 7.000
## Mean :1993-12-31 Mean :2558 Mean :1994 Mean : 6.522
## 3rd Qu.:1997-07-01 3rd Qu.:3836 3rd Qu.:1997 3rd Qu.:10.000
## Max. :2000-12-31 Max. :5114 Max. :2000 Max. :12.000
##
## doy dow death cvd
## Min. : 1.0 Sunday :731 Min. : 69.0 Min. : 25.0
## 1st Qu.: 92.0 Monday :730 1st Qu.:105.0 1st Qu.: 44.0
## Median :183.0 Tuesday :730 Median :114.0 Median : 50.0
## Mean :183.1 Wednesday:730 Mean :115.4 Mean : 50.9
## 3rd Qu.:274.0 Thursday :731 3rd Qu.:124.0 3rd Qu.: 57.0
## Max. :366.0 Friday :731 Max. :411.0 Max. :312.0
## Saturday :731
## resp temp dptp rhum
## Min. : 0.000 Min. :-26.667 Min. :-25.62 Min. : 25.62
## 1st Qu.: 7.000 1st Qu.: 1.667 1st Qu.: 27.25 1st Qu.: 58.12
## Median : 9.000 Median : 10.556 Median : 39.75 Median : 69.75
## Mean : 9.178 Mean : 10.107 Mean : 40.41 Mean : 69.17
## 3rd Qu.:11.000 3rd Qu.: 19.444 3rd Qu.: 55.75 3rd Qu.: 80.38
## Max. :34.000 Max. : 33.333 Max. : 78.25 Max. :100.00
## NA's :1096
## pm10 o3
## Min. : -3.05 Min. :-0.832
## 1st Qu.: 20.77 1st Qu.:11.960
## Median : 30.25 Median :19.045
## Mean : 33.74 Mean :20.143
## 3rd Qu.: 42.42 3rd Qu.:26.624
## Max. :356.18 Max. :65.808
## NA's :251
There are 1096 missing values for the mean daily relative humidity and 251 for PM\(_10\).
ggplot() + geom_point(data = chicagoNMMAPS, aes(x = time, y = rhum), size = .5) + theme_bw() +
theme(text = element_text(size = 10)) + ggtitle("Relative humidity") +
ylab("") -> p1
ggplot() + geom_point(data = chicagoNMMAPS, aes(x = time, y = pm10), size = .5) + theme_bw() +
theme(text = element_text(size = 10)) + ggtitle("PM10") +
ylab("") -> p2
ggplot() + geom_point(data = chicagoNMMAPS, aes(x = time, y = temp), size = .5) + theme_bw() +
theme(text = element_text(size = 10)) + ggtitle("Temperature") +
ylab("") -> p3
ggplot() + geom_point(data = chicagoNMMAPS, aes(x = time, y = o3), size = .5) + theme_bw() +
theme(text = element_text(size = 10)) + ggtitle("O3") +
ylab("") -> p4
(p1|p2)/(p3|p4)
There is a clear seasonality for all of the variables assessed.
At this section, we will examine the effect of temperature on all cause mortality.
LinearCode <- nimbleCode(
{
for (i in 1:N){
O[i] ~ dpois(mu[i]) # Poisson for observed counts
log(mu[i]) <- alpha + b*temperature[i]
}
# Priors:
alpha ~ dflat() # Unif(-inf, +inf)
exp.alpha <- exp(alpha) # overall counts across time period
b ~ dnorm(0, tau = 5)
b.exp <- exp(b)
}
)
N <- dim(chicagoNMMAPS)[1]
# Format the data for NIMBLE in a list
ChicagoData = list(
O = chicagoNMMAPS$death, # observed nb of deaths
temperature = chicagoNMMAPS$temp # daily temperature
)
ChicagoConsts <-list(
N = N # nb of time points
)
# Initialise the unknown parameters, 2 chains
inits <- list(
list(alpha=0, b=0), # chain 1
list(alpha=-2, b=1)) # chain 2
params <- c("b.exp", "exp.alpha", "mu", "alpha", "b")
Note that the parameters that are not set, will NOT be monitored!
# MCMC setting
ni <- 50000 # nb iterations
nt <- 10 # thinning interval
nb <- 10000 # nb iterations as burn-in
nc <- 2 # nb chains
The burn-in should be long enough to discard the initial part of the Markov chains that have not yet converged to the stationary distribution.
nimbleMCMC()
t_0 <- Sys.time()
LinearModel <- nimbleMCMC(code = LinearCode,
data = ChicagoData,
constants = ChicagoConsts,
inits = inits,
monitors = params,
niter = ni,
nburnin = nb,
thin = nt,
nchains = nc,
setSeed = 9,
progressBar = FALSE,
samplesAsCodaMCMC = TRUE,
summary = TRUE,
WAIC = TRUE
)
t_1 <- Sys.time()
time_linear <- t_1 - t_0
linear_ggs <- ggs(LinearModel$samples)
linear_ggs %>% filter(Parameter == c("alpha", "b")) %>%
ggs_traceplot() + theme_bw()
tab <-
round(LinearModel$summary$all.chains[c("exp.alpha", "b.exp"),],
digits = 3)
knitr::kable(tab, caption = "Median and 95%CrI for the covariate coefficients of the linear model.") %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "center")
Mean | Median | St.Dev. | 95%CI_low | 95%CI_upp | |
---|---|---|---|---|---|
exp.alpha | 120.636 | 120.637 | 0.209 | 120.231 | 121.039 |
b.exp | 0.996 | 0.996 | 0.000 | 0.995 | 0.996 |
# define the knots
knots <- c(0, 10, 20)
# and the matrix
bsMat <- bSpline(chicagoNMMAPS$temp,
knots = knots,
degree = 3,
intercept = TRUE)
SplinesCode <- nimbleCode(
{
for (i in 1:N){
O[i] ~ dpois(mu[i]) # Poisson for observed counts
log(mu[i]) <- inprod(beta[], X[i,])
}
# Priors:
# alpha ~ dflat() # Unif(-inf, +inf)
# exp.alpha <- exp(alpha) # overall counts across time period
for(j in 1:7){
beta[j] ~ dnorm(0, tau = 5)
exp.beta[j] <- exp(beta[j]) # the coefficients of the splines
}
}
)
N <- dim(chicagoNMMAPS)[1]
# Format the data for NIMBLE in a list
ChicagoData = list(
O = chicagoNMMAPS$death, # observed nb of deaths
X = bsMat # the functional form of temperature
)
ChicagoConsts <-list(
N = N # nb of time points
)
# Initialise the unknown parameters, 2 chains
inits <- list(
list(beta=rep(0, 7)), # chain 1
list(beta=rep(1, 7))# chain 2
)
params <- c("mu", paste0("beta[", 1:7, "]"),
paste0("exp.beta[", 1:7, "]"))
# MCMC setting
ni <- 50000 # nb iterations
nt <- 10 # thinning interval
nb <- 10000 # nb iterations as burn-in
nc <- 2 # nb chains
nimbleMCMC()
t_0 <- Sys.time()
SplinesModel <- nimbleMCMC(code = SplinesCode,
data = ChicagoData,
constants = ChicagoConsts,
inits = inits,
monitors = params,
niter = ni,
nburnin = nb,
thin = nt,
nchains = nc,
setSeed = 9,
progressBar = FALSE,
samplesAsCodaMCMC = TRUE,
summary = TRUE,
WAIC = TRUE
)
t_1 <- Sys.time()
time_splines <- t_1 - t_0
splines_ggs <- ggs(SplinesModel$samples)
splines_ggs %>% filter(Parameter == c(paste0("beta[", 1:7, "]"))) %>%
ggs_traceplot() + theme_bw()
round(
SplinesModel$summary$all.chains[paste0("exp.beta[", 1:7, "]"),],
digits = 2
)
## Mean Median St.Dev. 95%CI_low 95%CI_upp
## exp.beta[1] 131.76 131.70 5.44 121.44 142.56
## exp.beta[2] 123.55 123.54 2.66 118.37 128.78
## exp.beta[3] 131.61 131.61 1.39 128.95 134.36
## exp.beta[4] 112.20 112.20 0.66 110.95 113.52
## exp.beta[5] 107.66 107.65 0.94 105.82 109.52
## exp.beta[6] 106.49 106.50 1.41 103.76 109.32
## exp.beta[7] 149.97 149.95 4.39 141.57 158.68
# try to plot it
chicagoNMMAPS$tmp_spline <-
bsMat %*% SplinesModel$summary$all.chains[c(paste0("beta[", 1:7, "]")), "Median"]
# spline
chicagoNMMAPS$log.countLL <- bsMat %*% SplinesModel$summary$all.chains[c(paste0("beta[", 1:7, "]")), "95%CI_low"]
chicagoNMMAPS$log.countUL <- bsMat %*% SplinesModel$summary$all.chains[c(paste0("beta[", 1:7, "]")), "95%CI_upp"]
dat_spline = data.frame(temp = chicagoNMMAPS$temp,
median = chicagoNMMAPS$tmp_spline,
LL = chicagoNMMAPS$log.countLL,
UL = chicagoNMMAPS$log.countUL)
ggplot() + geom_line(data = dat_spline, aes(x = temp, y = exp(median))) +
geom_line(data = dat_spline, aes(x = temp, y = exp(LL)), linetype = 2) +
geom_line(data = dat_spline, aes(x = temp, y = exp(UL)), linetype = 2) +
theme_bw() + xlab("Temperature") + ylab("Number of cases") +
ggtitle("Splines")
Now I will allow a more flexible fit using the random walk process. The idea is to create an ordered variable by categorizing the continuous variable of interest (i.e. temperature). The ordered variable is created because based on this, we will get the neighboring structure required for the RW1 process. We can sample using dcar_normal()
and weights in a similar fashion as the geographical weights in previous tutorials.
chicagoNMMAPS$temp_cat <- cut(
chicagoNMMAPS$temp,
breaks = seq(from = min(chicagoNMMAPS$temp),
to = max(chicagoNMMAPS$temp),
length.out = 200),
labels = 1:199,
include.lowest = TRUE
)
chicagoNMMAPS$temp_cat_order <-
as.numeric(droplevels(chicagoNMMAPS$temp_cat))
chicagoNMMAPS <- chicagoNMMAPS[order(chicagoNMMAPS$temp_cat_order),]
K <- max(chicagoNMMAPS$temp_cat_order)
W <- matrix(0, nrow = K, ncol = K)
for(i in 1:(K-1)) W[i,i+1] <- 1
for(i in 1:(K-1)) W[i+1,i] <- 1
Wnb <- mat2listw(W)
Wnb <- nb2WB(nb = Wnb$neighbours)
RW1_Code <- nimbleCode(
{
for (i in 1:N){
O[i] ~ dpois(mu[i]) # Poisson for observed counts
log(mu[i]) <- alpha + u[temp.ord[i]]
}
# intrinsic CAR prior on the effect of temperature
u[1:K] ~ dcar_normal(adj[1:L], weights[1:L], num[1:K], tau.u, zero_mean = 1)
# Priors:
alpha ~ dflat() # Unif(-inf, +inf)
exp.alpha <- exp(alpha) # overall counts across time period
tau.u ~ dgamma(1, 0.01) # precision of the ICAR component
sigma2.u <- 1/tau.u
}
)
# Format the data for NIMBLE in a list
ChicagoData = list(
O = chicagoNMMAPS$death
)
ChicagoConsts <-list(
N = N,
L = length(Wnb$adj),
K = K,
adj = Wnb$adj,
num = Wnb$num,
weights = Wnb$weights,
temp.ord = chicagoNMMAPS$temp_cat_order
)
inits <- list(
list(alpha=0.01,
tau.u=0.1,
u=rep(0.1, times = length(Wnb$num))), # chain 1
list(alpha=2,
tau.u=0.01,
u=rep(0.01, times = length(Wnb$num)))# chain 2
)
params <- c("exp.alpha", "mu", "u", "tau.u",
"alpha")
# MCMC setting
ni <- 50000 # nb iterations
nt <- 10 # thinning interval
nb <- 10000 # nb iterations as burn-in
nc <- 2 # nb chains
nimbleMCMC()
t_0 <- Sys.time()
RW1_Model <- nimbleMCMC(code = RW1_Code,
data = ChicagoData,
constants = ChicagoConsts,
inits = inits,
monitors = params,
niter = ni,
nburnin = nb,
thin = nt,
nchains = nc,
setSeed = 9,
progressBar = FALSE,
samplesAsCodaMCMC = TRUE,
summary = TRUE,
WAIC = TRUE
)
t_1 <- Sys.time()
time_rw1 <- t_1 - t_0
rw1_ggs <- ggs(RW1_Model$samples)
rw1_ggs %>% filter(Parameter == c("u[10]", "u[115]")) %>%
ggs_traceplot() + theme_bw()
rw1_ggs %>% filter(Parameter == c("exp.alpha", "tau.u")) %>%
ggs_traceplot() + theme_bw()
tmp_cat = cut(
chicagoNMMAPS$temp,
breaks = seq(from = min(chicagoNMMAPS$temp),
to = max(chicagoNMMAPS$temp),
length.out = 200),
labels = seq(from = min(chicagoNMMAPS$temp),
to = max(chicagoNMMAPS$temp),
length.out = 200)[-1],
include.lowest = TRUE
)
tmp_cat <- droplevels(tmp_cat)
tmp_cat <-
round(as.numeric(as.character(levels(tmp_cat))), digits = 3)
datRW1 = data.frame(temp = tmp_cat,
median = RW1_Model$summary$all.chains[paste0("u[", 1:180, "]"), "Median"],
LL = RW1_Model$summary$all.chains[paste0("u[", 1:180, "]"), "95%CI_low"],
UL = RW1_Model$summary$all.chains[paste0("u[", 1:180, "]"), "95%CI_upp"])
ggplot() + geom_line(data = datRW1, aes(x = temp, y = exp(median))) +
geom_line(data = datRW1, aes(x = temp, y = exp(LL)), linetype = 2) +
geom_line(data = datRW1, aes(x = temp, y = exp(UL)), linetype = 2) +
theme_bw() + xlab("Temperature") + ylab("Risk") +
geom_hline(yintercept = 1, col = "red", linetype = 2) +
ggtitle("Random walk 1")
Now I will use RW2 to get a smoother effect. The RW2 approximates a thin plate spline. The idea is similar as before, ie to create an ordered variable, nevertheless the weights should be defined differently.
chicagoNMMAPS$temp_cat <- cut(
chicagoNMMAPS$temp,
breaks = seq(from = min(chicagoNMMAPS$temp),
to = max(chicagoNMMAPS$temp),
length.out = 200),
labels = 1:199,
include.lowest = TRUE
)
chicagoNMMAPS$temp_cat_order <-
as.numeric(droplevels(chicagoNMMAPS$temp_cat))
chicagoNMMAPS <- chicagoNMMAPS[order(chicagoNMMAPS$temp_cat_order),]
dcar_normal()
priorK <- max(chicagoNMMAPS$temp_cat_order)
rest.comp <- list()
for(i in 3:(K-2)){
rest.comp[[i]] <- c(i-2, i-1, i+1, i+2)
}
rest.comp <- unlist(rest.comp)
adj = c(2, 3, 1, 3, 4,
rest.comp,
c(K-3, K-2, K, K-2, K-1)
)
num = c(2, 3, rep(4, times = c(K-4)), 3, 2)
weights = c(c(2, -1, 2, 4, -1),
rep(c(-1, 4, 4, -1), times = c(K-4)),
c(-1, 4, 2, -1, 2))
RW2_Code <- nimbleCode(
{
for (i in 1:N){
O[i] ~ dpois(mu[i]) # Poisson for observed counts
log(mu[i]) <- alpha + u[temp.ord[i]]
}
# intrinsic CAR prior on the effect of temperature
u[1:K] ~ dcar_normal(adj = adj[1:L], weights = weights[1:L], num = num[1:K],
tau = tau.u, c = 2, zero_mean = 1) # we dont need sum to zero here
# Priors:
alpha ~ dflat() # Unif(-inf, +inf)
exp.alpha <- exp(alpha) # overall counts across time period
tau.u ~ dgamma(1, 0.01) # precision of the ICAR component
sigma2.u <- 1/tau.u
}
)
# Format the data for NIMBLE in a list
ChicagoData = list(
O = chicagoNMMAPS$death
)
ChicagoConsts <-list(
N = N,
L = length(weights),
K = K,
adj = adj,
num = num,
weights = weights,
temp.ord = chicagoNMMAPS$temp_cat_order
)
inits <- list(
list(alpha=0.01,
tau.u=0.1,
u=rep(0.1, times = K)), # chain 1
list(alpha=2,
tau.u=0.01,
u=rep(0.01, times = K))# chain 2
)
params <- c("exp.alpha", "mu", "u", "tau.u")
# MCMC setting
ni <- 50000 # nb iterations
nt <- 10 # thinning interval
nb <- 10000 # nb iterations as burn-in
nc <- 2 # nb chains
nimbleMCMC()
t_0 <- Sys.time()
RW2_Model <- nimbleMCMC(code = RW2_Code,
data = ChicagoData,
constants = ChicagoConsts,
inits = inits,
monitors = params,
niter = ni,
nburnin = nb,
thin = nt,
nchains = nc,
setSeed = 9,
progressBar = FALSE,
samplesAsCodaMCMC = TRUE,
summary = TRUE,
WAIC = TRUE
)
t_1 <- Sys.time()
time_rw2 <- t_1 - t_0
rw2_ggs <- ggs(RW2_Model$samples)
rw2_ggs %>% filter(Parameter == c("u[10]", "u[99]")) %>%
ggs_traceplot() + theme_bw()
rw2_ggs %>% filter(Parameter == c("exp.alpha", "tau.u")) %>%
ggs_traceplot() + theme_bw()
datRW2 = data.frame(temp = tmp_cat,
median = RW2_Model$summary$all.chains[paste0("u[", 1:180, "]"), "Median"],
LL = RW2_Model$summary$all.chains[paste0("u[", 1:180, "]"), "95%CI_low"],
UL = RW2_Model$summary$all.chains[paste0("u[", 1:180, "]"), "95%CI_upp"])
ggplot() + geom_line(data = datRW2, aes(x = temp, y = exp(median))) +
geom_line(data = datRW2, aes(x = temp, y = exp(LL)), linetype = 2) +
geom_line(data = datRW2, aes(x = temp, y = exp(UL)), linetype = 2) +
theme_bw() + xlab("Temperature") + ylab("Risk") +
geom_hline(yintercept = 1, col = "red", linetype = 2) +
ggtitle("Random walk 2")
# linear trend
dat_linear <- data.frame(temperature = datRW1$temp)
dat_linear$log.count <-
LinearModel$summary$all.chains["alpha", "Median"] +
LinearModel$summary$all.chains["b", "Median"]*datRW1$temp
dat_linear$model <- "linear"
# spline
dat_spline2 <- data.frame(temperature = dat_spline$temp,
log.count = dat_spline$median,
model = "splines"
)
# rw1
dat_rw1 <- data.frame(temperature = chicagoNMMAPS$temp,
log.count = log(RW1_Model$summary$all.chains[paste0("mu[", 1:nrow(chicagoNMMAPS), "]"), "Median"]),
model = "RW1"
)
# rw2
dat_rw2 <- data.frame(temperature = chicagoNMMAPS$temp,
log.count = log(RW2_Model$summary$all.chains[paste0("mu[", 1:nrow(chicagoNMMAPS), "]"), "Median"]),
model = "RW2"
)
dat_plot <- rbind(dat_linear, dat_spline2, dat_rw1, dat_rw2)
ggplot() +
geom_line(data = dat_plot, aes(x = temperature, y = log.count, col = model), size = 1) +
scale_color_viridis_d() +
theme_bw() +
ggtitle("Comparison of models")
# linear trend
dat_linear$log.countLL <-
LinearModel$summary$all.chains["alpha", "95%CI_low"] +
LinearModel$summary$all.chains["b", "95%CI_low"]*datRW1$temp
dat_linear$log.countUL <-
LinearModel$summary$all.chains["alpha", "95%CI_upp"] +
LinearModel$summary$all.chains["b", "95%CI_upp"]*datRW1$temp
# splines
dat_spline2$log.countLL <- dat_spline$LL
dat_spline2$log.countUL <- dat_spline$UL
# rw1
dat_rw1$log.countLL <- log(RW1_Model$summary$all.chains[paste0("mu[", 1:nrow(chicagoNMMAPS), "]"), "95%CI_low"])
dat_rw1$log.countUL <- log(RW1_Model$summary$all.chains[paste0("mu[", 1:nrow(chicagoNMMAPS), "]"), "95%CI_upp"])
# rw2
dat_rw2$log.countLL <- log(RW2_Model$summary$all.chains[paste0("mu[", 1:nrow(chicagoNMMAPS), "]"), "95%CI_low"])
dat_rw2$log.countUL <- log(RW2_Model$summary$all.chains[paste0("mu[", 1:nrow(chicagoNMMAPS), "]"), "95%CI_upp"])
dat_plot_CrI <- rbind(dat_linear, dat_spline2, dat_rw1, dat_rw2)
ggplot(data = dat_plot_CrI) +
geom_ribbon(aes(x = temperature, ymin = log.countLL, ymax = log.countUL, fill = model), alpha = .2) +
geom_line(aes(x = temperature, y = log.count, col = model)) +
facet_wrap(~model, ncol = 2) +
theme_bw() +
scale_color_viridis_d() +
scale_fill_viridis_d() +
ggtitle("Comparison of models") +
theme(text = element_text(size=15))
dat_time = data.frame(
time_linear = time_linear,
time_splines = time_splines,
time_rw1 = time_rw1,
time_rw2 = time_rw2
)
dat_time[1,] <- round(as.numeric(dat_time), digits = 2)
colnames(dat_time) <- c("linear", "splines", "RW1", "RW2")
knitr::kable(dat_time, caption = "Time needed to fit the models") %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "center")
linear | splines | RW1 | RW2 |
---|---|---|---|
5.14 mins | 16.46 mins | 10.4 mins | 10.3 mins |
The time needed to fit the models with flexible temperature effects was similar. It seems that bsplines used here provide a smoother fit compared to the RW2. Of course this, in both cases depend on the corresponding specifications. For instance, had I used more knots in the spline specification, the result would have been less smooth. In a similar fashion, had I used an informative prior for the variance hyperpapameter (that does not allow much variation) in the RW2 I could have achieved higher amount of smoothing. In the end, smoothing is a parameter that should be tuned by the investigator and be problem specific.
de Valpine, P., D. Turek, C. J. Paciorek, C. Anderson-Bergman, D. Temple Lang, and R. Bodik. 2017. “Programming with Models: Writing Statistical Algorithms for General Model Structures with NIMBLE.” Journal of Computational and Graphical Statistics 26: 403–17. https://doi.org/10.1080/10618600.2016.1172487.