In this document we will show how to use splines and random walks for flexible fits in NIMBLE (de Valpine et al. 2017). We will use the Chicago dataset as provided in the dlnm package. By typing help(chicagoNMMAPS) you will find the following information about the dataset:

The data set contains daily mortality (all causes, CVD, respiratory), weather (temperature, dew point temperature, relative humidity) and pollution data (PM10 and ozone) for Chicago in the period 1987-2000 from the National Morbidity, Mortality and Air Pollution Study (NMMAPS).

The purpose of the tutorial is to assess the effect of temperature on the number of deaths from all causes, using the Chicago dataset. We will start by fitting a linear effect and then, to allow more flexible fits, we will consider splines, a first-order random walk (RW1) and a second-order random walk (RW2).

Install and load packages

This practical requires the following packages to be installed and attached: sf, dplyr, tidyverse, nimble, coda, spdep, patchwork, GGally, ggmcmc, kableExtra, splines2 and dlnm.

  • To install the entire suite of packages, we can use:
install.packages(c("sf", "dplyr", "tidyverse", "nimble", "coda", "spdep", "patchwork", 
                   "GGally", "ggmcmc", "dlnm", "splines2", "kableExtra"), dependencies = TRUE, repos = "http://cran.r-project.org")
  • Then, load the needed packages:
library(sf)           # Simple feature for R
library(dplyr)        # A package for data manipulation
library(tidyverse)    # A package for data manipulation and plotting
library(nimble)       # A package for performing MCMC in R
library(coda)         # A package for summarizing and plotting of MCMC output 
                      # and diagnostic tests
library(spdep)        # A package to calculate neighbors
library(patchwork)    # A package to combine plots
library(GGally)       # A package to do pairplots
library(ggmcmc)       # MCMC diagnostics with ggplot
library(dlnm)         # A package for performing dlnms in R
library(splines2)     # A package to fit splines in R
library(kableExtra)   # A package for formatting tables in R Markdown

Import and explore the Chicago dataset

  • Import the health data
head(chicagoNMMAPS, 10)
##          date time year month doy       dow death cvd resp       temp   dptp
## 1  1987-01-01    1 1987     1   1  Thursday   130  65   13 -0.2777778 31.500
## 2  1987-01-02    2 1987     1   2    Friday   150  73   14  0.5555556 29.875
## 3  1987-01-03    3 1987     1   3  Saturday   101  43   11  0.5555556 27.375
## 4  1987-01-04    4 1987     1   4    Sunday   135  72    7 -1.6666667 28.625
## 5  1987-01-05    5 1987     1   5    Monday   126  64   12  0.0000000 28.875
## 6  1987-01-06    6 1987     1   6   Tuesday   130  63   12  4.4444444 35.125
## 7  1987-01-07    7 1987     1   7 Wednesday   129  72   12  1.3888889 26.750
## 8  1987-01-08    8 1987     1   8  Thursday   109  51   13 -1.6666667 22.000
## 9  1987-01-09    9 1987     1   9    Friday   125  62    7 -3.0555556 29.000
## 10 1987-01-10   10 1987     1  10  Saturday   153  90   11  0.2777778 27.750
##      rhum     pm10        o3
## 1  95.500 26.95607  4.376079
## 2  88.250       NA  4.929803
## 3  89.500 32.83869  3.751079
## 4  84.500 39.95607  4.292746
## 5  74.500       NA  4.751079
## 6  77.375 40.95607  6.334412
## 7  74.500 33.95607  8.594019
## 8  77.875 28.95607 11.797890
## 9  95.125 32.34877  3.876079
## 10 81.875       NA  5.388137

Notice that there are also missing values (NA) in some of the variables.

  • Summarise the dataset and check the number of missing values per variable
summary(chicagoNMMAPS)
##       date                 time           year          month       
##  Min.   :1987-01-01   Min.   :   1   Min.   :1987   Min.   : 1.000  
##  1st Qu.:1990-07-02   1st Qu.:1279   1st Qu.:1990   1st Qu.: 4.000  
##  Median :1993-12-31   Median :2558   Median :1994   Median : 7.000  
##  Mean   :1993-12-31   Mean   :2558   Mean   :1994   Mean   : 6.522  
##  3rd Qu.:1997-07-01   3rd Qu.:3836   3rd Qu.:1997   3rd Qu.:10.000  
##  Max.   :2000-12-31   Max.   :5114   Max.   :2000   Max.   :12.000  
##                                                                     
##       doy               dow          death            cvd       
##  Min.   :  1.0   Sunday   :731   Min.   : 69.0   Min.   : 25.0  
##  1st Qu.: 92.0   Monday   :730   1st Qu.:105.0   1st Qu.: 44.0  
##  Median :183.0   Tuesday  :730   Median :114.0   Median : 50.0  
##  Mean   :183.1   Wednesday:730   Mean   :115.4   Mean   : 50.9  
##  3rd Qu.:274.0   Thursday :731   3rd Qu.:124.0   3rd Qu.: 57.0  
##  Max.   :366.0   Friday   :731   Max.   :411.0   Max.   :312.0  
##                  Saturday :731                                  
##       resp             temp              dptp             rhum       
##  Min.   : 0.000   Min.   :-26.667   Min.   :-25.62   Min.   : 25.62  
##  1st Qu.: 7.000   1st Qu.:  1.667   1st Qu.: 27.25   1st Qu.: 58.12  
##  Median : 9.000   Median : 10.556   Median : 39.75   Median : 69.75  
##  Mean   : 9.178   Mean   : 10.107   Mean   : 40.41   Mean   : 69.17  
##  3rd Qu.:11.000   3rd Qu.: 19.444   3rd Qu.: 55.75   3rd Qu.: 80.38  
##  Max.   :34.000   Max.   : 33.333   Max.   : 78.25   Max.   :100.00  
##                                                      NA's   :1096    
##       pm10              o3        
##  Min.   : -3.05   Min.   :-0.832  
##  1st Qu.: 20.77   1st Qu.:11.960  
##  Median : 30.25   Median :19.045  
##  Mean   : 33.74   Mean   :20.143  
##  3rd Qu.: 42.42   3rd Qu.:26.624  
##  Max.   :356.18   Max.   :65.808  
##  NA's   :251

There are 1096 missing values for the mean daily relative humidity and 251 for PM10.
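A more direct way to count the missing values per variable (a quick check, not part of the original output) is:

colSums(is.na(chicagoNMMAPS))   # number of NAs in each column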

  • Some plots for the weather and pollution variables
ggplot() + geom_point(data = chicagoNMMAPS, aes(x = time, y = rhum), size = .5) + theme_bw() + 
           theme(text = element_text(size = 10)) + ggtitle("Relative humidity") +
           ylab("") -> p1

ggplot() + geom_point(data = chicagoNMMAPS, aes(x = time, y = pm10), size = .5) + theme_bw() + 
           theme(text = element_text(size = 10)) + ggtitle("PM10") +
           ylab("") -> p2

ggplot() + geom_point(data = chicagoNMMAPS, aes(x = time, y = temp), size = .5) + theme_bw() + 
           theme(text = element_text(size = 10)) + ggtitle("Temperature") +
           ylab("") -> p3

ggplot() + geom_point(data = chicagoNMMAPS, aes(x = time, y = o3), size = .5) + theme_bw() + 
           theme(text = element_text(size = 10)) + ggtitle("O3") +
           ylab("") -> p4

(p1|p2)/(p3|p4)

There is a clear seasonality for all of the variables assessed.
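One way to quantify this seasonality (a quick sketch using dplyr, not part of the original analysis) is to average the exposures by calendar month:

chicagoNMMAPS %>% 
  group_by(month) %>% 
  summarise(mean_temp = mean(temp, na.rm = TRUE),   # monthly mean temperature
            mean_pm10 = mean(pm10, na.rm = TRUE),   # monthly mean PM10
            mean_o3   = mean(o3,   na.rm = TRUE))   # monthly mean ozone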

Linear effect

In this section, we will examine the effect of temperature on all-cause mortality.

  • Nimble model
LinearCode <- nimbleCode(
  {
    for (i in 1:N){
      
      O[i] ~ dpois(mu[i])                                # Poisson for observed counts 
      log(mu[i]) <- alpha + b*temperature[i]
  
    } 
    
    # Priors:
    alpha ~ dflat()                                      # Unif(-inf, +inf)
    exp.alpha <- exp(alpha)                              # overall counts across time period
    
    b ~ dnorm(0, tau = 5)
    b.exp <- exp(b)
  }
)
  • Data objects:
N <- dim(chicagoNMMAPS)[1] 
# Format the data for NIMBLE in a list
ChicagoData = list(
                 O = chicagoNMMAPS$death,                  # observed nb of deaths
                 temperature = chicagoNMMAPS$temp          # daily temperature
)
ChicagoConsts <-list(
                 N = N                                     # nb of time points
)
  • Initialize the parameters:
# Initialise the unknown parameters, 2 chains
inits <- list(
  list(alpha=0, b=0),  # chain 1
  list(alpha=-2, b=1))   # chain 2
  • Set the parameters that will be monitored:
params <- c("b.exp", "exp.alpha", "mu", "alpha", "b")

Note that parameters not included in this vector will NOT be monitored!

  • Specify the MCMC setting:
# MCMC setting
ni <- 50000  # nb iterations 
nt <- 10     # thinning interval
nb <- 10000   # nb iterations as burn-in 
nc <- 2      # nb chains

The burn-in should be long enough to discard the initial part of the Markov chains that have not yet converged to the stationary distribution.

  • Run the MCMC simulations calling Nimble from R using the function nimbleMCMC()
t_0 <- Sys.time()
LinearModel <- nimbleMCMC(code = LinearCode,
                      data = ChicagoData,
                      constants = ChicagoConsts, 
                      inits = inits,
                      monitors = params,
                      niter = ni,
                      nburnin = nb,
                      thin = nt, 
                      nchains = nc, 
                      setSeed = 9, 
                      progressBar = FALSE,     
                      samplesAsCodaMCMC = TRUE, 
                      summary = TRUE, 
                      WAIC = TRUE
                      )
t_1 <- Sys.time()
time_linear <- t_1 - t_0
linear_ggs <- ggs(LinearModel$samples)

linear_ggs %>% filter(Parameter %in% c("alpha", "b")) %>% 
  ggs_traceplot() + theme_bw()
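Beyond the traceplots, numerical convergence diagnostics can be computed with coda (a sketch; LinearModel$samples is a coda mcmc.list because we set samplesAsCodaMCMC = TRUE):

gelman.diag(LinearModel$samples[, c("alpha", "b")])    # potential scale reduction factor (should be close to 1)
effectiveSize(LinearModel$samples[, c("alpha", "b")])  # effective sample size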

  • Check the results
tab <- 
round(LinearModel$summary$all.chains[c("exp.alpha", "b.exp"),], 
      digits = 3)

knitr::kable(tab, caption = "Median and 95%CrI for the covariate coefficients of the linear model.") %>%  kable_styling(bootstrap_options = "striped", full_width = F, position = "center")
Median and 95%CrI for the covariate coefficients of the linear model.

              Mean   Median  St.Dev.  95%CI_low  95%CI_upp
exp.alpha  120.636  120.637    0.209    120.231    121.039
b.exp        0.996    0.996    0.000      0.995      0.996
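Here b.exp is the multiplicative change in the expected number of deaths per 1°C increase in temperature (a value below 1 implies fewer deaths on warmer days). A quick sketch, using the posterior median of b, to express the effect for a 10°C contrast:

b_med <- LinearModel$summary$all.chains["b", "Median"]
100 * (exp(10 * b_med) - 1)   # % change in expected deaths per 10°C increase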

Splines

  • Define the matrix of the cubic splines:
# define the knots
knots <- c(0, 10, 20)
# and the matrix
bsMat <- bSpline(chicagoNMMAPS$temp, 
                 knots = knots, 
                 degree = 3, 
                 intercept = TRUE)
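Before fitting, it is worth checking the dimension of the basis and what the basis functions look like (a quick sketch; the number of columns, here 7, is what the model code below assumes):

ncol(bsMat)   # 3 internal knots + degree 3 + intercept = 7 columns

matplot(sort(chicagoNMMAPS$temp), bsMat[order(chicagoNMMAPS$temp), ],
        type = "l", lty = 1, xlab = "Temperature", ylab = "Basis value")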
  • Nimble model
SplinesCode <- nimbleCode(
  {
    for (i in 1:N){
      
      O[i] ~ dpois(mu[i])                                # Poisson for observed counts 
      log(mu[i]) <- inprod(beta[1:7], X[i, 1:7])       # 7 = ncol(bsMat); NIMBLE requires explicit indices
  
    } 
    
    # Priors:
    # no separate intercept is needed here: the spline basis already 
    # contains an intercept column (intercept = TRUE in bSpline())
    
    for(j in 1:7){
        beta[j] ~ dnorm(0, tau = 5)
      exp.beta[j] <- exp(beta[j])                        # the coefficients of the splines
    }
  }
)
  • Data objects:
N <- dim(chicagoNMMAPS)[1] 
# Format the data for NIMBLE in a list
ChicagoData = list(
                 O = chicagoNMMAPS$death,                  # observed nb of deaths
                 X = bsMat                                 # the functional form of temperature
)
ChicagoConsts <-list(
                 N = N                                     # nb of time points
)
  • Initialize the parameters:
# Initialise the unknown parameters, 2 chains
inits <- list(
  list(beta = rep(0, 7)),  # chain 1
  list(beta = rep(1, 7))   # chain 2
)
  • Set the parameters that will be monitored:
params <- c("mu", paste0("beta[", 1:7, "]"), 
            paste0("exp.beta[", 1:7, "]"))
  • Specify the MCMC setting:
# MCMC setting
ni <- 50000  # nb iterations 
nt <- 10     # thinning interval
nb <- 10000   # nb iterations as burn-in 
nc <- 2      # nb chains
  • Run the MCMC simulations calling Nimble from R using the function nimbleMCMC()
t_0 <- Sys.time()
SplinesModel <- nimbleMCMC(code = SplinesCode,
                      data = ChicagoData,
                      constants = ChicagoConsts, 
                      inits = inits,
                      monitors = params,
                      niter = ni,
                      nburnin = nb,
                      thin = nt, 
                      nchains = nc, 
                      setSeed = 9, 
                      progressBar = FALSE,      
                      samplesAsCodaMCMC = TRUE, 
                      summary = TRUE, 
                      WAIC = TRUE
                      )
t_1 <- Sys.time()
time_splines <- t_1 - t_0
  • Check convergence
splines_ggs <- ggs(SplinesModel$samples)

splines_ggs %>% filter(Parameter %in% paste0("beta[", 1:7, "]")) %>% 
  ggs_traceplot() + theme_bw()

  • Extract the result
round(
SplinesModel$summary$all.chains[paste0("exp.beta[", 1:7, "]"),],
digits = 2
)
##               Mean Median St.Dev. 95%CI_low 95%CI_upp
## exp.beta[1] 131.76 131.70    5.44    121.44    142.56
## exp.beta[2] 123.55 123.54    2.66    118.37    128.78
## exp.beta[3] 131.61 131.61    1.39    128.95    134.36
## exp.beta[4] 112.20 112.20    0.66    110.95    113.52
## exp.beta[5] 107.66 107.65    0.94    105.82    109.52
## exp.beta[6] 106.49 106.50    1.41    103.76    109.32
## exp.beta[7] 149.97 149.95    4.39    141.57    158.68
# try to plot it
chicagoNMMAPS$tmp_spline <- 
bsMat %*% SplinesModel$summary$all.chains[c(paste0("beta[", 1:7, "]")), "Median"]

# spline
chicagoNMMAPS$log.countLL <-  bsMat %*% SplinesModel$summary$all.chains[c(paste0("beta[", 1:7, "]")), "95%CI_low"]
chicagoNMMAPS$log.countUL <- bsMat %*% SplinesModel$summary$all.chains[c(paste0("beta[", 1:7, "]")), "95%CI_upp"]


dat_spline = data.frame(temp = chicagoNMMAPS$temp, 
                 median = chicagoNMMAPS$tmp_spline, 
                 LL = chicagoNMMAPS$log.countLL, 
                 UL = chicagoNMMAPS$log.countUL)


ggplot() + geom_line(data = dat_spline, aes(x = temp, y = exp(median))) + 
  geom_line(data = dat_spline, aes(x = temp, y = exp(LL)), linetype = 2) + 
  geom_line(data = dat_spline, aes(x = temp, y = exp(UL)), linetype = 2) + 
  theme_bw() + xlab("Temperature") + ylab("Number of cases") + 
  ggtitle("Splines")

Random Walk 1

Now I will allow a more flexible fit using a random walk process. The idea is to create an ordered variable by categorizing the continuous variable of interest (i.e. temperature); based on this ordering, we obtain the neighbouring structure required for the RW1 process. We can then sample using dcar_normal(), with weights defined in a similar fashion to the geographical weights in previous tutorials.
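Formally, the RW1 prior penalises first differences of the effect across the ordered categories:

\[ p(u \mid \tau_u) \propto \exp\left\{ -\frac{\tau_u}{2} \sum_{k=2}^{K} (u_k - u_{k-1})^2 \right\}, \]

which is exactly the intrinsic CAR (dcar_normal) density obtained when consecutive categories are neighbours with weight 1.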

  • Define the ordered variable for RW.
chicagoNMMAPS$temp_cat <- cut(
  chicagoNMMAPS$temp, 
  breaks = seq(from = min(chicagoNMMAPS$temp), 
               to  = max(chicagoNMMAPS$temp), 
               length.out = 200),
  labels = 1:199,
  include.lowest = TRUE
)

chicagoNMMAPS$temp_cat_order <-
  as.numeric(droplevels(chicagoNMMAPS$temp_cat))

chicagoNMMAPS <- chicagoNMMAPS[order(chicagoNMMAPS$temp_cat_order),]
  • Define the weights for the covariance matrix.
K <- max(chicagoNMMAPS$temp_cat_order)
W <- matrix(0, nrow = K, ncol = K)

for(i in 1:(K-1)) W[i,i+1] <- 1
for(i in 1:(K-1)) W[i+1,i] <- 1

Wnb <- mat2listw(W)
Wnb <- nb2WB(nb = Wnb$neighbours)
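A quick sanity check on the resulting structure (a sketch): on a line, the two end categories should have one neighbour each, every interior category should have two, and the adjacency vector should have length 2(K-1).

table(Wnb$num)                   # counts of neighbours per category
length(Wnb$adj) == 2 * (K - 1)   # TRUE if the chain structure is correct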
  • Nimble model.
RW1_Code <- nimbleCode(
  {
    for (i in 1:N){
      
      O[i] ~ dpois(mu[i])                                # Poisson for observed counts 
      log(mu[i]) <- alpha + u[temp.ord[i]]
    } 
    
   # intrinsic CAR prior on the effect of temperature
   u[1:K] ~ dcar_normal(adj[1:L], weights[1:L], num[1:K], tau.u, zero_mean = 1)

   # Priors:
   alpha ~ dflat()                                      # Unif(-inf, +inf)
   exp.alpha <- exp(alpha)                              # overall counts across time period
   
   tau.u ~ dgamma(1, 0.01)                              # precision of the ICAR component
   sigma2.u <- 1/tau.u  
  }
)
  • Data objects:
# Format the data for NIMBLE in a list
ChicagoData = list(
                 O = chicagoNMMAPS$death                   
                 
)
ChicagoConsts <-list(
                 N = N,  
                 L = length(Wnb$adj),
                 K = K,
                 adj = Wnb$adj,                             
                 num = Wnb$num,
                 weights = Wnb$weights, 
                 temp.ord = chicagoNMMAPS$temp_cat_order
)
  • Initialize the parameters:
inits <- list(
  list(alpha=0.01, 
       tau.u=0.1,
       u=rep(0.1, times = length(Wnb$num))),  # chain 1
  list(alpha=2,
       tau.u=0.01,
       u=rep(0.01, times = length(Wnb$num)))# chain 2
)
  • Set the parameters that will be monitored:
params <- c("exp.alpha", "mu", "u", "tau.u", 
            "alpha")
  • Specify the MCMC setting:
# MCMC setting
ni <- 50000  # nb iterations 
nt <- 10     # thinning interval
nb <- 10000   # nb iterations as burn-in 
nc <- 2      # nb chains
  • Run the MCMC simulations calling Nimble from R using the function nimbleMCMC()
t_0 <- Sys.time()
RW1_Model <- nimbleMCMC(code = RW1_Code,
                      data = ChicagoData,
                      constants = ChicagoConsts, 
                      inits = inits,
                      monitors = params,
                      niter = ni,
                      nburnin = nb,
                      thin = nt, 
                      nchains = nc, 
                      setSeed = 9, 
                      progressBar = FALSE,      
                      samplesAsCodaMCMC = TRUE, 
                      summary = TRUE, 
                      WAIC = TRUE
                      )
t_1 <- Sys.time()
time_rw1 <- t_1 - t_0
  • Check convergence
rw1_ggs <- ggs(RW1_Model$samples)

rw1_ggs %>% filter(Parameter %in% c("u[10]", "u[115]")) %>% 
  ggs_traceplot() + theme_bw()

rw1_ggs %>% filter(Parameter %in% c("exp.alpha", "tau.u")) %>% 
  ggs_traceplot() + theme_bw()

  • Extract the results
tmp_cat = cut(
  chicagoNMMAPS$temp, 
  breaks = seq(from = min(chicagoNMMAPS$temp), 
               to  = max(chicagoNMMAPS$temp), 
               length.out = 200),
  labels = seq(from = min(chicagoNMMAPS$temp), 
               to  = max(chicagoNMMAPS$temp), 
               length.out = 200)[-1],
  include.lowest = TRUE
)

tmp_cat <- droplevels(tmp_cat)
tmp_cat <- 
round(as.numeric(as.character(levels(tmp_cat))), digits = 3)

# one row per non-empty temperature category (K of them)
datRW1 = data.frame(temp = tmp_cat, 
                 median = RW1_Model$summary$all.chains[paste0("u[", 1:K, "]"), "Median"], 
                 LL = RW1_Model$summary$all.chains[paste0("u[", 1:K, "]"), "95%CI_low"], 
                 UL = RW1_Model$summary$all.chains[paste0("u[", 1:K, "]"), "95%CI_upp"])

ggplot() + geom_line(data = datRW1, aes(x = temp, y = exp(median))) + 
  geom_line(data = datRW1, aes(x = temp, y = exp(LL)), linetype = 2) + 
  geom_line(data = datRW1, aes(x = temp, y = exp(UL)), linetype = 2) + 
  theme_bw() + xlab("Temperature") + ylab("Risk") + 
  geom_hline(yintercept = 1, col = "red", linetype = 2) + 
  ggtitle("Random walk 1")

Random Walk 2

Now I will use RW2 to get a smoother effect; the RW2 approximates a thin-plate spline. The idea is similar to before, i.e. we create an ordered variable, but the adjacency and weights of the dcar_normal() need to be defined differently.
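Formally, the RW2 prior penalises second differences:

\[ p(u \mid \tau_u) \propto \exp\left\{ -\frac{\tau_u}{2} \sum_{k=3}^{K} (u_k - 2u_{k-1} + u_{k-2})^2 \right\}, \]

and expanding these squares yields the adjacency (two neighbours on each side) and the weights used below.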

  • Define the variable for RW.
chicagoNMMAPS$temp_cat <- cut(
  chicagoNMMAPS$temp, 
  breaks = seq(from = min(chicagoNMMAPS$temp), 
               to  = max(chicagoNMMAPS$temp), 
               length.out = 200),
  labels = 1:199,
  include.lowest = TRUE
)


chicagoNMMAPS$temp_cat_order <-
  as.numeric(droplevels(chicagoNMMAPS$temp_cat))

chicagoNMMAPS <- chicagoNMMAPS[order(chicagoNMMAPS$temp_cat_order),]
  • Define the inputs of dcar_normal() prior
K <- max(chicagoNMMAPS$temp_cat_order)

rest.comp <- list()
for(i in 3:(K-2)){
  rest.comp[[i]] <- c(i-2, i-1, i+1, i+2)
}
rest.comp <- unlist(rest.comp)

adj = c(2, 3, 1, 3, 4, 
        rest.comp, 
        c(K-3, K-2, K, K-2, K-1)
        )

num = c(2, 3, rep(4, times = c(K-4)), 3, 2)

weights = c(c(2, -1, 2, 4, -1), 
            rep(c(-1, 4, 4, -1), times = c(K-4)),
            c(-1, 4, 2, -1, 2))
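These weights are not arbitrary: under the usual car.normal/dcar_normal parameterisation the joint precision of u is tau.u * (diag(rowSums(W)) - W), and with the adjacency and weights above this should equal the RW2 structure matrix t(D2) %*% D2, where D2 is the second-difference matrix. A quick check (a sketch, assuming this parameterisation):

D2 <- diff(diag(K), differences = 2)          # (K-2) x K second-difference matrix
Q_rw2 <- crossprod(D2)                        # RW2 structure matrix

W_rw2 <- matrix(0, nrow = K, ncol = K)        # rebuild the weight matrix from adj/num/weights
W_rw2[cbind(rep(1:K, times = num), adj)] <- weights
Q_car <- diag(rowSums(W_rw2)) - W_rw2

all.equal(Q_rw2, Q_car)                       # should be TRUE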
  • Nimble model
RW2_Code <- nimbleCode(
  {
    for (i in 1:N){
      
      O[i] ~ dpois(mu[i])                                # Poisson for observed counts 
      log(mu[i]) <- alpha + u[temp.ord[i]]
    } 
    
   # intrinsic CAR prior on the effect of temperature
   u[1:K] ~ dcar_normal(adj = adj[1:L], weights = weights[1:L], num = num[1:K], 
                        tau = tau.u, c = 2, zero_mean = 1) # c = 2: the RW2 null space is 2-dimensional;
                                                           # zero_mean = 1 centres u so it is identifiable 
                                                           # alongside alpha

   # Priors:
   alpha ~ dflat()                                      # Unif(-inf, +inf)
   exp.alpha <- exp(alpha)                              # overall counts across time period
   
   tau.u ~ dgamma(1, 0.01)                              # precision of the ICAR component
   sigma2.u <- 1/tau.u  
  }
)
  • Data objects:
# Format the data for NIMBLE in a list
ChicagoData = list(
                 O = chicagoNMMAPS$death                   
                 
)
ChicagoConsts <-list(
                 N = N,  
                 L = length(weights),
                 K = K,
                 adj = adj,                             
                 num = num,
                 weights = weights, 
                 temp.ord = chicagoNMMAPS$temp_cat_order
)
  • Initialize the parameters:
inits <- list(
  list(alpha=0.01, 
       tau.u=0.1,
       u=rep(0.1, times = K)),  # chain 1
  list(alpha=2,
       tau.u=0.01,
       u=rep(0.01, times = K))# chain 2
)
  • Set the parameters that will be monitored:
params <- c("exp.alpha", "mu", "u", "tau.u")
  • Specify the MCMC setting:
# MCMC setting
ni <- 50000  # nb iterations 
nt <- 10     # thinning interval
nb <- 10000   # nb iterations as burn-in 
nc <- 2      # nb chains
  • Run the MCMC simulations calling Nimble from R using the function nimbleMCMC()
t_0 <- Sys.time()
RW2_Model <- nimbleMCMC(code = RW2_Code,
                      data = ChicagoData,
                      constants = ChicagoConsts, 
                      inits = inits,
                      monitors = params,
                      niter = ni,
                      nburnin = nb,
                      thin = nt, 
                      nchains = nc, 
                      setSeed = 9, 
                      progressBar = FALSE,      
                      samplesAsCodaMCMC = TRUE, 
                      summary = TRUE, 
                      WAIC = TRUE
                      )
t_1 <- Sys.time()
time_rw2 <- t_1 - t_0
  • Check convergence
rw2_ggs <- ggs(RW2_Model$samples)

rw2_ggs %>% filter(Parameter %in% c("u[10]", "u[99]")) %>% 
  ggs_traceplot() + theme_bw()

rw2_ggs %>% filter(Parameter %in% c("exp.alpha", "tau.u")) %>% 
  ggs_traceplot() + theme_bw()

# one row per non-empty temperature category (K of them)
datRW2 = data.frame(temp = tmp_cat, 
                 median = RW2_Model$summary$all.chains[paste0("u[", 1:K, "]"), "Median"], 
                 LL = RW2_Model$summary$all.chains[paste0("u[", 1:K, "]"), "95%CI_low"], 
                 UL = RW2_Model$summary$all.chains[paste0("u[", 1:K, "]"), "95%CI_upp"])

ggplot() + geom_line(data = datRW2, aes(x = temp, y = exp(median))) + 
  geom_line(data = datRW2, aes(x = temp, y = exp(LL)), linetype = 2) + 
  geom_line(data = datRW2, aes(x = temp, y = exp(UL)), linetype = 2) + 
  theme_bw() + xlab("Temperature") + ylab("Risk") + 
  geom_hline(yintercept = 1, col = "red", linetype = 2) + 
  ggtitle("Random walk 2")

Comparison of the curves

  • Compare the medians
# linear trend
dat_linear <- data.frame(temperature = datRW1$temp)
dat_linear$log.count <- 
LinearModel$summary$all.chains["alpha", "Median"] +
LinearModel$summary$all.chains["b", "Median"]*datRW1$temp
dat_linear$model <- "linear"

# spline
dat_spline2 <- data.frame(temperature = dat_spline$temp, 
                         log.count = dat_spline$median,
                         model = "splines"
                           )


# rw1
dat_rw1 <- data.frame(temperature = chicagoNMMAPS$temp, 
                      log.count = log(RW1_Model$summary$all.chains[paste0("mu[", 1:nrow(chicagoNMMAPS), "]"), "Median"]), 
                      model = "RW1"
      )

# rw2
dat_rw2 <- data.frame(temperature = chicagoNMMAPS$temp, 
                      log.count = log(RW2_Model$summary$all.chains[paste0("mu[", 1:nrow(chicagoNMMAPS), "]"), "Median"]), 
                      model = "RW2"
      )

dat_plot <- rbind(dat_linear, dat_spline2, dat_rw1, dat_rw2)


ggplot() + 
  geom_line(data = dat_plot, aes(x = temperature, y = log.count, col = model), size = 1) + 
  scale_color_viridis_d() + 
  theme_bw() +
  ggtitle("Comparison of models")

  • Get the uncertainties too
# linear trend
dat_linear$log.countLL <- 
LinearModel$summary$all.chains["alpha", "95%CI_low"] +
LinearModel$summary$all.chains["b", "95%CI_low"]*datRW1$temp

dat_linear$log.countUL <- 
LinearModel$summary$all.chains["alpha", "95%CI_upp"] +
LinearModel$summary$all.chains["b", "95%CI_upp"]*datRW1$temp

# splines
dat_spline2$log.countLL <- dat_spline$LL
dat_spline2$log.countUL <- dat_spline$UL

# rw1
dat_rw1$log.countLL <- log(RW1_Model$summary$all.chains[paste0("mu[", 1:nrow(chicagoNMMAPS), "]"), "95%CI_low"])
dat_rw1$log.countUL <- log(RW1_Model$summary$all.chains[paste0("mu[", 1:nrow(chicagoNMMAPS), "]"), "95%CI_upp"])


# rw2
dat_rw2$log.countLL <- log(RW2_Model$summary$all.chains[paste0("mu[", 1:nrow(chicagoNMMAPS), "]"), "95%CI_low"])
dat_rw2$log.countUL <- log(RW2_Model$summary$all.chains[paste0("mu[", 1:nrow(chicagoNMMAPS), "]"), "95%CI_upp"])


dat_plot_CrI <- rbind(dat_linear, dat_spline2, dat_rw1, dat_rw2)


ggplot(data = dat_plot_CrI) + 
    geom_ribbon(aes(x = temperature, ymin = log.countLL, ymax = log.countUL, fill = model), alpha = .2) + 
   geom_line(aes(x = temperature, y = log.count, col = model)) +
  facet_wrap(~model, ncol = 2) +
  theme_bw() +
  scale_color_viridis_d() + 
  scale_fill_viridis_d() +
  ggtitle("Comparison of models") +
  theme(text = element_text(size=15))

  • Compare time
dat_time = data.frame(
  time_linear = time_linear, 
  time_splines = time_splines, 
  time_rw1 = time_rw1, 
  time_rw2 = time_rw2
)
dat_time[1,] <- round(as.numeric(dat_time), digits = 2)
colnames(dat_time) <- c("linear", "splines", "RW1", "RW2")

knitr::kable(dat_time, caption = "Time needed to fit the models") %>%  kable_styling(bootstrap_options = "striped", full_width = F, position = "center")
Time needed to fit the models

  linear      splines      RW1        RW2
  5.14 mins   16.46 mins   10.4 mins  10.3 mins
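Since all four models were run with WAIC = TRUE, their fit can also be compared through WAIC (a sketch; depending on the NIMBLE version, the WAIC element is either a single number or a small list with elements WAIC, lppd and pWAIC):

LinearModel$WAIC
SplinesModel$WAIC
RW1_Model$WAIC
RW2_Model$WAIC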

Conclusion

The time needed to fit the models with flexible temperature effects was of the same order of magnitude, with the spline model being the slowest here. The B-splines used here seem to provide a smoother fit compared to the RW2. Of course, in both cases this depends on the corresponding specifications. For instance, had I used more knots in the spline specification, the result would have been less smooth. In a similar fashion, had I used an informative prior for the variance hyperparameter of the RW2 (one that does not allow much variation), I could have achieved a higher amount of smoothing. In the end, the amount of smoothing should be tuned by the investigator and is problem specific.

References

de Valpine, P., D. Turek, C. J. Paciorek, C. Anderson-Bergman, D. Temple Lang, and R. Bodik. 2017. “Programming with Models: Writing Statistical Algorithms for General Model Structures with NIMBLE.” Journal of Computational and Graphical Statistics 26: 403–17. https://doi.org/10.1080/10618600.2016.1172487.