Here we will use the glmmfields package to fit a spatial GLM with a predictor. While glmmfields was designed to fit spatiotemporal GLMs with the possibility of extreme events, it can also be used to fit regular spatial GLMs without a time element and without extreme events. Currently it can fit Gaussian (link = identity), Gamma (link = log), Poisson (link = log), negative binomial (link = log), and binomial (link = logit) models. The package can also fit lognormal (link = log) models.
Let’s load the necessary packages:
Set up parallel processing (not used in this example):
First, let’s simulate some data. We will use the built-in function
sim_glmmfields()
, but normally you would start with your
own data. We will simulate 200 data points, some (fake) temperature
data, an underlying random field spatial pattern, and add some
observation error. In this example we will fit a Gamma GLM with a log
link.
The underlying intercept is 0.5 and the slope between temperature and our observed variable (say biomass or density of individuals in a quadrat) is 0.2.
set.seed(1)
N <- 200 # number of data points
temperature <- rnorm(N, 0, 1) # simulated temperature data
X <- cbind(1, temperature) # our design matrix
s <- sim_glmmfields(
n_draws = 1, gp_theta = 1.5, n_data_points = N,
gp_sigma = 0.2, sd_obs = 0.2, n_knots = 12, obs_error = "gamma",
covariance = "squared-exponential", X = X,
B = c(0.5, 0.2) # B represents our intercept and slope
)
d <- s$dat
d$temperature <- temperature
ggplot(s$dat, aes(lon, lat, colour = y)) +
viridis::scale_colour_viridis() +
geom_point(size = 3)
If we fit a regular GLM we can see that there is spatial autocorrelation in the residuals:
##
## Call: glm(formula = y ~ temperature, family = Gamma(link = "log"),
## data = d)
##
## Coefficients:
## (Intercept) temperature
## 0.5369 0.1967
##
## Degrees of Freedom: 199 Total (i.e. Null); 198 Residual
## Null Deviance: 23.27
## Residual Deviance: 16.72 AIC: 280.9
## 2.5 % 97.5 %
## (Intercept) 0.4964271 0.5780115
## temperature 0.1522553 0.2413277
d$m_glm_residuals <- residuals(m_glm)
ggplot(d, aes(lon, lat, colour = m_glm_residuals)) +
scale_color_gradient2() +
geom_point(size = 3)
Let’s instead fit a spatial GLM with random fields. Note that we are only using 1 chain and 500 iterations here so the vignette builds quickly on CRAN. For final inference, you should likely use 4 or more chains and 2000 or more iterations.
m_spatial <- glmmfields(y ~ temperature,
data = d, family = Gamma(link = "log"),
lat = "lat", lon = "lon", nknots = 12, iter = 500, chains = 1,
prior_intercept = student_t(3, 0, 10),
prior_beta = student_t(3, 0, 3),
prior_sigma = half_t(3, 0, 3),
prior_gp_theta = half_t(3, 0, 10),
prior_gp_sigma = half_t(3, 0, 3),
seed = 123 # passed to rstan::sampling()
)
## Warning: The largest R-hat is 1.05, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#r-hat
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#tail-ess
Let’s look at the model output:
## Inference for Stan model: glmmfields.
## 1 chains, each with iter=500; warmup=250; thin=1;
## post-warmup draws per chain=250, total post-warmup draws=250.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## gp_sigma 0.27 0.01 0.07 0.16 0.22 0.26 0.31 0.42 101 1.01
## gp_theta 1.69 0.02 0.22 1.34 1.54 1.66 1.83 2.21 155 1.01
## B[1] 0.48 0.01 0.07 0.35 0.44 0.48 0.52 0.61 79 1.00
## B[2] 0.19 0.00 0.02 0.16 0.18 0.19 0.20 0.22 170 1.02
## CV[1] 0.21 0.00 0.01 0.19 0.20 0.21 0.21 0.23 123 1.01
## lp__ -62.21 0.23 2.74 -68.49 -63.99 -61.92 -60.12 -58.12 144 1.00
##
## Samples were drawn using NUTS(diag_e) at Fri Oct 20 10:26:42 2023.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
We can see that the 95% credible intervals are considerably wider on the intercept term and narrower on the slope coefficient in the spatial GLM vs. the model that ignored the spatial autocorrelation.
Let’s look at the residuals in space this time:
That looks better.
We can inspect the residuals versus fitted values:
And the predictions from our model itself:
plot(m_spatial, type = "prediction", link = FALSE) +
viridis::scale_colour_viridis() +
geom_point(size = 3)
Compare that to our data at the top. Note that the original data also includes observation error with a CV of 0.2.
We can also extract the predictions themselves with credible intervals:
## # A tibble: 6 × 3
## estimate conf_low conf_high
## <dbl> <dbl> <dbl>
## 1 0.723 0.635 0.807
## 2 0.570 0.457 0.667
## 3 0.182 0.0586 0.329
## 4 0.927 0.792 1.05
## 5 0.619 0.516 0.722
## 6 0.495 0.388 0.614
## # A tibble: 6 × 3
## estimate conf_low conf_high
## <dbl> <dbl> <dbl>
## 1 2.06 1.89 2.24
## 2 1.77 1.58 1.95
## 3 1.20 1.06 1.39
## 4 2.53 2.21 2.85
## 5 1.86 1.68 2.06
## 6 1.64 1.47 1.85
# prediction intervals on new observations (include observation error):
p <- predict(m_spatial, type = "response", interval = "prediction")
head(p)
## # A tibble: 6 × 3
## estimate conf_low conf_high
## <dbl> <dbl> <dbl>
## 1 2.06 1.39 2.96
## 2 1.77 0.983 2.62
## 3 1.20 0.755 1.77
## 4 2.53 1.51 3.73
## 5 1.86 1.18 2.63
## 6 1.64 0.975 2.46
Or use the tidy
method to get our parameter estimates as
a data frame:
## # A tibble: 6 × 5
## term estimate std.error conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 gp_sigma 0.261 0.0733 0.155 0.416
## 2 gp_theta 1.66 0.221 1.36 2.21
## 3 B[1] 0.479 0.0653 0.336 0.614
## 4 B[2] 0.193 0.0153 0.160 0.219
## 5 CV[1] 0.206 0.0116 0.183 0.227
## 6 spatialEffectsKnots[1,1] 0.333 0.0809 0.143 0.478
Or make the predictions on a fine-scale spatial grid for a constant value of the predictor:
pred_grid <- expand.grid(
lat = seq(min(d$lat), max(d$lat), length.out = 30),
lon = seq(min(d$lon), max(d$lon), length.out = 30)
)
pred_grid$temperature <- mean(d$temperature)
pred_grid$prediction <- predict(
m_spatial,
newdata = pred_grid,
type = "response"
)$estimate
ggplot(pred_grid, aes(lon, lat, fill = prediction)) +
geom_raster() +
viridis::scale_fill_viridis()