Introduction to spotr

Preparation

First load some packages needed for fitting models and producing plots.

library(spotr)
library(mgcv)
library(brms)
library(sf)
library(ggplot2)
library(dplyr)

Next load the data for the case study. The data are distributed with spotr and consists of 20 years of monitoring data of the common cuckoo in Sweden.

data(cuckoo)
head(cuckoo)
##   count   yr           route county     lon     lat
## 1     4 2000 SFTstd:siteId:1  Skåne 13.2294 55.5649
## 2     2 2001 SFTstd:siteId:1  Skåne 13.2294 55.5649
## 3     3 2002 SFTstd:siteId:1  Skåne 13.2294 55.5649
## 4     1 2003 SFTstd:siteId:1  Skåne 13.2294 55.5649
## 5     4 2005 SFTstd:siteId:1  Skåne 13.2294 55.5649
## 6    10 2006 SFTstd:siteId:1  Skåne 13.2294 55.5649

Transform the data into an sf object for plotting and for spatial manipulation. Also load the map of Sweden contained in the package.

cuckoo = st_as_sf(cuckoo, coords = c("lon", "lat"), crs = st_crs(4326), remove = FALSE)
data(swe_map)

Plot the data for an example year

ggplot(swe_map) +  geom_sf() + geom_sf(aes(col = count), data = subset(cuckoo, yr == 2010)) 

To compute indices a model that can generate predictions of abundances across time and space is needed. For example, a tensor product of a temporal and a spatial smooth can be fit with mgcv:

gam_fit = gam(count ~ te(yr, lat, lon, d = c(1,2), k = c(8,30)), data = cuckoo, family = nb())

Prediction grid

A data frame containing the spatial locations as well as any variables needed for predicting from the model is now needed. If there are no spatio-temporal explanatory variables, as in the model above, the data frame may be defined from the combination of a data frame containing a spatial grid and a data frame containing temporal variables.

First define a spatial grid.

spat_grid = st_make_grid(cuckoo, cellsize = .5, what = "centers") |> st_sf() |>
  st_intersection(swe_map)
# Add columns with longitude and latitude
spat_grid[, c("lon", "lat")] = st_coordinates(spat_grid)
ggplot(swe_map) + geom_sf() + geom_sf(data = spat_grid)

The grid now contains at least one point for each county in the map, which will be important later when producing county-wise indices.

To create a spatio-temporal grid we combine the spatial grid with a temporal grid containing the one temporal variable yr.

pred_points = merge(data.frame(yr = 2000:2020), spat_grid, by = NULL)

This results in a prediction grid that is balanced across space and time, i.e. each time point occurs the same number of times in the grid, as does each spatial point.

National indices

The spatio-temporal grid can now be used to compute an index for the total Swedish population. The result is a data frame with an index for each year and corresponding uncertainty estimates.

swe_ind = index(gam_fit, timevar = "yr", newdata = pred_points)
head(swe_ind)
##     yr    index    ci_2.5     ci_10    ci_90  ci_97.5         se     se_log
## 1 2000 1.000000 1.0000000 1.0000000 1.000000 1.000000 0.00000000 0.00000000
## 2 2001 1.029516 0.9805629 0.9948450 1.056293 1.071458 0.02355728 0.02297323
## 3 2002 1.055186 0.9676306 0.9922504 1.108208 1.135164 0.04388539 0.04180741
## 4 2003 1.071392 0.9578370 0.9938381 1.140877 1.177611 0.05650488 0.05302884
## 5 2004 1.075826 0.9495374 0.9918479 1.152063 1.190805 0.06070837 0.05677877
## 6 2005 1.073879 0.9471606 0.9890586 1.151704 1.189762 0.06151820 0.05769944
##   baseline
## 1 808.4534
## 2 808.4534
## 3 808.4534
## 4 808.4534
## 5 808.4534
## 6 808.4534
swe_ind |> 
  ggplot(aes(x = yr, y = index)) + 
  geom_linerange(aes(ymin = ci_2.5, ymax = ci_97.5), col = "gray50") + 
  geom_linerange(aes(ymin = ci_10, ymax = ci_90), col = "red3", lwd =1) +
  geom_point() + ylim(.8,1.2)

Confidence intervals for the index are fairly wide. Part of the reason for this may be that it is hard to estimate population size in the first year when few sites were surveyed. One can instead use the mean abundance over a sequence of years as the baseline using the baseline argument to index:

index(gam_fit, timevar = "yr", newdata = pred_points, baseline = 2000:2010) |> 
  ggplot(aes(x = yr, y = index)) +   
  geom_linerange(aes(ymin = ci_2.5, ymax = ci_97.5), col = "gray50") + 
  geom_linerange(aes(ymin = ci_10, ymax = ci_90), col = "red3", lwd =1) +
  geom_point() + ylim(.8,1.2)

This results in lower uncertainty because the mean over 2000 - 2010 can be more accurately estimated.

Growth rates

Instead of estimating an index relative to a baseline year or period, the growth rate between consecutive years can be estimated by setting type = "delta".

index(gam_fit, timevar = "yr", newdata = pred_points, type = "delta") |> 
  ggplot(aes(x = yr, y = index)) +   
  geom_linerange(aes(ymin = ci_2.5, ymax = ci_97.5), col = "gray50") + 
  geom_linerange(aes(ymin = ci_10, ymax = ci_90), col = "red3", lwd =1) +
  geom_point() 

Note that the index for each year here corresponds to the estimated growth rate between that year and the previous year. E.g., the index for year 2005 is the change in abundance in year 2005 compared to 2004.

Indices by group

Using the byvar argument of index, indices within groups can be computed. This can be used for example to compute indices for each county:

index(gam_fit, timevar = "yr", byvar = "county", newdata = pred_points, baseline = 2000:2010) |> 
  ggplot(aes(x = yr, y = index, ymin = ci_2.5, ymax = ci_97.5)) +   
  geom_linerange(aes(ymin = ci_2.5, ymax = ci_97.5), col = "gray50") + 
  geom_linerange(aes(ymin = ci_10, ymax = ci_90), col = "red3", lwd = 1) +
  geom_point(size=.5) + 
  scale_x_continuous(breaks = c(2005,2015)) + facet_wrap(vars(county))

The indices above are scaled relative to the mean across 2000-2010 within each county. This means that information about relative abundance between counties is lost. To display such differences the argument type can be set to "global". The indices then are computed by aggregation of abundances within each county in the numerator, but aggregating across the whole grid in the denominator. The indices then represent the abundance in the counties relative to the total abundance in the baseline period.

index(gam_fit, timevar = "yr", byvar = "county", newdata = pred_points, baseline = 2000:2010, type = "global") |> 
  ggplot(aes(x = yr, y = index, ymin = ci_2.5, ymax = ci_97.5)) +   
  geom_linerange(aes(ymin = ci_2.5, ymax = ci_97.5), col = "gray50") + 
  geom_linerange(aes(ymin = ci_10, ymax = ci_90), col = "red3", lwd = 1) +
  geom_point(size=.5) + 
  scale_x_continuous(breaks = c(2005,2015)) + facet_wrap(vars(county))

The results here suggest that the northernmost county, Norrbotten, which is also the largest county, contains 20-30% of the Swedish population (with the period 2000-2010 as the reference).

The above indices clearly depend on the size of the county. It could be of interest to instead compare population density across counties and time. This can be done by supplying weights to the index function. Two sets of weights can be specified, one for the numerator (argument weights) and one for the denominator (argument bweights). For the numerator, we use the inverse of the number of spatial grid points within each county so that the summed weighted abundance across the county is the within county density. For the denominator we use the inverse of the total number of spatial grid points times the number time points as the weights, so that the reference is the averaged density across the whole of Sweden in the baseline period.

# Count number of predictions points per county 
pred_points = pred_points |> group_by(county, yr) |> mutate(ppc = n())
w = 1/pred_points$ppc

# Denominator weights
bw = 1/(nrow(spat_grid) * length(2000:2010))
  
index(gam_fit, timevar = "yr", byvar = "county", newdata = pred_points, weights = w, bweights = bw, baseline = 2000:2010, type = "global")  |> 
  ggplot(aes(x = yr, y = index, ymin = ci_2.5, ymax = ci_97.5)) +   
  geom_linerange(aes(ymin = ci_2.5, ymax = ci_97.5), col = "gray50") + 
  geom_linerange(aes(ymin = ci_10, ymax = ci_90), col = "red3", lwd = 1) +
  geom_point(size=.5) + 
  scale_x_continuous(breaks = c(2005,2015)) + facet_wrap(vars(county))

This suggests e.g. that the density in Kalmar is around 50% larger then the national density, and that the density in Västra Götaland is around half of the national density.