This package adds resampling methods for the {mlr3} package framework suited for spatial, temporal and spatiotemporal data. These methods can help to reduce the influence of autocorrelation on performance estimates when performing cross-validation. While this article gives a rather technical introduction to the package, a more applied approach can be found in the mlr3book section on “Spatiotemporal Analysis”.
After loading the package via `library("mlr3spatiotempcv")`, the spatiotemporal resampling methods and example tasks provided by {mlr3spatiotempcv} are available to the user alongside the default {mlr3} resampling methods and tasks.
To make use of spatial resampling methods, a {mlr3} task that is aware of its spatial characteristics needs to be created. Two `Task` child classes exist in {mlr3spatiotempcv} for this purpose:

- `TaskClassifST`
- `TaskRegrST`
To create one of these, you have multiple options:

1. Construct the `Task` directly via `$new()` (this only works for `data.table` backends).
2. Use the `as_task_*` converters (e.g., if your data is stored in an `sf` object).

We recommend the latter, as the `as_task_*` converters aim to make task construction easier, e.g., by creating the `DataBackend` (which is required to create a `Task` in {mlr3}) automatically and setting the `crs` and `coordinate_names` fields. Let's assume your (point) data is stored in an `sf` object, which is a common scenario for spatial analysis in R.
# create 'sf' object
data_sf = sf::st_as_sf(ecuador, coords = c("x", "y"), crs = 32717)
# create `TaskClassifST` from `sf` object
task = as_task_classif_st(data_sf, id = "ecuador_task", target = "slides", positive = "TRUE")
You can also use a plain `data.frame`. In this case, `crs` and `coordinate_names` need to be passed explicitly because, unlike with an `sf` object, they cannot be inferred from the data:
task = as_task_classif_st(ecuador, id = "ecuador_task", target = "slides",
positive = "TRUE", coordinate_names = c("x", "y"), crs = 32717)
The `*ST` task family prints a subset of the coordinates by default:
print(task)
#> <TaskClassifST:ecuador_task> (751 x 11)
#> * Target: slides
#> * Properties: twoclass
#> * Features (10):
#> - dbl (10): carea, cslope, dem, distdeforest, distroad,
#> distslidespast, hcurv, log.carea, slope, vcurv
#> * Coordinates:
#> x y
#> <num> <num>
#> 1: 712882.5 9560002
#> 2: 715232.5 9559582
#> 3: 715392.5 9560172
#> 4: 715042.5 9559312
#> 5: 715382.5 9560142
#> ---
#> 747: 714472.5 9558482
#> 748: 713142.5 9560992
#> 749: 713322.5 9560562
#> 750: 715392.5 9557932
#> 751: 713802.5 9560862
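The spatial metadata shown in the print output can also be queried from the task itself. The following is a minimal sketch that uses the bundled `tsk("ecuador")` task instead of the `task` object created above, so that it is self-contained; it assumes the `crs` and `coordinate_names` fields and the `$coordinates()` method behave as in current releases of {mlr3spatiotempcv}.

```r
library(mlr3)
library(mlr3spatiotempcv)

# example task shipped with {mlr3spatiotempcv}
task = tsk("ecuador")

# coordinate reference system stored with the task
task$crs

# names of the columns holding the coordinates
task$coordinate_names

# coordinates of all observations (only the first rows are shown)
head(task$coordinates())
```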
All `*ST` tasks can be treated as their superclass equivalents `TaskClassif` or `TaskRegr` in subsequent {mlr3} modeling steps.
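As a small illustration of this inheritance, the sketch below trains a standard {mlr3} learner on a spatial task without any conversion. The choice of `classif.rpart` and of the classification error measure is an assumption made for the example only; any classification learner should work the same way.

```r
library(mlr3)
library(mlr3spatiotempcv)

# example task shipped with {mlr3spatiotempcv}
task = tsk("ecuador")

# the spatial task inherits from the regular classification task class
inherits(task, "TaskClassif")

# a standard learner can therefore be trained on it directly
learner = lrn("classif.rpart")
learner$train(task)

# predict on the training data and compute the classification error
# (for illustration only; this is not a valid performance estimate)
prediction = learner$predict(task)
prediction$score(msr("classif.ce"))
```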
In {mlr3}, dictionaries provide an overview of the available methods. The following sections show which dictionaries gain new entries when {mlr3spatiotempcv} is loaded.
New task types:

- `TaskClassifST`
- `TaskRegrST`
mlr_reflections$task_types
#> Key: <type>
#> type package task learner
#> <char> <char> <char> <char>
#> 1: classif mlr3 TaskClassif LearnerClassif
#> 2: classif_st mlr3spatiotempcv TaskClassifST LearnerClassif
#> 3: regr mlr3 TaskRegr LearnerRegr
#> 4: regr_st mlr3spatiotempcv TaskRegrST LearnerRegr
#> 5: unsupervised mlr3 TaskUnsupervised Learner
#> prediction prediction_data measure
#> <char> <char> <char>
#> 1: PredictionClassif PredictionDataClassif MeasureClassif
#> 2: PredictionClassif PredictionDataClassif MeasureClassif
#> 3: PredictionRegr PredictionDataRegr MeasureRegr
#> 4: PredictionRegr PredictionDataRegr MeasureRegr
#> 5: <NA> <NA> <NA>
New task column roles:

- `coordinate`
- `space`
- `time`
mlr_reflections$task_col_roles
#> $regr
#> [1] "feature" "target" "name" "order" "stratum" "group" "weight"
#>
#> $classif
#> [1] "feature" "target" "name" "order" "stratum" "group" "weight"
#>
#> $unsupervised
#> [1] "feature" "name" "order"
#>
#> $classif_st
#> [1] "feature" "target" "name" "order" "stratum"
#> [6] "group" "weight" "coordinate" "space" "time"
#>
#> $regr_st
#> [1] "feature" "target" "name" "order" "stratum"
#> [6] "group" "weight" "coordinate" "space" "time"
New resampling methods:

- `mlr_resamplings_spcv_block`
- `mlr_resamplings_spcv_buffer`
- `mlr_resamplings_spcv_coords`
- `mlr_resamplings_spcv_knndm`
- `mlr_resamplings_spcv_disc`
- `mlr_resamplings_spcv_tiles`
- `mlr_resamplings_spcv_env`
- `mlr_resamplings_sptcv_cstf`

and their respective repeated versions. See `as.data.table(mlr_resamplings)` for the full dictionary.
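The following sketch shows how one of these dictionary entries might be used in practice. It picks `"repeated_spcv_coords"` with 4 folds and 2 repeats and the `classif.rpart` learner purely as example choices, not as recommendations.

```r
library(mlr3)
library(mlr3spatiotempcv)

task = tsk("ecuador")

# repeated spatial CV based on clustering of the coordinates
resampling = rsmp("repeated_spcv_coords", folds = 4, repeats = 2)
resampling$instantiate(task)

# total number of train/test splits (folds times repeats)
resampling$iters

# row ids of the first train/test split
head(resampling$train_set(1))
head(resampling$test_set(1))

# number of observations shared by the train and test set of split 1
length(intersect(resampling$train_set(1), resampling$test_set(1)))

# spatially cross-validated performance estimate
rr = resample(task, lrn("classif.rpart"), resampling)
rr$aggregate(msr("classif.ce"))
```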
New example tasks:

- `tsk("ecuador")` (spatial, classif)
- `tsk("cookfarm_mlr3")` (spatiotemp, regr)
The following table lists all spatiotemporal methods implemented in {mlr3spatiotempcv} (or {mlr3}), their upstream R package and scientific references. All methods besides `"spcv_buffer"` also have a corresponding "repeated" method.
Category | (Package) Method Name | Reference | mlr3 Notation |
---|---|---|---|
Buffering, spatial | (blockCV) Spatial Buffering | Valavi et al. (2018) | mlr_resamplings_spcv_buffer |
Buffering, spatial | (sperrorest) Spatial Disc | Brenning (2012) | mlr_resamplings_spcv_disc |
Blocking, spatial | (blockCV) Spatial Blocking | Valavi et al. (2018) | mlr_resamplings_spcv_block |
Blocking, spatial | (sperrorest) Spatial Tiles | Brenning (2012) | mlr_resamplings_spcv_tiles |
Clustering, spatial | (sperrorest) Spatial CV | Brenning (2012) | mlr_resamplings_spcv_coords |
Clustering, spatial | (CAST) KNNDM | Linnenbrink et al. (2023) | mlr_resamplings_spcv_knndm |
Clustering, feature-space | (blockCV) Environmental Blocking | Valavi et al. (2018) | mlr_resamplings_spcv_env |
Grouping, predefined inds | (mlr3) Predefined partitions | | mlr_resamplings_custom_cv |
Grouping, spatiotemporal | (mlr3) via col_roles "group" | | mlr_resamplings_cv, Task$set_col_roles(<variable>, "group") |
Grouping, spatiotemporal | (CAST) Leave-Location-and-Time-Out | Meyer et al. (2018) | mlr_resamplings_sptcv_cstf, Task$set_col_roles(<variable>, "space\|time") |
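
The two grouping rows of the table rely on column roles rather than on coordinates. The sketch below illustrates both variants on the `cookfarm_mlr3` example task. It assumes that the task's data contains a location identifier column named `SOURCEID`; this column name is taken from the cookfarm data and is not stated in this article, so adapt it to your own data.

```r
library(mlr3)
library(mlr3spatiotempcv)

# Variant 1: leave-location-out via "sptcv_cstf" and the "space" role
# ("SOURCEID" is assumed to identify the measurement location)
task_space = tsk("cookfarm_mlr3")
task_space$set_col_roles("SOURCEID", roles = "space")
rsmp_llo = rsmp("sptcv_cstf", folds = 5)
rsmp_llo$instantiate(task_space)

# Variant 2: plain {mlr3} cross-validation with the "group" role
task_group = tsk("cookfarm_mlr3")
task_group$set_col_roles("SOURCEID", roles = "group")
rsmp_grp = rsmp("cv", folds = 5)
rsmp_grp$instantiate(task_group)

# groups present in the train and test set of the first split
groups = task_group$groups
train_groups = unique(groups$group[groups$row_id %in% rsmp_grp$train_set(1)])
test_groups = unique(groups$group[groups$row_id %in% rsmp_grp$test_set(1)])

# number of groups that occur in both sets
length(intersect(train_groups, test_groups))
```

In both variants all observations from one location are kept together in either the training or the test set. Setting the `"time"` role on a date column in addition to (or instead of) `"space"` would correspond to the leave-time-out variants of the same method.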