Grouped Hyper Data Frame

This vignette of package groupedHyperframe (CRAN, GitHub, RPubs) documents the creation of a groupedHyperframe object, the batch processes for a groupedHyperframe, and the aggregation of various statistics over a multi-level grouping structure.

Prerequisite

Experimental (and possibly unstable) features are implemented very frequently on GitHub. Active developers should use the GitHub version; suggestions and bug reports are welcome!

Stable releases to CRAN are typically updated every 2 to 3 months, or when the authors have an upcoming manuscript in the peer-review process. Developers should not use the CRAN version!

Package groupedHyperframe may require the development versions of the spatstat family.

remotes::install_github('spatstat/spatstat', upgrade = 'always')
remotes::install_github('spatstat/spatstat.data', upgrade = 'always')
remotes::install_github('spatstat/spatstat.explore', upgrade = 'always')
remotes::install_github('spatstat/spatstat.geom', upgrade = 'always')
remotes::install_github('spatstat/spatstat.linnet', upgrade = 'always')
remotes::install_github('spatstat/spatstat.model', upgrade = 'always')
remotes::install_github('spatstat/spatstat.random', upgrade = 'always')
remotes::install_github('spatstat/spatstat.sparse', upgrade = 'always')
remotes::install_github('spatstat/spatstat.univar', upgrade = 'always')
remotes::install_github('spatstat/spatstat.utils', upgrade = 'always')

Dependencies

Getting Started

Terms and Abbreviations

Term / Abbreviation	Description
`|>`	Forward pipe operator introduced in `R` 4.1.0
`.Machine`	Numerical characteristics of the machine `R` is running on, e.g., 32-bit integers and IEC 60559 floating-point (double precision) arithmetic
`attr`, `attributes`	Attributes
`CRAN`, `R`	The Comprehensive R Archive Network
`cor`	Correlation matrix
`cor.spatial`	Tjøstheim’s nonparametric correlation coefficient, from package `SpatialPack` (Vallejos, Osorio, and Bevilacqua 2020)
`cov`, `cov2cor`	Variance-covariance matrix, and conversion to correlation matrix
`data.frame`	Data frame
`diag`	Matrix diagonals
`dist`	Distance matrix; to take advantage of `stats:::as.matrix.dist`
`file.size`	File size in bytes
`formula`	Formula
`fv`, `fv.object`, `plot.fv`	(Plot of) function value table
`groupedData`, `~ g1/.../gm`	Grouped data frame; nested grouping structure, from package `nlme` (Pinheiro, Bates, and R Core Team 2025)
`groupedHyperframe`	Grouped hyper data frame
`hypercolumns`, `hyperframe`	(Hyper columns of) hyper data frame, from package `spatstat.geom` (Baddeley and Turner 2005)
`inherits`	Class inheritance
`kerndens`	Kernel density, `stats::density.default()$y`
`Inf`	Positive infinity $\infty$
`kmeans`	k-means clustering (Hartigan and Wong 1979)
`list`, `listof`	Lists of objects
`markformat`	Storage mode of `marks`
`marks`, `marked`	Marks of a point pattern
`merge`	Merge two `data.frame`s
`mc.cores`	Number of CPU cores to use for parallel computing
`message`	Diagnostic message printed in `R` console
`multitype`	Multitype spatial object
`NaN`	Not-a-Number
`object.size`	Memory allocation
`pmean`, `pmedian`	Parallel, or point-wise, mean and median, `groupedHyperframe::pmean`; `groupedHyperframe::pmedian`
`pmax`, `pmin`	Parallel, or point-wise, maxima and minima
`ppp`, `ppp.object`	(Marked) point pattern
`quantile`	Quantile
`save`, `saveRDS`, `xz`	Save with `xz` compression
`S3`, `generic`, `methods`	`S3` object oriented system, `UseMethod`; `getS3method`; https://adv-r.hadley.nz/s3.html
`S4`, `generic`, `methods`	`S4` object oriented system, `isS4`; `setClass`; `setMethod`; `getMethod`; https://adv-r.hadley.nz/s4.html
`sd`	Standard deviation
`search`	Search path
`Surv`	Survival, i.e., time-to-event, object
`trapz`, `cumtrapz`	(Cumulative) trapezoidal integration, from package `pracma` (Borchers 2023)
`vector`	Vector

Acknowledgement

This work is supported by National Institutes of Health, U.S. Department of Health and Human Services grants

Grouped Hyper Data Frame

We introduce a new S3 class groupedHyperframe for grouped hyper data frame, which inherits from the hyper data frame class hyperframe from package spatstat.geom (Baddeley, Rubak, and Turner 2015; Baddeley and Turner 2005). A hyperframe contains columns either as vectors, as in a data.frame, or as lists of objects of the same class, a.k.a. the hypercolumns. This data structure is particularly useful in spatial analysis, e.g., with medical images, where the spatial information in each image is represented by one element in a hypercolumn. The derived class groupedHyperframe has additional attributes

The grammar of the nested grouping structure $g_1/\ldots/g_m$ (~g1/.../gm) follows that of the parameter random of functions nlme::lme() and nlme::nlme(). In fact, the 'grouped' extension of a hyperframe is inspired by the nlme::groupedData class, which inherits from data.frame (Pinheiro, Bates, and R Core Team 2025).
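As an illustration of what a nested structure requires, the base-R sketch below extracts the grouping variables from a one-sided formula and checks that every lower-level group belongs to exactly one upper-level group. The helper is_nested() and the toy data are hypothetical; this is not the validation code used inside the package.

```r
# Toy data: images nested in patients (hypothetical values)
d <- data.frame(
  patient_id = c('p1', 'p1', 'p1', 'p2', 'p2'),
  image_id   = c('a',  'a',  'b',  'c',  'd')
)

# Grouping variables, from the highest to the lowest level
group <- ~ patient_id/image_id
g <- all.vars(group) # "patient_id" "image_id"

# Hypothetical helper: `lower` is nested in `upper` if each value of
# `lower` co-occurs with exactly one value of `upper`
is_nested <- function(lower, upper) {
  all(tapply(upper, INDEX = lower, FUN = function(u) length(unique(u))) == 1L)
}

nested <- is_nested(lower = d[[g[2L]]], upper = d[[g[1L]]]) # TRUE
```

An image that appeared under two different patients would make the check return FALSE, which is the situation a nested formula rules out.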

In this section, we introduce several S3 method dispatches of the S3 generic as.groupedHyperframe(), which convert objects of various classes into a groupedHyperframe. We also introduce the aggregation functions aggregate_*() for the hypercolumns in a groupedHyperframe, at any one of the nested grouping levels $g_1,\cdots,g_{m-1}$. A request for aggregation at the lowest grouping level $g_m$ is ignored, i.e., no aggregation is performed. Available aggregation methods are the parallel minima base::pmin(), parallel maxima base::pmax(), parallel means pmean() (default) and parallel medians pmedian().
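To make "parallel", or point-wise, aggregation concrete, here is a minimal base-R sketch on three numeric vectors of equal length. The helpers pmean_sketch() and pmedian_sketch() are hypothetical stand-ins written for this illustration; they are not the implementations of groupedHyperframe::pmean and groupedHyperframe::pmedian.

```r
# Three numeric vectors of equal length, e.g., one per image of a patient
x <- list(c(1, 4, 7), c(2, 5, 8), c(6, 6, 6))

# Point-wise minima and maxima, base R
lo <- do.call(pmin, x) # 1 4 6
hi <- do.call(pmax, x) # 6 6 8

# Hypothetical stand-ins for the point-wise mean and median
pmean_sketch <- function(x) rowMeans(do.call(cbind, x))
pmedian_sketch <- function(x) apply(do.call(cbind, x), MARGIN = 1L, FUN = median)

pmean_sketch(x)   # 3 5 7
pmedian_sketch(x) # 2 5 7
```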

From data.frame

Users may convert a data.frame with a substantial amount of duplicated information into a groupedHyperframe using the S3 method dispatch as.groupedHyperframe.data.frame(). This function

In the following example, consider a toy data set wrobel_lung0 with non-identical column hladr in the lowest group image_id of the nested grouping structure ~patient_id/image_id.

wrobel_lung0 |> head()
#>            image_id    patient_id gender stage_numeric pack_years
#> 1 [40864,18015].im3 #01 0-889-121      F             1         60
#> 2 [40864,18015].im3 #01 0-889-121      F             1         60
#> 3 [40864,18015].im3 #01 0-889-121      F             1         60
#> 4 [40864,18015].im3 #01 0-889-121      F             1         60
#> 5 [40864,18015].im3 #01 0-889-121      F             1         60
#> 6 [40864,18015].im3 #01 0-889-121      F             1         60
#>   adjuvant_therapy hladr    OS mhcII stage age
#> 1               No 0.115 3488+   low    IA  85
#> 2               No 0.239 3488+   low    IA  85
#> 3               No 0.268 3488+   low    IA  85
#> 4               No 0.245 3488+   low    IA  85
#> 5               No 0.127 3488+   low    IA  85
#> 6               No 0.136 3488+   low    IA  85

By converting wrobel_lung0 into a groupedHyperframe, the numeric values of hladr within each ~patient_id/image_id are collected into one element of the numeric-hypercolumn hladr in the returned wrobel_lung0g. Each row of a groupedHyperframe represents one lowest-level group of the nested grouping structure. The R console output (S3 method dispatch print.groupedHyperframe()) highlights the nested grouping structure, the number of clusters at each grouping level, and the first 10 (or less) rows of the groupedHyperframe.

(wrobel_lung0g = wrobel_lung0 |> as.groupedHyperframe(group = ~ patient_id/image_id))
#> Grouped Hyperframe: ~patient_id/image_id
#> 
#> 15 image_id nested in
#> 3 patient_id
#> 
#> Preview of first 10 (or less) rows:
#> 
#>        hladr          image_id    patient_id gender stage_numeric pack_years
#> 1  (numeric) [40864,18015].im3 #01 0-889-121      F             1         60
#> 2  (numeric) [42689,19214].im3 #01 0-889-121      F             1         60
#> 3  (numeric) [42806,16718].im3 #01 0-889-121      F             1         60
#> 4  (numeric) [44311,17766].im3 #01 0-889-121      F             1         60
#> 5  (numeric) [45366,16647].im3 #01 0-889-121      F             1         60
#> 6  (numeric) [56576,16907].im3 #02 1-037-393      M             1         30
#> 7  (numeric) [56583,15235].im3 #02 1-037-393      M             1         30
#> 8  (numeric) [57130,16082].im3 #02 1-037-393      M             1         30
#> 9  (numeric) [57396,17896].im3 #02 1-037-393      M             1         30
#> 10 (numeric) [57403,16934].im3 #02 1-037-393      M             1         30
#>    adjuvant_therapy    OS mhcII stage age
#> 1                No 3488+   low    IA  85
#> 2                No 3488+   low    IA  85
#> 3                No 3488+   low    IA  85
#> 4                No 3488+   low    IA  85
#> 5                No 3488+   low    IA  85
#> 6                No  1605   low    IA  66
#> 7                No  1605   low    IA  66
#> 8                No  1605   low    IA  66
#> 9                No  1605   low    IA  66
#> 10               No  1605   low    IA  66
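The idea behind this conversion can be sketched in base R, assuming a toy long-format data.frame in which only hladr varies within an image. A plain list-column stands in for the numeric-hypercolumn; the data and object names are hypothetical, and this is not the code of as.groupedHyperframe.data.frame().

```r
# Toy long-format data: `hladr` varies within image; the rest is duplicated
long <- data.frame(
  patient_id = rep(c('p1', 'p1', 'p2'), times = c(3L, 2L, 4L)),
  image_id   = rep(c('a', 'b', 'c'), times = c(3L, 2L, 4L)),
  age        = rep(c(85, 85, 66), times = c(3L, 2L, 4L)),
  hladr      = c(.115, .239, .268, .245, .127, .136, .2, .3, .4)
)

# One row per lowest-level group, keeping the duplicated columns once
id <- factor(long$image_id, levels = unique(long$image_id))
wide <- long[!duplicated(id), c('patient_id', 'image_id', 'age')]
rownames(wide) <- NULL

# Values of `hladr` within each image, kept as a list-column
wide$hladr <- unname(split(long$hladr, f = id))
```

Here wide has three rows, one per image, and wide$hladr[[2]] holds the two hladr values of image 'b'.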

Reducing memory allocation

Converting a data.frame with a substantial amount of duplicated information, like wrobel_lung0, into a groupedHyperframe greatly reduces the memory allocation.

A groupedHyperframe, however, would not reduce the saved file.size by much compared to a data.frame, if xz compression is used for both.
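Both comparisons can be tried out with a base-R sketch on simulated data, where a plain list stands in for the collapsed structure. All object names and sizes below are assumptions made for this illustration; the actual numbers for wrobel_lung0 will differ.

```r
set.seed(1)
n_image <- 15L   # number of lowest-level groups
n_cell  <- 1000L # rows per lowest-level group

long <- data.frame(
  image_id   = rep(sprintf('image%02d', seq_len(n_image)), each = n_cell),
  patient_id = rep(sprintf('patient%d', rep(1:3, each = 5L)), each = n_cell),
  age        = rep(rep(c(85, 66, 50), each = 5L), each = n_cell),
  hladr      = runif(n_image * n_cell)
)

# Collapsed stand-in: duplicated columns stored once, `hladr` as a list
id <- factor(long$image_id, levels = unique(long$image_id))
wide <- list(
  meta  = unique(long[c('image_id', 'patient_id', 'age')]),
  hladr = unname(split(long$hladr, f = id))
)

# In-memory size, in bytes
size_long <- as.numeric(object.size(long))
size_wide <- as.numeric(object.size(wide))

# On-disk size with xz compression, in bytes
f_long <- tempfile(fileext = '.rds')
f_wide <- tempfile(fileext = '.rds')
saveRDS(long, file = f_long, compress = 'xz')
saveRDS(wide, file = f_wide, compress = 'xz')
file_long <- file.size(f_long)
file_wide <- file.size(f_wide)
```

In memory, every duplicated value in long occupies its own slot, so size_wide is much smaller than size_long. On disk, xz compression already removes most of that redundancy, which is why the gap between file_long and file_wide tends to be far narrower.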

Aggregation of numeric-hypercolumn

We use function aggregate_quantile() to aggregate the quantiles of each element in the numeric-hypercolumn hladr in wrobel_lung0g by point-wise means (default of parameter f_aggr_) at the second-lowest group ~patient_id. The returned object is a hyperframe instead of a groupedHyperframe: with one aggregated hladr.quantile per ~patient_id, a grouping structure is no longer needed.
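The two steps, quantiles per element and then point-wise means per patient, can be sketched in base R as follows. The simulated vectors and the object names are hypothetical, and the sketch does not reproduce the internals of aggregate_quantile().

```r
# Toy input: one numeric vector per image, images nested in patients
set.seed(2)
hladr <- list(rnorm(50), rnorm(60), rnorm(40), rnorm(70))
patient_id <- c('p1', 'p1', 'p2', 'p2')

probs <- seq.int(from = .01, to = .99, by = .01)

# Step 1: quantiles of each element, on a common grid of probabilities
q <- lapply(hladr, FUN = quantile, probs = probs, names = FALSE)

# Step 2: point-wise mean of the quantiles within each patient
q_by_patient <- lapply(split(q, f = patient_id), FUN = function(x) {
  rowMeans(do.call(cbind, x))
})
```

The result has one numeric vector per patient, each of the same length as probs, so no grouping structure remains below the patient level.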

This package includes an example groupedHyperframe, Ki67, with a numeric-hypercolumn logKi67 and a nested grouping structure ~patientID/tissueID.

data(Ki67, package = 'groupedHyperframe')
Ki67
#> Grouped Hyperframe: ~patientID/tissueID
#> 
#> 645 tissueID nested in
#> 622 patientID
#> 
#> Preview of first 10 (or less) rows:
#> 
#>      logKi67 tissueID Tstage  PFS recfreesurv_mon recurrence adj_rad adj_chemo
#> 1  (numeric) TJUe_I17      2 100+             100          0   FALSE     FALSE
#> 2  (numeric) TJUe_G17      1   22              22          1   FALSE     FALSE
#> 3  (numeric) TJUe_F17      1  99+              99          0   FALSE        NA
#> 4  (numeric) TJUe_D17      1  99+              99          0   FALSE      TRUE
#> 5  (numeric) TJUe_J18      1  112             112          1    TRUE      TRUE
#> 6  (numeric) TJUe_N17      4   12              12          1    TRUE     FALSE
#> 7  (numeric) TJUe_J17      2  64+              64          0   FALSE     FALSE
#> 8  (numeric) TJUe_F19      2  56+              56          0   FALSE     FALSE
#> 9  (numeric) TJUe_P19      2  79+              79          0   FALSE     FALSE
#> 10 (numeric) TJUe_O19      2   26              26          1   FALSE      TRUE
#>    histology  Her2   HR  node  race age patientID
#> 1          3  TRUE TRUE  TRUE White  66   PT00037
#> 2          3 FALSE TRUE FALSE Black  42   PT00039
#> 3          3 FALSE TRUE FALSE White  60   PT00040
#> 4          3 FALSE TRUE  TRUE White  53   PT00042
#> 5          3 FALSE TRUE  TRUE White  52   PT00054
#> 6          2  TRUE TRUE  TRUE Black  51   PT00059
#> 7          3 FALSE TRUE  TRUE Asian  50   PT00062
#> 8          2  TRUE TRUE  TRUE White  37   PT00068
#> 9          3  TRUE TRUE FALSE White  68   PT00082
#> 10         2  TRUE TRUE FALSE Black  55   PT00084

Similarly, we use function aggregate_quantile() to aggregate the quantiles of each element in the numeric-hypercolumn logKi67 at the second-lowest group ~patientID.

s = Ki67 |>
  aggregate_quantile(by = ~ patientID, probs = seq.int(from = .01, to = .99, by = .01))
s |> head()
#> Hyperframe:
#>   Tstage  PFS recfreesurv_mon recurrence adj_rad adj_chemo histology  Her2   HR
#> 1      2 100+             100          0   FALSE     FALSE         3  TRUE TRUE
#> 2      1   22              22          1   FALSE     FALSE         3 FALSE TRUE
#> 3      1  99+              99          0   FALSE        NA         3 FALSE TRUE
#> 4      1  99+              99          0   FALSE      TRUE         3 FALSE TRUE
#> 5      1  112             112          1    TRUE      TRUE         3 FALSE TRUE
#> 6      4   12              12          1    TRUE     FALSE         2  TRUE TRUE
#>    node  race age patientID logKi67.quantile
#> 1  TRUE White  66   PT00037        (numeric)
#> 2 FALSE Black  42   PT00039        (numeric)
#> 3 FALSE White  60   PT00040        (numeric)
#> 4  TRUE White  53   PT00042        (numeric)
#> 5  TRUE White  52   PT00054        (numeric)
#> 6  TRUE Black  51   PT00059        (numeric)

Users are encouraged to learn more about the applications of the aggregated quantiles of the Ki67 data from the vignettes of package hyper.gam (RPubs, CRAN), section Quantile Index, as well as from our peer-reviewed publications Yi et al. (2025), Yi et al. (2023b) and Yi et al. (2023a).

From hyperframe

Users may convert a hyperframe provided in the package spatstat.data into a groupedHyperframe using the S3 method dispatch as.groupedHyperframe.hyperframe(). This function simply inspects and adds a (nested) grouping structure to the input hyperframe.

In the following example, we inspect the data set spatstat.data::osteo, which has the serial number of sampling volume brick nested in the bone sample id, and add the nested grouping structure ~id/brick to it.

In this vignette, we do not place much emphasis on the objects provided in package spatstat.data, for now.

Grouping Structure on ppp-Hypercolumn

In this section, we introduce the creation of a groupedHyperframe with one-and-only-one point pattern (ppp) hypercolumn, as well as the batch processes of spatial point pattern analyses on the one-and-only-one ppp-hypercolumn of a hyperframe (and/or groupedHyperframe).

These batch processes are not intended for a hyperframe (and/or groupedHyperframe) with multiple ppp-hypercolumns in the foreseeable future, as that would require checking for name clashes in the marks from multiple ppp-hypercolumns.

Grouped hyper data frame with ppp-hypercolumn

Function grouped_ppp() creates a groupedHyperframe with one-and-only-one ppp-hypercolumn.

(s = wrobel_lung |>
   grouped_ppp(formula = hladr + phenotype ~ OS + gender + age | patient_id/image_id))
#> Grouped Hyperframe: ~patient_id/image_id
#> 
#> 15 image_id nested in
#> 3 patient_id
#> 
#> Preview of first 10 (or less) rows:
#> 
#>       OS gender age    patient_id          image_id  ppp.
#> 1  3488+      F  85 #01 0-889-121 [40864,18015].im3 (ppp)
#> 2  3488+      F  85 #01 0-889-121 [42689,19214].im3 (ppp)
#> 3  3488+      F  85 #01 0-889-121 [42806,16718].im3 (ppp)
#> 4  3488+      F  85 #01 0-889-121 [44311,17766].im3 (ppp)
#> 5  3488+      F  85 #01 0-889-121 [45366,16647].im3 (ppp)
#> 6   1605      M  66 #02 1-037-393 [56576,16907].im3 (ppp)
#> 7   1605      M  66 #02 1-037-393 [56583,15235].im3 (ppp)
#> 8   1605      M  66 #02 1-037-393 [57130,16082].im3 (ppp)
#> 9   1605      M  66 #02 1-037-393 [57396,17896].im3 (ppp)
#> 10  1605      M  66 #02 1-037-393 [57403,16934].im3 (ppp)

Batch processes

Batch processes to return fv-hypercolumn

In this section, we discuss the batch processes that return a function value table (fv) hypercolumn, i.e., a hypercolumn which consists of a list of fv.objects.

Batch processes applicable to numeric marks

Batch Process	Workhorse in package `spatstat.explore`	Suffix of `fv`-hypercolumn
`Emark_()` and `Vmark_()`	`Emark` and `Vmark`, conditional mean $E(r)$ and conditional variance $V(r)$, diagnostics for dependence between the points and the marks (Schlather, Ribeiro, and Diggle 2003)	`.E` and `.V`
`markcorr_()`	`markcorr`, marked correlation $k_{mm}(r)$ or generalized mark correlation $k_f(r)$ (Stoyan and Stoyan 1994)	`.k`
`markvario_()`	`markvario`, mark variogram $\gamma(r)$ (Wälder and Stoyan 1996)	`.gamma`
`Kmark_()`	`Kmark`, mark-weighted $K_f(r)$ function (Penttinen, Stoyan, and Henttonen 1992)	`.K`

Exception handling

In package spatstat.explore (up to version 3.4.3, 2025-05-21), function markcorr() is the workhorse inside functions Emark(), Vmark() and markvario(). Function markcorr() relies on the un-exported workhorse function spatstat.explore:::sewsmod(), whose default method = "density" contains the calculation of the ratio of two kernel densities. Due to the floating-point precision of R, such density ratios may have exceptional returns of

Function markcorr() provides a default argument of parameter r, at which the mark correlation function $k_f(r)$ is evaluated, using function spatstat.geom::handle.r.b.args(). The S3 method dispatch spatstat.explore::print.fv() prints the recommended range and available range of the argument r.

spatstat.data::spruces |> 
  spatstat.explore::markcorr()
#> Function value object (class 'fv')
#> for the function r -> k[mm](r)
#> ................................................................................
#>       Math.label             
#> r     r                      
#> theo  {k[mm]^{iid}}(r)       
#> trans {hat(k)[mm]^{trans}}(r)
#> iso   {hat(k)[mm]^{iso}}(r)  
#>       Description                                       
#> r     distance argument r                               
#> theo  theoretical value (independent marks) for k[mm](r)
#> trans translation-corrected estimate of k[mm](r)        
#> iso   Ripley isotropic correction estimate of k[mm](r)  
#> ................................................................................
#> Default plot formula:  .~r
#> where "." stands for 'iso', 'trans', 'theo'
#> Recommended range of argument r: [0, 9.5]
#> Available range of argument r: [0, 9.5]
#> Unit of length: 1 metre

We may observe exceptional returns if we go beyond the recommended range and/or available range. In the following example, we see that the mark correlation $k_f(r)$ (column iso) has value NaN at $r=81,88,89,90$, value 0 at $r=82$ and value Inf at $r=83,84,87$.

The batch process markcorr_(), as well as Emark_(), Vmark_() and markvario_(), which rely on the workhorse markcorr(), print the recommended $r_\text{max}$ from each of the fv-returns, whether or not a user-specified argument for parameter r is provided.

When a user-specified r is provided for a batch process on all ppp.objects in the ppp-hypercolumn, inevitably some of the fv-returns may have exceptional values. We discuss this exception handling in the next section Aggregation over nested grouping structure.
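How a ratio of two densities ends up as 0, Inf or NaN can be reproduced in base R without any point pattern. In the sketch below the Gaussian density dnorm() merely stands in for the two kernel density estimates; it is an illustration of the floating-point mechanism under that assumption, not a reproduction of spatstat.explore:::sewsmod().

```r
# Densities far out in the tail underflow to exactly zero
num <- dnorm(c(0, 40, 0, 40)) # numerator densities
den <- dnorm(c(1, 1, 40, 40)) # denominator densities

ratio <- num / den
# ratio[1]: finite and positive (ordinary case)
# ratio[2]: 0   (zero over positive)
# ratio[3]: Inf (positive over zero)
# ratio[4]: NaN (zero over zero)

# Smallest positive normalized double on this machine, for reference
.Machine$double.xmin
```

At large r there are few or no pairs of points at that distance, so both the numerator and the denominator approach zero, and which of the three exceptional values appears depends on which one underflows first.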

Batch processes applicable to multitype marks

Batch Process	Workhorse in package `spatstat.explore`	Suffix of `fv`-hypercolumn
`Gcross_()`	`Gcross`, multitype nearest-neighbour distance $G_{ij}(r)$	`.G`
`Kcross_()`	`Kcross`, multitype $K_{ij}(r)$	`.K`
`Jcross_()`	`Jcross`, multitype $J_{ij}(r)$ (Van Lieshout and Baddeley 1999)	`.J`
`Lcross_()`	`Lcross`, multitype $L_{ij}(r)=\sqrt{\frac{K_{ij}(r)}{\pi}}$	`.L`

Batch processes to return numeric-hypercolumn

Batch Process	Workhorse in package `spatstat.geom`	Applicable to	Suffix of numeric-hypercolumn
`nncross_()`	`nncross.ppp(., what = 'dist')`, nearest neighbour distance	`multitype` marks	`.nncross`

Batch processes in a pipeline

Multiple batch processes may be applied to a hyperframe (and/or groupedHyperframe) in a pipeline using the native pipe operator |> introduced in R 4.1.0.

r = seq.int(from = 0, to = 250, by = 10)
out = s |>
  Emark_(r = r, correction = 'none') |> # slow
  # Vmark_(r = r, correction = 'none') |> # slow
  # markcorr_(r = r, correction = 'none') |> # slow
  # markvario_(r = r, correction = 'none') |> # slow
  # Kmark_(r = r, correction = 'none') |> # fast
  Gcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'none') |> # fast
  # Kcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'none') |> # fast
  nncross_(i = 'CK+.CD8-', j = 'CK-.CD8+', correction = 'none') # fast
#> 
#> Recommended rmax for hladr.E are 15⨯ 125.625
#>

out
#> Grouped Hyperframe: ~patient_id/image_id
#> 
#> 15 image_id nested in
#> 3 patient_id
#> 
#> Preview of first 10 (or less) rows:
#> 
#>       OS gender age    patient_id          image_id  ppp. hladr.E phenotype.G
#> 1  3488+      F  85 #01 0-889-121 [40864,18015].im3 (ppp)    (fv)        (fv)
#> 2  3488+      F  85 #01 0-889-121 [42689,19214].im3 (ppp)    (fv)        (fv)
#> 3  3488+      F  85 #01 0-889-121 [42806,16718].im3 (ppp)    (fv)        (fv)
#> 4  3488+      F  85 #01 0-889-121 [44311,17766].im3 (ppp)    (fv)        (fv)
#> 5  3488+      F  85 #01 0-889-121 [45366,16647].im3 (ppp)    (fv)        (fv)
#> 6   1605      M  66 #02 1-037-393 [56576,16907].im3 (ppp)    (fv)        (fv)
#> 7   1605      M  66 #02 1-037-393 [56583,15235].im3 (ppp)    (fv)        (fv)
#> 8   1605      M  66 #02 1-037-393 [57130,16082].im3 (ppp)    (fv)        (fv)
#> 9   1605      M  66 #02 1-037-393 [57396,17896].im3 (ppp)    (fv)        (fv)
#> 10  1605      M  66 #02 1-037-393 [57403,16934].im3 (ppp)    (fv)        (fv)
#>    phenotype.nncross
#> 1          (numeric)
#> 2          (numeric)
#> 3          (numeric)
#> 4          (numeric)
#> 5          (numeric)
#> 6          (numeric)
#> 7          (numeric)
#> 8          (numeric)
#> 9          (numeric)
#> 10         (numeric)

Aggregation over nested grouping structure

Of fv-hypercolumn(s)

Each of the numeric-hypercolumns contains tabulated values on the common grid of r. One “slice” of this grid may be extracted by

Exception handling

As mentioned in the previous section Batch processes, the same user-specified argument of r is used for all ppp.objects in the ppp-hypercolumn. Suppose a naive user supplies an r-vector well beyond the recommended range and/or available range. In this case, function aggregate_fv() prints a message of the legal $r_\text{max}$, i.e., the largest value of r up to which no NaN and/or Inf appears in any of the fv-returns, e.g., in the hypercolumn hladr.E. Note that 0-values in an fv-return are typically a sign of degeneration as well, but function aggregate_fv() does not treat 0-values as exceptional when determining the legal $r_\text{max}$.
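One reading of this rule can be sketched in base R: for each fv-return, find the last grid point before the first NaN or Inf, then take the smallest such point across all fv-returns. The helper last_legal() and the function values below are hypothetical, and the sketch is an interpretation of the description above rather than the code of aggregate_fv().

```r
r <- seq.int(from = 0, to = 100, by = 10)

# Hypothetical function values of three fv-returns on the common grid `r`
fv_values <- list(
  c(1, 1.1, 1.2, 1.1, 1.0, 0.9, 0.8, NaN, 0,   Inf, NaN),
  c(1, 0.9, 1.0, 1.1, 1.2, 1.1, 1.0, 0.9, 0.8, NaN, NaN),
  c(1, 1.0, 1.1, 1.0, 0.9, Inf, 0,   0,   NaN, NaN, NaN)
)

# Hypothetical helper: index of the last grid point before the first
# non-finite (NaN or Inf) value; zeros are deliberately not excluded
last_legal <- function(y) {
  bad <- which(!is.finite(y))
  if (length(bad)) bad[1L] - 1L else length(y)
}

id <- min(vapply(fv_values, FUN = last_legal, FUN.VALUE = 0L))
legal_rmax <- r[id] # 40: the third fv-return turns Inf at r = 50
```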

Of numeric-hypercolumn and numeric marks in ppp-hypercolumn

On quantiles

Function aggregate_quantile() aggregates the quantiles of the numeric-hypercolumns and the numeric marks in the ppp-hypercolumn.

On kernel densities

Function aggregate_kerndens() aggregates the kernel density of the numeric-hypercolumns and the numeric marks in the ppp-hypercolumn.
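Kernel densities can only be averaged point-wise if they are evaluated on a common grid. The base-R sketch below fixes the grid through the from, to and n arguments of stats::density.default() and then averages within each patient. The simulated data and the choice of grid are assumptions for this illustration; the package may choose its grid differently.

```r
# Toy input: one numeric vector per image, images nested in patients
set.seed(3)
hladr <- list(rnorm(50), rnorm(60), rnorm(40), rnorm(70))
patient_id <- c('p1', 'p1', 'p2', 'p2')

# A common evaluation grid, so that the densities are comparable point-wise
rng <- range(unlist(hladr))
kd <- lapply(hladr, FUN = function(x) {
  density(x, from = rng[1L], to = rng[2L], n = 128L)$y
})

# Point-wise mean of the kernel densities within each patient
kd_by_patient <- lapply(split(kd, f = patient_id), FUN = function(x) {
  rowMeans(do.call(cbind, x))
})
```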

Appendix A: Minor Functionalities

k-Means Clustering

The S3 generic .kmeans() performs k-means clustering (workhorse function stats::kmeans).

On ppp.object

The S3 method dispatch .kmeans.ppp() performs k-means clustering, with parameters

By coordinate(s) and/or marks

The example below shows a clustering based on the x- and y-coordinates, as well as the numeric mark Mag.
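The workhorse call can be sketched in base R on simulated points, with the coordinates and a numeric mark clustered jointly. The data frame pts and its values are hypothetical and only borrow the mark name Mag; the sketch calls stats::kmeans() directly and does not go through .kmeans.ppp().

```r
set.seed(4)
# Simulated coordinates and a numeric mark named `Mag` (hypothetical values)
pts <- data.frame(
  x   = c(rnorm(30, mean = 0), rnorm(30, mean = 10)),
  y   = c(rnorm(30, mean = 0), rnorm(30, mean = 10)),
  Mag = c(rnorm(30, mean = 15), rnorm(30, mean = 20))
)

# k-means on the two coordinates and the mark jointly
km <- kmeans(pts[c('x', 'y', 'Mag')], centers = 2L)

# Number of points in each cluster
table(km$cluster)
```

Because k-means uses Euclidean distance on all supplied columns, a mark measured on a much larger scale than the coordinates would dominate the clustering; rescaling the columns first is a common precaution.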

By clusterSize

Split by k-Means Clustering

The S3 generic split_kmeans() splits ppp.object, listof ppp.objects, and hyperframe by k-means clustering.

Note that many functions in package groupedHyperframe require a 'dataframe' markformat for ppp.objects.

Users may convert a 'vector' markformat to 'dataframe' using the syntactic sugar `mark_name<-`,

Split a ppp.object

The S3 method dispatch split_kmeans.default() splits a ppp.object by k-means clustering.

Split a listof ppp.objects

The S3 method dispatch split_kmeans.listof() splits a listof ppp.objects by k-means clustering.

flu$pattern[1:2] |> split_kmeans(formula = ~ x + y, centers = 3L) 
#> $`wt M2-M1 13.1`
#> Marked planar point pattern: 157 points
#> Multitype, with levels = M2, M1 
#> window: rectangle = [0, 3331] x [0, 3331] nm
#> 
#> $`wt M2-M1 13.2`
#> Marked planar point pattern: 169 points
#> Multitype, with levels = M2, M1 
#> window: rectangle = [0, 3331] x [0, 3331] nm
#> 
#> $`wt M2-M1 13.3`
#> Marked planar point pattern: 145 points
#> Multitype, with levels = M2, M1 
#> window: rectangle = [0, 3331] x [0, 3331] nm
#> 
#> $`wt M2-M1 22.1`
#> Marked planar point pattern: 77 points
#> Multitype, with levels = M2, M1 
#> window: rectangle = [0, 3331] x [0, 3331] nm
#> 
#> $`wt M2-M1 22.2`
#> Marked planar point pattern: 57 points
#> Multitype, with levels = M2, M1 
#> window: rectangle = [0, 3331] x [0, 3331] nm
#> 
#> $`wt M2-M1 22.3`
#> Marked planar point pattern: 83 points
#> Multitype, with levels = M2, M1 
#> window: rectangle = [0, 3331] x [0, 3331] nm
#> 
#> attr(,"id")
#> [1] 1 1 1 2 2 2
#> attr(,"cluster")
#> [1] 1 2 3 1 2 3

Split a hyperframe and/or groupedHyperframe

The S3 method dispatch split_kmeans.hyperframe() splits a hyperframe and/or groupedHyperframe by k-means clustering of the one-and-only-one ppp-hypercolumn.

Pairwise Tjøstheim’s Coefficient

The S3 generic pairwise_cor_spatial() calculates the nonparametric, rank-based Tjøstheim’s correlation coefficients (Tjøstheim 1978; Hubert and Golledge 1982) in a pairwise-combination fashion, using the workhorse function SpatialPack::cor.spatial(). All S3 method dispatches return an object of class 'pairwise_cor_spatial', which inherits from class 'dist'.

Of ppp.object

The S3 method dispatch pairwise_cor_spatial.ppp() finds the nonparametric Tjøstheim’s correlation coefficients from the pairwise-combinations of all numeric marks of a ppp.object.

The printing of 'pairwise_cor_spatial' is taken care of by function stats:::print.dist.

Matrix of pairwise Tjøstheim’s coefficient

The S3 method dispatch as.matrix.pairwise_cor_spatial() returns a matrix with diagonal values of 1.
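The benefit of inheriting from class 'dist' can be seen in a base-R sketch: pairwise values are stored once, in lower-triangle order, and stats:::as.matrix.dist expands them into a symmetric matrix whose diagonal is then set to 1. The three coefficients below are made-up numbers, and the sketch builds a plain 'dist' object instead of calling as.matrix.pairwise_cor_spatial().

```r
# Hypothetical pairwise coefficients among three numeric marks a, b, c,
# stored in the lower-triangle order used by class 'dist'
x <- structure(
  c(0.2, -0.1, 0.5), # (b,a), (c,a), (c,b)
  Size = 3L, Labels = c('a', 'b', 'c'),
  Diag = FALSE, Upper = FALSE,
  class = 'dist'
)

m <- as.matrix(x) # stats:::as.matrix.dist; the diagonal is filled with 0
diag(m) <- 1      # replace the diagonal by 1
m
```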

Note that this matrix is not a correlation matrix, because Tjøstheim’s correlation coefficient

Appendix B: What We Don’t Do

S4 method dispatch merge

The authors do not plan to implement an S4 method dispatch merge (e.g., S3 method dispatch base::merge.data.frame()) for hyperframe and/or groupedHyperframe classes, for several reasons.

As a workaround, the authors suggest that users first call base::merge.data.frame(), then as.groupedHyperframe().
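A minimal sketch of this workaround, assuming two toy tables that share the key image_id: the merge happens on plain data.frames, and the conversion is applied to the merged result. The tables are hypothetical, and the final conversion is left as a comment because it requires the package to be attached.

```r
# Cell-level data: one row per cell (hypothetical values)
cells <- data.frame(
  image_id = c('a', 'a', 'b', 'b', 'c'),
  hladr    = c(.115, .239, .268, .245, .127)
)

# Image-level data: one row per image (hypothetical values)
images <- data.frame(
  image_id   = c('a', 'b', 'c'),
  patient_id = c('p1', 'p1', 'p2'),
  age        = c(85, 85, 66)
)

# Step 1: merge as plain data.frames
merged <- merge(cells, images, by = 'image_id')

# Step 2 (requires package groupedHyperframe; not run here):
# merged |> as.groupedHyperframe(group = ~ patient_id/image_id)
```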

References

Baddeley, Adrian, Ege Rubak, and Rolf Turner. 2015. Spatial Point Patterns: Methodology and Applications with R. London: Chapman and Hall/CRC Press. https://www.routledge.com/Spatial-Point-Patterns-Methodology-and-Applications-with-R/Baddeley-Rubak-Turner/p/book/9781482210200/.

Baddeley, Adrian, and Rolf Turner. 2005. “spatstat: An R Package for Analyzing Spatial Point Patterns.” Journal of Statistical Software 12 (6): 1–42. https://doi.org/10.18637/jss.v012.i06.

Bengtsson, Henrik. 2025. matrixStats: Functions That Apply to Rows and Columns of Matrices (and to Vectors). https://doi.org/10.32614/CRAN.package.matrixStats.

Borchers, Hans W. 2023. pracma: Practical Numerical Math Functions. https://doi.org/10.32614/CRAN.package.pracma.

Csárdi, Gábor. 2025. cli: Helpers for Developing Command Line Interfaces. https://doi.org/10.32614/CRAN.package.cli.

Hartigan, J. A., and M. A. Wong. 1979. “A K-Means Clustering Algorithm.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 28 (1): 100–108. https://doi.org/10.2307/2346830.

Hubert, Lawrence J., and Reginald G. Golledge. 1982. “Measuring Association Between Spatially Defined Variables: Tjøstheim’s Index and Some Extensions.” Geographical Analysis 14 (3): 273–78. https://doi.org/10.1111/j.1538-4632.1982.tb00077.x.

Penttinen, Antti, Dietrich Stoyan, and Helena M. Henttonen. 1992. “Marked Point Processes in Forest Statistics.” Forest Science 38 (4): 806–24. https://doi.org/10.1093/forestscience/38.4.806.

Pinheiro, José, Douglas Bates, and R Core Team. 2025. nlme: Linear and Nonlinear Mixed Effects Models. https://doi.org/10.32614/CRAN.package.nlme.

Schlather, Martin, Paulo J. Ribeiro Jr., and Peter J. Diggle. 2003. “Detecting Dependence Between Marks and Locations of Marked Point Processes.” Journal of the Royal Statistical Society Series B: Statistical Methodology 66 (1): 79–93. https://doi.org/10.1046/j.1369-7412.2003.05343.x.

Stoyan, Helga, and Dietrich Stoyan. 1994. Fractals, Random Shapes and Point Fields: Methods of Geometrical Statistics. John Wiley & Sons. https://www.wiley.com/Fractals%2C+Random+Shapes+and+Point+Fields%3A+Methods+of+Geometrical+Statistics-p-9780471937579.

Tjøstheim, Dag. 1978. “A Measure of Association for Spatial Variables.” Biometrika 65 (1): 109–14. https://doi.org/10.1093/biomet/65.1.109.

Vallejos, R., F. Osorio, and M. Bevilacqua. 2020. Spatial Relationships Between Two Georeferenced Variables: With Applications in R. New York: Springer. http://srb2gv.mat.utfsm.cl/.

Van Lieshout, M. N. M., and A. J. Baddeley. 1999. “Indices of Dependence Between Types in Multivariate Point Patterns.” Scandinavian Journal of Statistics 26 (4): 511–32. https://doi.org/10.1111/1467-9469.00165.

Wälder, Olga, and Dietrich Stoyan. 1996. “On Variograms in Point Process Statistics.” Biometrical Journal 38 (8): 895–905. https://doi.org/10.1002/bimj.4710380802.

Yi, Misung, Tingting Zhan, Amy R. Peck, Jeffrey A. Hooke, Albert J. Kovatich, Craig D. Shriver, Hai Hu, Yunguang Sun, Hallgeir Rui, and Inna Chervoneva. 2023a. “Quantile Index Biomarkers Based on Single-Cell Expression Data.” Laboratory Investigation 103 (8): 100158. https://doi.org/10.1016/j.labinv.2023.100158.

———. 2023b. “Selection of Optimal Quantile Protein Biomarkers Based on Cell-Level Immunohistochemistry Data.” BMC Bioinformatics 24 (1): 298. https://doi.org/10.1186/s12859-023-05408-8.

Yi, Misung, Tingting Zhan, Hallgeir Rui, and Inna Chervoneva. 2025. “Functional Protein Biomarkers Based on Distributions of Expression Levels in Single-Cell Imaging Data.” Bioinformatics, April, btaf182. https://doi.org/10.1093/bioinformatics/btaf182.
