2025-06-05
This vignette of package groupedHyperframe (CRAN, Github, RPubs) documents the creation of groupedHyperframe objects, the batch processes for a groupedHyperframe, and the aggregation of various statistics over a multi-level grouping structure.
Experimental (and maybe unstable) features are implemented extremely frequently on Github. Active developers should use the Github version; suggestions and bug reports are welcome!
remotes::install_github('tingtingzhan/groupedHyperframe')
Stable releases to CRAN are typically updated every 2 to 3 months, or when the authors have an upcoming manuscript in the peer-reviewing process. Developers should not use the CRAN version!
utils::install.packages('groupedHyperframe') # Developers, do NOT use!!
Package groupedHyperframe may require the development versions of the spatstat family.
remotes::install_github('spatstat/spatstat')
remotes::install_github('spatstat/spatstat.data')
remotes::install_github('spatstat/spatstat.explore')
remotes::install_github('spatstat/spatstat.geom')
remotes::install_github('spatstat/spatstat.linnet')
remotes::install_github('spatstat/spatstat.model')
remotes::install_github('spatstat/spatstat.random')
remotes::install_github('spatstat/spatstat.sparse')
remotes::install_github('spatstat/spatstat.univar')
remotes::install_github('spatstat/spatstat.utils')
Examples in this vignette require that the search path has
library(groupedHyperframe)
library(survival) # to help hyperframe understand Surv object
Term / Abbreviation | Description |
---|---|
\|> | Forward pipe operator introduced in R 4.1.0 |
.Machine | Numerical characteristics of the machine R is running on, e.g., 32-bit integers and IEC 60559 floating-point (double precision) arithmetic |
attr, attributes | Attributes |
CRAN, R | The Comprehensive R Archive Network |
cor | Correlation matrix |
cor.spatial | Tjøstheim’s nonparametric correlation coefficient, from package SpatialPack (Vallejos, Osorio, and Bevilacqua 2020) |
cov, cov2cor | Variance-covariance matrix, and conversion to correlation matrix |
data.frame | Data frame |
diag | Matrix diagonals |
dist | Distance matrix; to take advantage of stats:::as.matrix.dist |
file.size | File size in bytes |
formula | Formula |
fv, fv.object, plot.fv | (Plot of) function value table |
groupedData, ~ g1/.../gm | Grouped data frame; nested grouping structure, from package nlme (Pinheiro, Bates, and R Core Team 2025) |
groupedHyperframe | Grouped hyper data frame |
hypercolumns, hyperframe | (Hyper columns of) hyper data frame, from package spatstat.geom (Baddeley and Turner 2005) |
inherits | Class inheritance |
kerndens | Kernel density, stats::density.default()$y |
Inf | Positive infinity \infty |
kmeans | k-means clustering |
list, listof | Lists of objects |
markformat | Storage mode of marks |
marks, marked | Marks of a point pattern |
mc.cores | Number of CPU cores to use for parallel computing |
message | Diagnostic message printed in R console |
multitype | Multitype spatial object |
NaN | Not-a-Number |
object.size | Memory allocation |
pmean, pmedian | Parallel, or point-wise, mean and median, groupedHyperframe::pmean; groupedHyperframe::pmedian |
pmax, pmin | Parallel, or point-wise, maxima and minima |
ppp, ppp.object | (Marked) point pattern |
quantile | Quantile |
save, saveRDS, xz | Save with xz compression |
S3, generic, methods | S3 object oriented system, UseMethod; getS3method; https://adv-r.hadley.nz/s3.html |
sd | Standard deviation |
search | Search path |
Surv | Survival, i.e., time-to-event, object |
trapz, cumtrapz | (Cumulative) trapezoidal integration, from package pracma (Borchers 2023) |
vector | Vector |
This work is supported by National Institutes of Health, U.S. Department of Health and Human Services grants
We introduce a new S3 class groupedHyperframe for grouped hyper data frame, which inherits from the hyper data frame hyperframe class from package spatstat.geom (Baddeley, Rubak, and Turner 2015; Baddeley and Turner 2005). A hyperframe contains columns either as vectors, like in a data.frame, or as lists of objects of the same class, a.k.a. the hypercolumns. This data structure is particularly useful in spatial analysis, e.g., with medical images, where the spatial information in each image would be represented by one element in a hypercolumn. The derived class groupedHyperframe has additional attributes: attr(., 'group'), a formula of the (nested) grouping structure, e.g., ~patient/image when each patient has one or more images.
The grammar of the nested grouping structure g_1/.../g_m (~g1/.../gm) follows that of the parameter random of functions nlme::lme() and nlme::nlme(). In fact, the 'grouped' extension of a hyperframe is inspired by the nlme::groupedData class, which inherits from data.frame (Pinheiro, Bates, and R Core Team 2025).
In this section, we introduce several S3 method dispatches of the S3 generic as.groupedHyperframe() to convert various classes into a groupedHyperframe. We also introduce aggregation functions aggregate_*() of the hypercolumns in a groupedHyperframe, at any one of the nested grouping levels g_1,\cdots,g_{m-1}. Aggregation at the lowest grouping level g_m is ignored, i.e., no aggregation is performed. Available aggregation methods are the parallel minima base::pmin(), parallel maxima base::pmax(), parallel means pmean() (default) and parallel medians pmedian().
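The four aggregation methods all act point-wise on vectors tabulated on a common grid. The following base-R sketch illustrates the idea on toy values; the names pmean_sketch and pmedian_sketch are hypothetical stand-ins written for this illustration and are not the package's pmean() and pmedian().

```r
# Three numeric vectors on a common grid (toy values, for illustration only)
x = list(c(1, 4, 7), c(2, 5, 8), c(3, 9, 9))

# Base R already provides the point-wise extrema
pmax_x = do.call(what = pmax, args = x)
pmin_x = do.call(what = pmin, args = x)

# Hypothetical stand-ins for point-wise mean and median:
# bind the vectors as columns, then summarize each row
m = do.call(what = cbind, args = x)
pmean_sketch = rowMeans(m)
pmedian_sketch = apply(m, MARGIN = 1L, FUN = median)

pmean_sketch
pmedian_sketch
```

Each result has the same length as the input vectors, which is what allows an aggregated hypercolumn to stay on the original grid.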
data.frame
Users may convert a data.frame with a substantial amount of duplicated information into a groupedHyperframe using the S3 method dispatch as.groupedHyperframe.data.frame(). This function processes the data.frame by the user-specified (nested) grouping structure, and returns a groupedHyperframe with the user-specified (nested) grouping structure.
In the following example, consider a toy data set wrobel_lung0 with non-identical column hladr in the lowest group image_id of the nested grouping structure ~patient_id/image_id.
wrobel_lung0 = wrobel_lung |>
  within.data.frame(expr = {
    x = y = NULL
    dapi = phenotype = tissue = NULL
  })
wrobel_lung0 |> head()
#> image_id patient_id gender stage_numeric pack_years
#> 1 [40864,18015].im3 #01 0-889-121 F 1 60
#> 2 [40864,18015].im3 #01 0-889-121 F 1 60
#> 3 [40864,18015].im3 #01 0-889-121 F 1 60
#> 4 [40864,18015].im3 #01 0-889-121 F 1 60
#> 5 [40864,18015].im3 #01 0-889-121 F 1 60
#> 6 [40864,18015].im3 #01 0-889-121 F 1 60
#> adjuvant_therapy hladr OS mhcII stage age
#> 1 No 0.115 3488+ low IA 85
#> 2 No 0.239 3488+ low IA 85
#> 3 No 0.268 3488+ low IA 85
#> 4 No 0.245 3488+ low IA 85
#> 5 No 0.127 3488+ low IA 85
#> 6 No 0.136 3488+ low IA 85
By converting wrobel_lung0 into a groupedHyperframe, the numeric hladr from each ~patient_id/image_id are converted into elements of the numeric-hypercolumn hladr in the returned wrobel_lung0g. Each row of a groupedHyperframe represents the lowest group of the nested grouping structure. The R console output (S3 method dispatch print.groupedHyperframe()) highlights the nested grouping structure, the number of clusters at each grouping level, as well as the first 10 (or less) rows of the groupedHyperframe.
(wrobel_lung0g = wrobel_lung0 |> as.groupedHyperframe(group = ~ patient_id/image_id))
#> Grouped Hyperframe: ~patient_id/image_id
#>
#> 15 image_id nested in
#> 3 patient_id
#>
#> Preview of first 10 (or less) rows:
#>
#> hladr image_id patient_id gender stage_numeric pack_years
#> 1 (numeric) [40864,18015].im3 #01 0-889-121 F 1 60
#> 2 (numeric) [42689,19214].im3 #01 0-889-121 F 1 60
#> 3 (numeric) [42806,16718].im3 #01 0-889-121 F 1 60
#> 4 (numeric) [44311,17766].im3 #01 0-889-121 F 1 60
#> 5 (numeric) [45366,16647].im3 #01 0-889-121 F 1 60
#> 6 (numeric) [56576,16907].im3 #02 1-037-393 M 1 30
#> 7 (numeric) [56583,15235].im3 #02 1-037-393 M 1 30
#> 8 (numeric) [57130,16082].im3 #02 1-037-393 M 1 30
#> 9 (numeric) [57396,17896].im3 #02 1-037-393 M 1 30
#> 10 (numeric) [57403,16934].im3 #02 1-037-393 M 1 30
#> adjuvant_therapy OS mhcII stage age
#> 1 No 3488+ low IA 85
#> 2 No 3488+ low IA 85
#> 3 No 3488+ low IA 85
#> 4 No 3488+ low IA 85
#> 5 No 3488+ low IA 85
#> 6 No 1605 low IA 66
#> 7 No 1605 low IA 66
#> 8 No 1605 low IA 66
#> 9 No 1605 low IA 66
#> 10 No 1605 low IA 66
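The conversion can be pictured with base R alone. The sketch below is only an illustration on a made-up data.frame d, not the implementation of as.groupedHyperframe.data.frame(): columns that are constant within the lowest group are kept once per group, and the non-identical column becomes a list with one numeric vector per group.

```r
# Toy long-format data (hypothetical values): one row per cell
d = data.frame(
  patient_id = c('p1', 'p1', 'p1', 'p1', 'p2', 'p2'),
  image_id = c('a', 'a', 'b', 'b', 'c', 'c'),
  age = c(85, 85, 85, 85, 66, 66),
  hladr = c(.115, .239, .268, .245, .127, .136)
)

# Columns that are identical within image_id are kept once per image
g = unique(d[c('patient_id', 'image_id', 'age')])

# The non-identical column becomes a list, one numeric vector per image
g$hladr = unname(split(d$hladr, f = factor(d$image_id, levels = g$image_id)))

nrow(g) # one row per image, instead of one row per cell
lengths(g$hladr) # number of cells in each image
```

Storing the duplicated columns once per image is what drives the memory reduction.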
Converting a data.frame with a substantial amount of duplicated information, like wrobel_lung0, into a groupedHyperframe greatly reduces the memory allocation.
unclass(object.size(wrobel_lung0g)) / unclass(object.size(wrobel_lung0))
#> [1] 0.113451
A groupedHyperframe, however, would not reduce the saved file.size by much compared to a data.frame, if xz compression is used for both.
f_g = tempfile(fileext = '.rds')
wrobel_lung0g |> saveRDS(file = f_g, compress = 'xz')
f = tempfile(fileext = '.rds')
wrobel_lung0 |> saveRDS(file = f, compress = 'xz')
file.size(f_g) / file.size(f) # not much reduction
#> [1] 0.9382716
We use function aggregate_quantile() to aggregate the quantiles of each element in the numeric-hypercolumn hladr in wrobel_lung0g by point-wise means (default of parameter f_aggr_) at the second-lowest group ~patient_id. The returned object is a hyperframe instead of a groupedHyperframe: we have one aggregated hladr.quantile per ~patient_id, which eliminates the need for a grouping structure.
wrobel_lung0g |>
  aggregate_quantile(by = ~ patient_id, probs = seq.int(from = .01, to = .99, by = .01))
#> Hyperframe:
#> patient_id gender stage_numeric pack_years adjuvant_therapy OS mhcII
#> 1 #01 0-889-121 F 1 60 No 3488+ low
#> 2 #02 1-037-393 M 1 30 No 1605 low
#> 3 #03 2-080-378 M 3 50 No 176 high
#> stage age hladr.quantile
#> 1 IA 85 (numeric)
#> 2 IA 66 (numeric)
#> 3 IIIA 84 (numeric)
In this package, we include a groupedHyperframe example Ki67 with a numeric-hypercolumn logKi67 and a nested grouping structure ~patientID/tissueID.
data(Ki67, package = 'groupedHyperframe')
Ki67
#> Grouped Hyperframe: ~patientID/tissueID
#>
#> 645 tissueID nested in
#> 622 patientID
#>
#> Preview of first 10 (or less) rows:
#>
#> logKi67 tissueID Tstage PFS recfreesurv_mon recurrence adj_rad adj_chemo
#> 1 (numeric) TJUe_I17 2 100+ 100 0 FALSE FALSE
#> 2 (numeric) TJUe_G17 1 22 22 1 FALSE FALSE
#> 3 (numeric) TJUe_F17 1 99+ 99 0 FALSE NA
#> 4 (numeric) TJUe_D17 1 99+ 99 0 FALSE TRUE
#> 5 (numeric) TJUe_J18 1 112 112 1 TRUE TRUE
#> 6 (numeric) TJUe_N17 4 12 12 1 TRUE FALSE
#> 7 (numeric) TJUe_J17 2 64+ 64 0 FALSE FALSE
#> 8 (numeric) TJUe_F19 2 56+ 56 0 FALSE FALSE
#> 9 (numeric) TJUe_P19 2 79+ 79 0 FALSE FALSE
#> 10 (numeric) TJUe_O19 2 26 26 1 FALSE TRUE
#> histology Her2 HR node race age patientID
#> 1 3 TRUE TRUE TRUE White 66 PT00037
#> 2 3 FALSE TRUE FALSE Black 42 PT00039
#> 3 3 FALSE TRUE FALSE White 60 PT00040
#> 4 3 FALSE TRUE TRUE White 53 PT00042
#> 5 3 FALSE TRUE TRUE White 52 PT00054
#> 6 2 TRUE TRUE TRUE Black 51 PT00059
#> 7 3 FALSE TRUE TRUE Asian 50 PT00062
#> 8 2 TRUE TRUE TRUE White 37 PT00068
#> 9 3 TRUE TRUE FALSE White 68 PT00082
#> 10 2 TRUE TRUE FALSE Black 55 PT00084
Similarly, we use function aggregate_quantile() to aggregate the quantiles of each element in the numeric-hypercolumn logKi67 at the second-lowest group ~patientID.
s = Ki67 |>
  aggregate_quantile(by = ~ patientID, probs = seq.int(from = .01, to = .99, by = .01))
s |> head()
#> Hyperframe:
#> Tstage PFS recfreesurv_mon recurrence adj_rad adj_chemo histology Her2 HR
#> 1 2 100+ 100 0 FALSE FALSE 3 TRUE TRUE
#> 2 1 22 22 1 FALSE FALSE 3 FALSE TRUE
#> 3 1 99+ 99 0 FALSE NA 3 FALSE TRUE
#> 4 1 99+ 99 0 FALSE TRUE 3 FALSE TRUE
#> 5 1 112 112 1 TRUE TRUE 3 FALSE TRUE
#> 6 4 12 12 1 TRUE FALSE 2 TRUE TRUE
#> node race age patientID logKi67.quantile
#> 1 TRUE White 66 PT00037 (numeric)
#> 2 FALSE Black 42 PT00039 (numeric)
#> 3 FALSE White 60 PT00040 (numeric)
#> 4 TRUE White 53 PT00042 (numeric)
#> 5 TRUE White 52 PT00054 (numeric)
#> 6 TRUE Black 51 PT00059 (numeric)
Users are encouraged to learn more about the applications of the aggregated quantiles of Ki67 data from package hyper.gam vignettes (RPubs, CRAN), section Quantile Index, as well as from our peer-reviewed publications Yi et al. (2025); Yi et al. (2023b); Yi et al. (2023a).
hyperframe
Users may convert a hyperframe provided in the package spatstat.data into a groupedHyperframe using the S3 method dispatch as.groupedHyperframe.hyperframe(). This function simply inspects and adds a (nested) grouping structure to the input hyperframe.
In the following example, we inspect the data set spatstat.data::osteo, which has the serial number of sampling volume brick nested in the bone sample id, and add the nested grouping structure ~id/brick to it.
spatstat.data::osteo |>
  as.groupedHyperframe(group = ~ id/brick)
#> Grouped Hyperframe: ~id/brick
#>
#> 40 brick nested in
#> 4 id
#>
#> Preview of first 10 (or less) rows:
#>
#> id shortid brick pts depth
#> 1 c77za4 4 1 (pp3) 45
#> 2 c77za4 4 2 (pp3) 60
#> 3 c77za4 4 3 (pp3) 55
#> 4 c77za4 4 4 (pp3) 60
#> 5 c77za4 4 5 (pp3) 85
#> 6 c77za4 4 6 (pp3) 90
#> 7 c77za4 4 7 (pp3) 95
#> 8 c77za4 4 8 (pp3) 65
#> 9 c77za4 4 9 (pp3) 100
#> 10 c77za4 4 10 (pp3) 100
In this vignette, we do not place much emphasis on the objects provided in package spatstat.data, for now.
ppp-Hypercolumn
In this section, we introduce the creation of a groupedHyperframe with one-and-only-one point pattern (ppp) hypercolumn, as well as the batch processes of spatial point pattern analyses on the one-and-only-one ppp-hypercolumn of a hyperframe (and/or groupedHyperframe).
These batch processes are not intended for a hyperframe (and/or groupedHyperframe) with multiple ppp-hypercolumns in the foreseeable future, as that would require checking for name clashes in the marks from multiple ppp-hypercolumns.
ppp-hypercolumn
Function grouped_ppp() creates a groupedHyperframe with one-and-only-one ppp-hypercolumn.
In the following example, the argument formula specifies the marks, e.g., numeric mark hladr and multitype mark phenotype, on the left-hand side; the variables OS, gender and age, before the | separator on the right-hand side; and the grouping structure, image_id nested in patient_id, after the | separator on the right-hand side.
(s = wrobel_lung |>
  grouped_ppp(formula = hladr + phenotype ~ OS + gender + age | patient_id/image_id))
#> Grouped Hyperframe: ~patient_id/image_id
#>
#> 15 image_id nested in
#> 3 patient_id
#>
#> Preview of first 10 (or less) rows:
#>
#> OS gender age patient_id image_id ppp.
#> 1 3488+ F 85 #01 0-889-121 [40864,18015].im3 (ppp)
#> 2 3488+ F 85 #01 0-889-121 [42689,19214].im3 (ppp)
#> 3 3488+ F 85 #01 0-889-121 [42806,16718].im3 (ppp)
#> 4 3488+ F 85 #01 0-889-121 [44311,17766].im3 (ppp)
#> 5 3488+ F 85 #01 0-889-121 [45366,16647].im3 (ppp)
#> 6 1605 M 66 #02 1-037-393 [56576,16907].im3 (ppp)
#> 7 1605 M 66 #02 1-037-393 [56583,15235].im3 (ppp)
#> 8 1605 M 66 #02 1-037-393 [57130,16082].im3 (ppp)
#> 9 1605 M 66 #02 1-037-393 [57396,17896].im3 (ppp)
#> 10 1605 M 66 #02 1-037-393 [57403,16934].im3 (ppp)
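The three parts of such a formula can be inspected with base R. The sketch below only illustrates how R itself parses the formula; it makes no claim about how grouped_ppp() processes it internally.

```r
f = hladr + phenotype ~ OS + gender + age | patient_id/image_id

lhs = f[[2L]] # left-hand side: hladr + phenotype
rhs = f[[3L]] # right-hand side: a call to the `|` operator

all.vars(lhs)       # the marks
all.vars(rhs[[2L]]) # variables before the | separator
all.vars(rhs[[3L]]) # grouping variables after the | separator
deparse(rhs[[3L]])  # the nested grouping structure as text
```

Because ~ has lower precedence than |, and | has lower precedence than + and /, the right-hand side parses as a single call to | with two operands.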
fv-hypercolumn
In this section, we discuss the batch processes that return a function value table (fv) hypercolumn, i.e., a hypercolumn which consists of a list of fv.objects.
marks
Batch Process | Workhorse in package spatstat.explore | fv-hypercolumns Suffix |
---|---|---|
Emark_() and Vmark_() | Emark and Vmark, conditional mean E(r) and conditional variance V(r), diagnostics for dependence between the points and the marks (Schlather, Ribeiro, and Diggle 2003) | .E and .V |
markcorr_() | markcorr, marked correlation k_{mm}(r) or generalized mark correlation k_f(r) (Stoyan and Stoyan 1994) | .k |
markvario_() | markvario, mark variogram \gamma(r) (Wälder and Stoyan 1996) | .gamma |
Kmark_() | Kmark, mark-weighted K_f(r) function (Penttinen, Stoyan, and Henttonen 1992) | .K |
In package spatstat.explore (up to version 3.4.3, 2025-05-21), function markcorr() is the workhorse inside functions Emark(), Vmark() and markvario(). Function markcorr() relies on the un-exported workhorse function spatstat.explore:::sewsmod(), whose default method = "density" contains the calculation of the ratio of two kernel densities. Due to the floating-point precision of R, such density ratios may have exceptional returns of 0, from 0/\delta, where \delta\geq (approximately) 2.6e-324; NaN, from 0/\varepsilon, where \varepsilon\leq (approximately) 2.5e-324; and Inf, from \delta/0, where \delta\geq (approximately) 2.6e-324.
0 / c(2.6e-324, 2.5e-324)
#> [1] 0 NaN
c(2.5e-324, 2.6e-324) / 0
#> [1] NaN Inf
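The three exceptional returns follow from underflow in double-precision arithmetic. As a hedged illustration, the sketch below uses exact powers of two rather than decimal literals, assuming a platform with IEC 60559 doubles and subnormal support; the name tiny is introduced only for this example.

```r
tiny = 2^-1074 # smallest positive (subnormal) double, about 4.94e-324

tiny > 0        # still representable
(tiny / 4) == 0 # anything this small underflows to exactly 0

0 / tiny        # 0: zero over a representable positive number
0 / (tiny / 4)  # NaN: the denominator underflowed, so this is 0/0
tiny / 0        # Inf: a positive number over exactly 0
```

A density ratio therefore switches between 0, NaN and Inf depending on which of the two kernel densities underflows first.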
Function markcorr() provides a default argument of parameter r, at which the mark correlation function k_f(r) is evaluated, using function spatstat.geom::handle.r.b.args(). The S3 method dispatch spatstat.explore::print.fv() prints the recommended range and available range of the argument r.
spatstat.data::spruces |>
  spatstat.explore::markcorr()
#> Function value object (class 'fv')
#> for the function r -> k[mm](r)
#> ................................................................................
#> Math.label
#> r r
#> theo {k[mm]^{iid}}(r)
#> trans {hat(k)[mm]^{trans}}(r)
#> iso {hat(k)[mm]^{iso}}(r)
#> Description
#> r distance argument r
#> theo theoretical value (independent marks) for k[mm](r)
#> trans translation-corrected estimate of k[mm](r)
#> iso Ripley isotropic correction estimate of k[mm](r)
#> ................................................................................
#> Default plot formula: .~r
#> where "." stands for 'iso', 'trans', 'theo'
#> Recommended range of argument r: [0, 9.5]
#> Available range of argument r: [0, 9.5]
#> Unit of length: 1 metre
We may observe exceptional returns if we go beyond the recommended range and/or available range. In the following example, we see that the mark correlation k_f(r) (column iso) has value NaN at r=81,85,88,89,90, value 0 at r=82 and value Inf at r=83,84,87.
spatstat.data::spruces |>
  spatstat.explore::markcorr(r = 0:90) |>
  spatstat.explore::as.data.frame.fv() |>
  utils::tail(n = 10L)
#> r theo trans iso
#> 82 81 1 NaN NaN
#> 83 82 1 NaN 0.00000
#> 84 83 1 NaN Inf
#> 85 84 1 NaN Inf
#> 86 85 1 Inf NaN
#> 87 86 1 Inf 11.50015
#> 88 87 1 NaN Inf
#> 89 88 1 Inf NaN
#> 90 89 1 NaN NaN
#> 91 90 1 NaN NaN
The batch process markcorr_(), as well as Emark_(), Vmark_() and markvario_() which rely on the workhorse markcorr(), prints the recommended r_\text{max} from each of the fv-returns, whether or not a user-specified argument for parameter r is provided.
s |>
  Emark_(correction = 'none')
#>
#> Recommended rmax for hladr.E are 15⨯ 125.625
When a user-specified r is provided for a batch process on all ppp.objects in the ppp-hypercolumn, inevitably some of the fv-returns may have exceptional values. We discuss this exception handling in the next section Aggregation over nested grouping structure.
multitype marks
Batch Process | Workhorse in package spatstat.explore | fv-hypercolumns Suffix |
---|---|---|
Gcross_() | Gcross, multitype nearest-neighbour distance G_{ij}(r) | .G |
Kcross_() | Kcross, multitype K_{ij}(r) | .K |
Jcross_() | Jcross, multitype J_{ij}(r) (Van Lieshout and Baddeley 1999) | .J |
Lcross_() | Lcross, multitype L_{ij}(r)=\sqrt{\frac{K_{ij}(r)}{\pi}} | .L |
Batch Process | Workhorse in package spatstat.geom | Applicable to | numeric-hypercolumns Suffix |
---|---|---|---|
nncross_() | nncross.ppp(., what = 'dist'), nearest neighbour distance | multitype marks | .nncross |
Multiple batch processes may be applied to a hyperframe (and/or groupedHyperframe) in a pipeline using the native pipe operator |> introduced in R 4.1.0.
r = seq.int(from = 0, to = 250, by = 10)
out = s |>
  Emark_(r = r, correction = 'none') |> # slow
# Vmark_(r = r, correction = 'none') |> # slow
# markcorr_(r = r, correction = 'none') |> # slow
# markvario_(r = r, correction = 'none') |> # slow
# Kmark_(r = r, correction = 'none') |> # fast
Gcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'none') |> # fast
# Kcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'none') |> # fast
nncross_(i = 'CK+.CD8-', j = 'CK-.CD8+', correction = 'none') # fast
#>
#> Recommended rmax for hladr.E are 15⨯ 125.625
#>
The returned hyperframe (or groupedHyperframe) has the fv-hypercolumn hladr.E, created by function Emark_() on numeric mark hladr; the fv-hypercolumn phenotype.G, created by function Gcross_() on multitype mark phenotype; and the numeric-hypercolumn phenotype.nncross, created by function nncross_() on multitype mark phenotype.
out
#> Grouped Hyperframe: ~patient_id/image_id
#>
#> 15 image_id nested in
#> 3 patient_id
#>
#> Preview of first 10 (or less) rows:
#>
#> OS gender age patient_id image_id ppp. hladr.E phenotype.G
#> 1 3488+ F 85 #01 0-889-121 [40864,18015].im3 (ppp) (fv) (fv)
#> 2 3488+ F 85 #01 0-889-121 [42689,19214].im3 (ppp) (fv) (fv)
#> 3 3488+ F 85 #01 0-889-121 [42806,16718].im3 (ppp) (fv) (fv)
#> 4 3488+ F 85 #01 0-889-121 [44311,17766].im3 (ppp) (fv) (fv)
#> 5 3488+ F 85 #01 0-889-121 [45366,16647].im3 (ppp) (fv) (fv)
#> 6 1605 M 66 #02 1-037-393 [56576,16907].im3 (ppp) (fv) (fv)
#> 7 1605 M 66 #02 1-037-393 [56583,15235].im3 (ppp) (fv) (fv)
#> 8 1605 M 66 #02 1-037-393 [57130,16082].im3 (ppp) (fv) (fv)
#> 9 1605 M 66 #02 1-037-393 [57396,17896].im3 (ppp) (fv) (fv)
#> 10 1605 M 66 #02 1-037-393 [57403,16934].im3 (ppp) (fv) (fv)
#> phenotype.nncross
#> 1 (numeric)
#> 2 (numeric)
#> 3 (numeric)
#> 4 (numeric)
#> 5 (numeric)
#> 6 (numeric)
#> 7 (numeric)
#> 8 (numeric)
#> 9 (numeric)
#> 10 (numeric)
fv-hypercolumn(s)
Function aggregate_fv() aggregates the function values and the cumulative trapezoidal integrations of the fv-hypercolumn(s); see plot.fv.
In the following example, we have hladr.E.value and phenotype.G.value, the aggregated function values from fv-hypercolumns hladr.E and phenotype.G, respectively; and hladr.E.cumtrapz and phenotype.G.cumtrapz, the aggregated cumulative trapezoidal integrations from fv-hypercolumns hladr.E and phenotype.G, respectively.
(afv = out |>
  aggregate_fv(by = ~ patient_id, f_aggr_ = pmean))
#> Hyperframe:
#> OS gender age patient_id hladr.E.value hladr.E.cumtrapz
#> 1 3488+ F 85 #01 0-889-121 (numeric) (numeric)
#> 2 1605 M 66 #02 1-037-393 (numeric) (numeric)
#> 3 176 M 84 #03 2-080-378 (numeric) (numeric)
#> phenotype.G.value phenotype.G.cumtrapz
#> 1 (numeric) (numeric)
#> 2 (numeric) (numeric)
#> 3 (numeric) (numeric)
Each of the numeric-hypercolumns contains tabulated values on the common grid of r. One “slice” of this grid may be extracted by
afv$hladr.E.cumtrapz |> .slice(j = '50')
#> 1 2 3
#> 10.60151 10.49337 31.05115
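For readers unfamiliar with cumulative trapezoidal integration, a minimal base-R sketch follows. The function cumtrapz_sketch is a hypothetical helper written for this illustration (the glossary points to package pracma for the actual cumtrapz), and naming the result by the grid values is only an assumption about how a slice might be looked up.

```r
# Hypothetical helper: running area under y over the grid x
cumtrapz_sketch = function(x, y) {
  # area of each trapezoid between consecutive grid points
  area = diff(x) * (y[-1L] + y[-length(y)]) / 2
  c(0, cumsum(area))
}

r = seq.int(from = 0, to = 4, by = 1)
y = r # the curve y = r, whose exact integral is r^2 / 2
ct = cumtrapz_sketch(x = r, y = y)
names(ct) = r

ct        # 0.0 0.5 2.0 4.5 8.0
ct[['3']] # one slice of the grid, at r = 3
```

For a linear curve the trapezoidal rule is exact, so the values above agree with r^2 / 2.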
As we have mentioned in the previous section Batch processes, the same user-specified argument of r is used for all ppp.objects in the ppp-hypercolumn. Suppose a naive user uses an r-vector well beyond the recommended range and/or available range. In this case, function aggregate_fv() prints a message of the legal r_\text{max}, i.e., the last value of r up to which no NaN and/or Inf appears in any of the fv-returns, e.g., in the hypercolumn hladr.E. Note that 0-values in an fv-return are typically a sign of degeneration as well, but function aggregate_fv() does not exclude 0-values when determining the legal r_\text{max}.
r = seq.int(from = 0, to = 1000, by = 50)
s |>
  Emark_(r = r, correction = 'none') |>
  aggregate_fv(by = ~ patient_id, f_aggr_ = pmean)
#>
#> Recommended rmax for hladr.E are 15⨯ 125.625
#> Legal rmax for hladr.E is 750
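One plausible reading of the legal r_\text{max} can be sketched in base R. The function legal_rmax_sketch below is hypothetical and is not the code inside aggregate_fv(); it assumes all curves share the grid r and returns the last grid point before any curve first becomes NaN or Inf.

```r
# Hypothetical helper, for illustration only
legal_rmax_sketch = function(r, values) {
  # rows index the grid r, columns index the curves
  bad = !do.call(what = cbind, args = lapply(values, FUN = is.finite))
  first_bad = which(rowSums(bad) > 0)[1L]
  if (is.na(first_bad)) return(max(r)) # every value is finite
  if (first_bad == 1L) return(NA_real_) # no usable grid point
  r[first_bad - 1L]
}

r = seq.int(from = 0, to = 1000, by = 50)
v1 = ifelse(r <= 750, yes = 1, no = NaN) # toy curve, degenerate beyond 750
v2 = ifelse(r <= 900, yes = 1, no = Inf) # toy curve, degenerate beyond 900
legal_rmax_sketch(r = r, values = list(v1, v2))
```

The most restrictive curve determines the result, here 750; values of 0 are finite and so do not shorten the range in this sketch.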
ppp-hypercolumn
quantiles
Function aggregate_quantile() aggregates the quantiles of the numeric-hypercolumns and the numeric marks in the ppp-hypercolumn.
In the following example, we have phenotype.nncross.quantile, the aggregated quantiles of numeric-hypercolumn phenotype.nncross; and hladr.quantile, the aggregated quantiles of numeric mark hladr in the ppp-hypercolumn.
out |>
  aggregate_quantile(by = ~ patient_id, probs = seq.int(from = 0, to = 1, by = .1))
#> Hyperframe:
#> OS gender age patient_id phenotype.nncross.quantile hladr.quantile
#> 1 3488+ F 85 #01 0-889-121 (numeric) (numeric)
#> 2 1605 M 66 #02 1-037-393 (numeric) (numeric)
#> 3 176 M 84 #03 2-080-378 (numeric) (numeric)
Function aggregate_kerndens() aggregates the kernel density of the numeric-hypercolumns and the numeric marks in the ppp-hypercolumn.
In the following example, we have phenotype.nncross.kerndens, the aggregated kernel densities of numeric-hypercolumn phenotype.nncross; and hladr.kerndens, the aggregated kernel densities of numeric mark hladr in the ppp-hypercolumn.
(mdist = out$phenotype.nncross |> unlist() |> max())
#> [1] 333.2417
out |>
  aggregate_kerndens(by = ~ patient_id, from = 0, to = mdist)
#> Hyperframe:
#> OS gender age patient_id phenotype.nncross.kerndens hladr.kerndens
#> 1 3488+ F 85 #01 0-889-121 (numeric) (numeric)
#> 2 1605 M 66 #02 1-037-393 (numeric) (numeric)
#> 3 176 M 84 #03 2-080-378 (numeric) (numeric)
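Fixing from and to matters because kernel densities can only be averaged point-wise when they are evaluated on one common grid. The base-R sketch below illustrates this with simulated distances; the data and the point-wise mean are assumptions made for this example, not the internals of aggregate_kerndens().

```r
set.seed(1)
# Two simulated sets of distances (made-up data, for illustration only)
x = list(rexp(n = 50L, rate = .02), rexp(n = 80L, rate = .03))
to = x |> unlist() |> max()

# Evaluate every kernel density on the same grid from 0 to `to`
d = lapply(x, FUN = function(i) stats::density(i, from = 0, to = to)$y)
lengths(d) # stats::density() evaluates at n = 512 points by default

# Point-wise mean of the densities on the common grid
kd = rowMeans(do.call(what = cbind, args = d))
length(kd)
```

Without a shared from and to, each density would be tabulated on its own data-dependent range, and the rows of the bound matrix would not refer to the same locations.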
The S3 generic .kmeans() performs k-means clustering (workhorse function stats::kmeans).
ppp.object
data(shapley, package = 'spatstat.data')
shapley
#> Marked planar point pattern: 4215 points
#> Mark variables: Mag, V, SigV
#> window: polygonal boundary
#> enclosing rectangle: [192.80185, 216.26227] x [-37.75511, -27.40664] degrees
The S3 method dispatch .kmeans.ppp() performs k-means clustering, with parameters formula, user-specified coordinate(s) and/or numeric marks; centers, number of clusters, see ?stats::kmeans; and clusterSize, “expected” cluster size.
marks
Example below shows a clustering based on the x- and y-coordinates, as well as the numeric mark Mag.
km = shapley |> .kmeans(formula = ~ x + y + Mag, centers = 3L)
km |> class()
#> [1] "kmeans"
Example below shows a clustering based on x-coordinate and Mag.
km1 = shapley |> .kmeans(formula = ~ x + Mag, centers = 3L)
km1 |> class()
#> [1] "kmeans"
Example below shows a clustering based on x- and y-coordinates only.
km2 = shapley |> .kmeans(formula = ~ x + y, centers = 3L)
km2 |> class()
#> [1] "kmeans"
clusterSize
Example below shows a clustering specified by clusterSize.
km3 = shapley |> .kmeans(formula = ~ x + y, clusterSize = 1e3L)
km3 |> class()
#> [1] "kmeans"
km3$centers # 5 clusters needed
#> x y
#> 1 202.1441 -31.01862
#> 2 206.0951 -33.34088
#> 3 194.6935 -29.95076
#> 4 199.0664 -34.90336
#> 5 210.8689 -31.15603
km3$cluster |> table()
#>
#> 1 2 3 4 5
#> 1467 679 1099 448 522
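The comment "5 clusters needed" is consistent with dividing the 4215 points by the expected cluster size of 1000 and rounding up. The sketch below assumes that rule, which is a guess at how clusterSize is translated into a number of centers, and uses simulated coordinates rather than the shapley data.

```r
n = 4215L          # number of points, as in shapley
clusterSize = 1e3L # "expected" cluster size

# Assumed rule: the smallest number of clusters with average size <= clusterSize
(centers = ceiling(n / clusterSize))

set.seed(1)
xy = cbind(x = runif(n), y = runif(n)) # simulated coordinates
km = stats::kmeans(xy, centers = centers, iter.max = 50L)
km$cluster |> table()
```

As in the output above, k-means does not balance the clusters, so individual cluster sizes may differ considerably from the expected size.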
The S3 generic split_kmeans() splits a ppp.object, a listof ppp.objects, and a hyperframe by k-means clustering.
Note that many functions in package groupedHyperframe require a 'dataframe' markformat for ppp.objects.
data(flu, package = 'spatstat.data')
flu$pattern[[1L]] |>
  spatstat.geom::markformat()
#> [1] "vector"
Users may convert a 'vector' markformat to 'dataframe' using the syntactic sugar `mark_name<-`,
flu$pattern[] = flu$pattern |>
  lapply(FUN = `mark_name<-`, value = 'stain') # read ?flu carefully
flu$pattern[[1L]] |>
  spatstat.geom::markformat()
#> [1] "dataframe"
ppp.object
The S3 method dispatch split_kmeans.default() splits a ppp.object by k-means clustering.
flu$pattern[[1L]] |> split_kmeans(formula = ~ x + y, centers = 3L)
#> Point pattern split by factor
#>
#> 1:
#> Marked planar point pattern: 157 points
#> Multitype, with levels = M2, M1
#> window: rectangle = [0, 3331] x [0, 3331] nm
#>
#> 2:
#> Marked planar point pattern: 145 points
#> Multitype, with levels = M2, M1
#> window: rectangle = [0, 3331] x [0, 3331] nm
#>
#> 3:
#> Marked planar point pattern: 169 points
#> Multitype, with levels = M2, M1
#> window: rectangle = [0, 3331] x [0, 3331] nm
listof ppp.objects
The S3 method dispatch split_kmeans.listof() splits a listof ppp.objects by k-means clustering.
The returned object has attributes attr(., 'id'), indices of the ppp.objects before splitting; and attr(., 'cluster'), indices of k-means clusters, nested in id.
flu$pattern[1:2] |> split_kmeans(formula = ~ x + y, centers = 3L)
#> $`wt M2-M1 13.1`
#> Marked planar point pattern: 153 points
#> Multitype, with levels = M2, M1
#> window: rectangle = [0, 3331] x [0, 3331] nm
#>
#> $`wt M2-M1 13.2`
#> Marked planar point pattern: 147 points
#> Multitype, with levels = M2, M1
#> window: rectangle = [0, 3331] x [0, 3331] nm
#>
#> $`wt M2-M1 13.3`
#> Marked planar point pattern: 171 points
#> Multitype, with levels = M2, M1
#> window: rectangle = [0, 3331] x [0, 3331] nm
#>
#> $`wt M2-M1 22.1`
#> Marked planar point pattern: 94 points
#> Multitype, with levels = M2, M1
#> window: rectangle = [0, 3331] x [0, 3331] nm
#>
#> $`wt M2-M1 22.2`
#> Marked planar point pattern: 79 points
#> Multitype, with levels = M2, M1
#> window: rectangle = [0, 3331] x [0, 3331] nm
#>
#> $`wt M2-M1 22.3`
#> Marked planar point pattern: 44 points
#> Multitype, with levels = M2, M1
#> window: rectangle = [0, 3331] x [0, 3331] nm
#>
#> attr(,"id")
#> [1] 1 1 1 2 2 2
#> attr(,"cluster")
#> [1] 1 2 3 1 2 3
hyperframe and/or groupedHyperframe
The S3 method dispatch split_kmeans.hyperframe() splits a hyperframe and/or groupedHyperframe by k-means clustering of the one-and-only-one ppp-hypercolumn.
The returned object is a groupedHyperframe with grouping structure ~.id/.cluster, if the input is a hyperframe; or ~ existing/grouping/structure/.cluster, if the input is a groupedHyperframe. Note that the grouping level .id is believed to be equivalent to the lowest level of the existing grouping structure.
flu[1:2,] |> split_kmeans(formula = ~ x + y, centers = 3L)
#> Grouped Hyperframe: ~.id/.cluster
#>
#> 6 .cluster nested in
#> 2 .id
#>
#> Preview of first 10 (or less) rows:
#>
#> pattern .id .cluster virustype stain frameid
#> 1 (ppp) 1 1 wt M2-M1 13
#> 2 (ppp) 1 2 wt M2-M1 13
#> 3 (ppp) 1 3 wt M2-M1 13
#> 4 (ppp) 2 1 wt M2-M1 22
#> 5 (ppp) 2 2 wt M2-M1 22
#> 6 (ppp) 2 3 wt M2-M1 22
The S3 generic pairwise_cor_spatial() calculates the nonparametric, rank-based, Tjøstheim’s correlation coefficients (Tjøstheim 1978; Hubert and Golledge 1982) in a pairwise-combination fashion, using the workhorse function SpatialPack::cor.spatial(). All S3 method dispatches return an object of class 'pairwise_cor_spatial', which inherits from class 'dist'.
ppp.object
The S3 method dispatch pairwise_cor_spatial.ppp() finds the nonparametric Tjøstheim’s correlation coefficients from the pairwise-combinations of all numeric marks of a ppp.object.
data(finpines, package = 'spatstat.data')
(r = finpines |> pairwise_cor_spatial())
#> diameter
#> height 0.7287879
The printing of 'pairwise_cor_spatial' is taken care of by function stats:::print.dist.
The S3 method dispatch as.matrix.pairwise_cor_spatial() returns a matrix with diagonal values of 1.
r |> as.matrix()
#> diameter height
#> diameter 1.0000000 0.7287879
#> height 0.7287879 1.0000000
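The relation to class 'dist' can be illustrated with base R. The sketch below builds a plain 'dist' object by hand from the coefficient printed above; it is an assumption-laden illustration, not the code of as.matrix.pairwise_cor_spatial().

```r
# A hand-made 'dist' object holding the single pairwise coefficient
d = structure(
  0.7287879,
  Size = 2L,
  Labels = c('diameter', 'height'),
  class = 'dist'
)

m = as.matrix(d) # stats:::as.matrix.dist() fills the diagonal with 0
diag(m) = 1      # a coefficient of a mark with itself is set to 1
m
```

Inheriting from 'dist' gives compact storage of the lower triangle and reuses the existing print and as.matrix machinery; only the diagonal needs adjusting.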
Note that this matrix is not a correlation matrix, because Tjøstheim’s correlation coefficient does not have a corresponding covariance cov, standard deviation sd, nor a conversion cov2cor method;