---
title: "zarrs Backend"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{zarrs Backend}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)
sample_dir <- tools::R_user_dir("pizzarr")
clean <- !dir.exists(sample_dir)
```

pizzarr ships in two tiers. The CRAN build is pure R --- no Rust compilation, no system dependencies. It handles local and HTTP Zarr stores with sequential chunk I/O via `lapply`. The r-universe build compiles in the [zarrs](https://github.com/zarrs/zarrs) Rust crate via [extendr](https://cran.r-project.org/package=rextendr), adding parallel decompression, cloud-native store backends (S3, GCS), and codecs beyond what R packages provide.

The split exists because CRAN's macOS build machines ship a Rust toolchain (rustc 1.84) that is too old for zarrs, which requires rustc >= 1.91. r-universe builds against the latest stable toolchain, so it can compile zarrs and distribute pre-built binaries. End users on either tier install with `install.packages()` --- no Rust toolchain needed.

## Checking availability

```{r}
library(pizzarr)
has_zarrs <- pizzarr:::.pizzarr_env$zarrs_available
```

`pizzarr_compiled_features()` lists the feature flags compiled into the zarrs backend. On the CRAN tier it returns `character(0)` with a message; on the r-universe tier it returns the compiled capabilities:

```{r}
pizzarr_compiled_features()
```

The internal flag `.pizzarr_env$zarrs_available` is a logical scalar set once at package load.
Dispatch logic throughout pizzarr checks this flag to decide whether to call into Rust or fall through to the R-native path:

```{r}
pizzarr:::.pizzarr_env$zarrs_available
```

## Upgrading to the zarrs tier

`pizzarr_upgrade()` prints the r-universe install command when zarrs is not compiled in, or confirms that the backend is already present:

```{r}
pizzarr_upgrade()
```

The startup message that CRAN users see on `library(pizzarr)` can be silenced with `options(pizzarr.suggest_runiverse = FALSE)`.

## Probing store metadata

The examples below require the zarrs backend. When this vignette is built without it, the code chunks are not evaluated.

`zarrs_node_exists()` opens a filesystem store via the Rust backend, probes for V2 and V3 metadata keys at a given path, and returns a list with three fields: `exists` (logical), `node_type` (character), and `zarr_format` (integer or NULL). The store handle is cached on the Rust side --- subsequent calls to the same store path reuse it without re-opening.

### V2 store

```{r, eval=has_zarrs}
v2_root <- pizzarr_sample("fixtures/v2/data.zarr")

# Root group
zarrs_node_exists(v2_root, "")
```

```{r, eval=has_zarrs}
# An array within the store
zarrs_node_exists(v2_root, "1d.contiguous.lz4.i2")
```

```{r, eval=has_zarrs}
# A path that does not exist
zarrs_node_exists(v2_root, "does_not_exist")
```

### V3 store

V2 and V3 detection is automatic. zarrs probes for `zarr.json` first (V3), then falls back to `.zarray` / `.zgroup` (V2):

```{r, eval=has_zarrs}
v3_root <- pizzarr_sample("fixtures/v3/data.zarr")
zarrs_node_exists(v3_root, "")
```

## Store cache management

The Rust backend holds open store handles in a process-global cache keyed by normalized path. `zarrs_close_store()` removes a handle from the cache and returns `TRUE`.
A second call to the same path returns `FALSE` --- it was already removed:

```{r, eval=has_zarrs}
zarrs_close_store(v2_root)
zarrs_close_store(v2_root)
```

```{r, eval=has_zarrs}
zarrs_close_store(v3_root)
```

## Array metadata

`zarrs_open_array_metadata()` opens a zarrs array and returns its metadata as a named list. The store handle is cached, so repeated calls to the same store are fast. The returned list contains `shape`, `chunks`, `dtype`, `r_type`, `fill_value_json`, `zarr_format`, and `order`.

### V2 array

```{r, eval=has_zarrs}
v2_root <- pizzarr_sample("fixtures/v2/data.zarr")
zarrs_open_array_metadata(v2_root, "1d.contiguous.raw.i2")
```

### V3 array

V3 arrays work the same way. The `zarr_format` field distinguishes V2 from V3:

```{r, eval=has_zarrs}
v3_root <- pizzarr_sample("fixtures/v3/data.zarr")
zarrs_open_array_metadata(v3_root, "1d.contiguous.gzip.i2")
```

### Data type classification

The `r_type` field maps zarrs data types to R-compatible type families. zarrs numeric types are classified as `"double"`, `"integer"`, or `"logical"` based on what R can represent natively:

- **double**: float64 (zero-cost), float32 (widened), uint32/int64/uint64 (widened, precision risk > 2^53)
- **integer**: int32 (zero-cost), int8/int16/uint8/uint16 (widened)
- **logical**: bool

Unsupported types (strings, complex) report `"unsupported"` and fall back to the R-native code path.

```{r, eval=has_zarrs}
zarrs_close_store(v2_root)
zarrs_close_store(v3_root)
```

## Runtime info and tuning

`zarrs_runtime_info()` reports the current zarrs configuration --- the codec concurrency target, thread pool size, how many store handles are cached, and which features were compiled in:

```{r, eval=has_zarrs}
zarrs_runtime_info()
```

### pizzarr_config()

`pizzarr_config()` is the main interface for viewing and changing concurrency settings.
Called with no arguments, it returns the current state; with arguments, it sets the specified values:

```{r, eval=has_zarrs}
# View current settings
pizzarr_config()

# Set codec concurrency to 2 parallel operations per read/write
pizzarr_config(concurrent_target = 2L)
zarrs_runtime_info()$codec_concurrent_target
```

Three settings are available:

- **nthreads** --- rayon thread pool size. Set once per R session (the thread pool can only be initialised once). For reliable session-level control, set the `PIZZARR_NTHREADS` environment variable before starting R.
- **concurrent_target** --- how many codec operations zarrs runs in parallel within a single read or write call. Can be changed at any time.
- **http_batch_range_requests** --- whether HTTP stores use multipart range requests (default `TRUE`). Set to `FALSE` for servers with incomplete multipart support. Takes effect on the next store open.

All three settings can also be configured via environment variables (`PIZZARR_NTHREADS`, `PIZZARR_CONCURRENT_TARGET`, `PIZZARR_HTTP_BATCH_RANGE_REQUESTS`) or R options (`pizzarr.nthreads`, etc.), which are read at package load time. Environment variables persist across sessions without needing `.Rprofile` edits.

The lower-level `zarrs_set_codec_concurrent_target()` function is still available for direct use:

```{r, eval=has_zarrs}
zarrs_set_codec_concurrent_target(2L)
zarrs_runtime_info()$codec_concurrent_target
```

## Reading data via zarrs

When the zarrs backend is available and the selection is a contiguous slice (step == 1), `ZarrArray$get_item()` dispatches reads to zarrs automatically. zarrs handles chunk identification, parallel decompression, and codec execution internally, bypassing pizzarr's R-native chunk loop.

Scalar integer selections (e.g., selecting a single row of a matrix) are also eligible --- they become length-1 ranges on the Rust side.

Unsupported selections (step > 1 slices, fancy indexing, MemoryStore) fall through to the R-native path transparently.
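
The eligibility rules above can be pictured as a small decision function. The sketch below is illustrative only: `normalize_dim()` and `plan_read()` are hypothetical helpers, not pizzarr functions, and the list-based selection format (a scalar index, or `list(start, stop, step)` with inclusive bounds) is an assumption made for this example rather than pizzarr's internal representation.

```r
# Hypothetical helpers, not part of pizzarr. Each element of `selection`
# is either a single integer index or a list(start =, stop =, step =).

# Turn one per-dimension selection into a contiguous range, or NULL if
# it cannot be expressed as one.
normalize_dim <- function(sel) {
  if (is.list(sel)) {
    # Only step == 1 slices are contiguous.
    if (!identical(as.integer(sel$step), 1L)) return(NULL)
    return(c(start = as.integer(sel$start), stop = as.integer(sel$stop)))
  }
  if (is.numeric(sel) && length(sel) == 1L) {
    # A scalar index becomes a length-1 range.
    return(c(start = as.integer(sel), stop = as.integer(sel)))
  }
  NULL  # index vectors (fancy indexing) and anything else
}

# Decide which path a read would take under the rules described above.
plan_read <- function(selection, zarrs_available, store_is_memory = FALSE) {
  if (!zarrs_available || store_is_memory) return("r-native")
  ranges <- lapply(selection, normalize_dim)
  if (any(vapply(ranges, is.null, logical(1)))) return("r-native")
  "zarrs"
}

contiguous <- list(list(start = 1, stop = 10, step = 1), 3L)
strided    <- list(list(start = 1, stop = 10, step = 2), 3L)

plan_read(contiguous, zarrs_available = TRUE)  # "zarrs"
plan_read(strided, zarrs_available = TRUE)     # "r-native"
```

In this sketch, a single ineligible dimension sends the whole read to the R-native path, which mirrors the transparent fall-through described above.
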

### Basic read

```{r, eval=has_zarrs}
d <- tempfile("zarrs_vignette_")
z <- zarr_create(store = d, shape = c(100L, 50L), chunks = c(10L, 10L), dtype = "