---
title: "Data Organization"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data Organization}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

HDF5 files are best understood as "file systems within a file." Just as your computer has folders and files, an HDF5 file has **Groups** (folders) and **Datasets** (files). This hierarchical structure allows you to organize complex experimental data, metadata, and configuration settings into a single, self-describing package.

This vignette explains how to create, manage, and modify this structure using `h5lite`.

```{r setup}
library(h5lite)
file <- tempfile(fileext = ".h5")
```

## The Hierarchical Model

HDF5 uses POSIX-style paths (like Linux or macOS) to identify objects. The root of the file is `/`.

* `/` : The Root Group
* `/experiment_1` : A Group (folder)
* `/experiment_1/data` : A Dataset (file) inside the group

## Creating Groups

### Implicit Creation (Recommended)

In most cases, you do not need to create groups manually. When you write a dataset to a path like `"data/experiment/run1"`, `h5lite` automatically creates the parent groups `"data"` and `"data/experiment"` if they do not exist.

### Explicit Creation

If you need to create an empty group structure (perhaps to add attributes to it), you can use `h5_create_group()`. This function works like `mkdir -p`: it creates all necessary parent groups.

```{r}
# Create a deep hierarchy
h5_create_group(file, "project_A/simulation/run_01")

# Verify
h5_str(file)
```

## Using Lists as Groups

The most powerful way to organize data in `h5lite` is by mapping R **lists** to HDF5 **groups**.

When you pass a named list to `h5_write()`, `h5lite` recursively writes the list structure to the file.

* **Named Lists** become **Groups**.
* **Atomic Vectors/Matrices** inside the list become **Datasets**.

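
The two mapping rules above can be previewed without touching a file. The following is a minimal sketch in base R only: `list_paths()` is a hypothetical helper written for this illustration, not part of `h5lite`, and it only reflects the two rules listed above. How other object types are stored is outside its scope, so it simply treats anything that is not a plain list as a leaf.

```{r}
# Hypothetical helper (NOT an h5lite function): predict the HDF5 paths that a
# nested named list would occupy under the two mapping rules above.
list_paths <- function(x, prefix = "") {
  out <- character(0)
  for (name in names(x)) {
    item <- x[[name]]
    path <- paste0(prefix, "/", name)
    if (is.list(item) && !is.data.frame(item)) {
      # A nested list maps to a group: record it, then descend into it
      out <- c(out, paste0(path, " (group)"), list_paths(item, path))
    } else {
      # Assumption: every non-list element is treated as a dataset leaf
      out <- c(out, paste0(path, " (dataset)"))
    }
  }
  out
}

# Illustrative data; "/exp_demo" is an arbitrary prefix chosen for this sketch
sketch <- list(meta = list(id = 1L), values = c(0.1, 0.9))
list_paths(sketch, "/exp_demo")
```

The helper only previews the layout. With `h5_write()`, the library builds that layout itself from the list.
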

This allows you to organize your entire data structure in R and save it to disk in one command.

```{r}
# Define a complex structure in R
experiment_data <- list(
  metadata = list(
    id = I(101),
    technician = I("Dr. Smith"),
    timestamp = I("2023-10-27")
  ),
  measurements = list(
    raw = runif(10),
    calibration = c(0.1, 0.9)
  ),
  status = I("complete")
)

# Write the entire structure to a group named "exp_101"
h5_write(experiment_data, file, "exp_101")
```

## Inspecting Structure

You can visualize the organization of your file using `h5_ls()` and `h5_str()`.

* `h5_ls()`: Returns a character vector of names. Useful for programmatic checks.
* `h5_str()`: Prints a tree diagram. Useful for interactive exploration.

```{r}
# List all objects recursively
h5_ls(file, recursive = TRUE)

# Visualize the tree
h5_str(file)
```

## Moving and Renaming

Data organization often changes. You can rename objects or move them to different groups using `h5_move()`.

This operation is metadata-only, meaning it is extremely fast even for large datasets, as the data itself is not rewritten.

```{r}
# Rename 'exp_101' to 'archive_101'
h5_move(file, "exp_101", "archive_101")

# Move 'project_A' inside 'archive_101'
h5_move(file, "project_A", "archive_101/project_A")

h5_ls(file)
```

## Deleting Objects

You can remove groups or datasets using `h5_delete()`.

* Deleting a dataset removes the data.
* Deleting a group removes the group **and all of its children** (recursively).
* The file size does not change, but the freed space can be reused.

```{r}
# Delete the entire archive group
h5_delete(file, "archive_101")

# The file is now empty (except for the root)
h5_ls(file)
```

```{r, include=FALSE}
unlink(file)
```