Getting started with phyloatlas

What this package does

phyloatlas is a thin R client for the Phylo-Species Atlas, a curated collection of standardized empirical phylogenetic trees covering Bacteria, Archaea, and Eukaryota. The atlas itself lives on GitHub and is version-pinned via Zenodo. This R package exports four functions: one to fetch any of those trees with species labels resolved from the shared dictionary, plus helpers to list available trees, inspect their provenance, and clear the session cache.

library(phyloatlas)

ls("package:phyloatlas")
#> [1] "atlas_clear_cache" "atlas_info"        "list_trees"       
#> [4] "load_atlas_tree"

Loading a tree (offline demo)

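The package ships a small Newick file under inst/extdata/ so you can explore the API without a network connection. In production you would call load_atlas_tree("mammals") (or any other tree name from list_trees()); the demo below reads the bundled file directly with ape::read.tree() to show the shape of the returned object. Because no dictionary lookup takes place, the tip labels are the raw integer IDs.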

demo_path <- system.file("extdata", "tree_demo.nwk", package = "phyloatlas")
tree <- ape::read.tree(demo_path)
tree
#> 
#> Phylogenetic tree with 10 tips and 9 internal nodes.
#> 
#> Tip labels:
#>   1, 2, 3, 4, 5, 6, ...
#> 
#> Rooted; includes branch length(s).

The returned object is a standard ape::phylo, so anything that works on phylo works here: Ntip(), branching.times(), plot(), etc.

plot(tree, cex = 0.8, no.margin = TRUE)
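
As a minimal sketch of the point above, the standard ape accessors can be tried on a tiny tree written inline. The Newick string is a toy example, not an atlas tree, and it is ultrametric on purpose because branching.times() assumes that.

```r
library(ape)

# Toy ultrametric tree written inline; this is not an atlas tree.
toy <- read.tree(text = "((A:1,B:1):1,C:2);")

Ntip(toy)                   # number of tips
Nnode(toy)                  # number of internal nodes
is.ultrametric(toy)         # branching.times() assumes an ultrametric tree
bt <- branching.times(toy)  # node ages measured back from the tips
bt
```

For trees that are not dated, is.ultrametric() is a cheap check to run before relying on branching.times().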

Loading a real atlas tree (requires network)

When you have an internet connection, load_atlas_tree() fetches a standardized Newick file from the live atlas and resolves its integer tip IDs to species names using the shared dictionary.

tree <- load_atlas_tree("mammals")
tree
head(tree$tip.label)

For large trees (e.g. seed plants, ~342k tips), you can skip the 18 MB dictionary download by setting resolve_labels = FALSE; the tips then keep their integer IDs:

tree <- load_atlas_tree("seed_plants", resolve_labels = FALSE)
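
If you later need names for an unresolved tree, the mapping from integer IDs to species names can be done by hand. The sketch below is only illustrative: the toy tree, the dictionary data frame, and its column names (id, species) are assumptions made for this example, and the real atlas dictionary may be laid out differently.

```r
library(ape)

# Toy tree with integer tip IDs, standing in for an unresolved atlas tree.
tree <- read.tree(text = "((1:1,2:1):1,3:2);")

# Hypothetical dictionary; the real one may use different column names.
dict <- data.frame(
  id = c(3L, 1L, 2L),
  species = c("Species_c", "Species_a", "Species_b"),
  stringsAsFactors = FALSE
)

# read.tree() stores tip labels as character, so compare as character.
idx <- match(tree$tip.label, as.character(dict$id))
stopifnot(!anyNA(idx))  # fail loudly if an ID is missing from the dictionary
tree$tip.label <- dict$species[idx]
tree$tip.label
```

Using match() keeps the lookup independent of the row order of the dictionary.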

Browsing available trees

list_trees() returns a data frame with one row per tree and provenance columns merged in from the atlas’s data_provenance.csv:

trees <- list_trees()
head(trees[, c("name", "study", "ntips", "dated", "year")])

# Filter to dated trees with at least 1000 tips
subset(trees, dated & ntips >= 1000)

atlas_info() returns the same provenance row for a single tree, which is useful for scripting:

atlas_info("birds")
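
Because list_trees() returns an ordinary data frame, the usual base R tools apply to it. The sketch below runs offline on a mock table that reuses the column names shown above; every value in it is a made-up placeholder, not real atlas metadata.

```r
# Mock stand-in for list_trees(); every value here is a made-up placeholder.
trees <- data.frame(
  name  = c("tree_a", "tree_b", "tree_c"),
  study = c("Study A", "Study B", "Study C"),
  ntips = c(500L, 1000L, 25000L),
  dated = c(TRUE, TRUE, FALSE),
  year  = c(2015L, 2019L, 2021L),
  stringsAsFactors = FALSE
)

# Dated trees with at least 1000 tips; >= keeps the tree with exactly 1000.
big_dated <- subset(trees, dated & ntips >= 1000)
big_dated$name

# Largest trees first, keeping only two columns.
by_size <- trees[order(trees$ntips, decreasing = TRUE), c("name", "ntips")]
by_size
```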

Pointing at a fork or local mirror

The package fetches everything relative to a single base URL, which you can override per session:

options(
  phyloatlas.base_url =
    "https://raw.githubusercontent.com/yourfork/phylo-species-atlas/main"
)
atlas_clear_cache()  # forget cached dictionary/metadata

This is also the mechanism the package’s own test suite uses to point at a file:// URL containing a miniature mirror of the atlas, so the tests run completely offline.
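
Since the override is a global option, a script that switches mirrors only temporarily may want to restore the previous value afterwards. The following is a sketch using base R only; with_atlas_base_url() is a hypothetical helper, not part of phyloatlas, and the file:// path is a placeholder.

```r
# Hypothetical helper (not exported by phyloatlas): set the base URL option,
# evaluate `code`, then restore whatever value was there before.
with_atlas_base_url <- function(url, code) {
  old <- options(phyloatlas.base_url = url)
  on.exit(options(old), add = TRUE)
  force(code)  # `code` is evaluated lazily, so it sees the new option
}

before <- getOption("phyloatlas.base_url")

# Placeholder path; point this at your own mirror.
seen <- with_atlas_base_url(
  "file:///path/to/atlas-mirror",
  getOption("phyloatlas.base_url")
)

seen                                                    # value seen inside
identical(getOption("phyloatlas.base_url"), before)     # restored afterwards
```

In real use you would probably also call atlas_clear_cache() before fetching inside the helper, so that dictionary or metadata cached from another base URL is not reused.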

Provenance and reproducibility

Every tree in the atlas has a record in Supplementary Table S5 (per-tree provenance); the historical succession of canonical trees is recorded in Supplementary Table S7. For reproducibility, cite the version-specific Zenodo DOI for the atlas release you used; the package version (packageVersion("phyloatlas")) identifies only the client.

packageVersion("phyloatlas")
#> [1] '0.1.0'

Where to learn more