---
title: "Statistical Applications of taxodist"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Statistical Applications of taxodist}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
library(taxodist)
```

The `taxodist` distance matrix is a standard `dist` object and integrates directly with the R ecosystem for multivariate analysis. This vignette shows how to use it alongside `vegan` and `ape` for ecological and phylogenetic analyses. All examples use a set of threatened mammals from Brazil spanning seven broad taxonomic groups.

---

## Example dataset: threatened mammals of Brazil

```{r taxa}
taxa_brazil <- c(
  "Priodontes", "Myrmecophaga", "Chrysocyon", "Tapirus", "Didelphis",
  "Leontopithecus", "Brachyteles", "Panthera", "Pteronura", "Puma",
  "Sotalia", "Pontoporia", "Trichechus", "Mazama", "Blastocerus"
)
```

The dataset covers xenarthrans (`Priodontes`, `Myrmecophaga`), a marsupial (`Didelphis`), ungulates (`Tapirus`, `Mazama`, `Blastocerus`), primates (`Leontopithecus`, `Brachyteles`), carnivores (`Chrysocyon`, `Panthera`, `Pteronura`, `Puma`), cetaceans (`Sotalia`, `Pontoporia`), and a sirenian (`Trichechus`).

The pairwise distance matrix is computed with a single call. Lineages are cached after the first retrieval, so subsequent analyses on the same taxa are instantaneous.

```{r matrix}
library(taxodist)

mat <- distance_matrix(taxa_brazil)
print(mat)
#>                Priodontes Myrmecophaga Chrysocyon    Tapirus  Didelphis
#> Myrmecophaga   0.01639344
#> Chrysocyon     0.01694915   0.01694915
#> Tapirus        0.01694915   0.01694915 0.01587302
#> Didelphis      0.01754386   0.01754386 0.01754386 0.01754386
#> Leontopithecus 0.01694915   0.01694915 0.01666667 0.01666667 0.01754386
#> Brachyteles    0.01694915   0.01694915 0.01666667 0.01666667 0.01754386
#> Panthera       0.01694915   0.01694915 0.01492537 0.01587302 0.01754386
#> Pteronura      0.01694915   0.01694915 0.01470588 0.01587302 0.01754386
#> Puma           0.01694915   0.01694915 0.01492537 0.01587302 0.01754386
#> Sotalia        0.01694915   0.01694915 0.01587302 0.01562500 0.01754386
#> Pontoporia     0.01694915   0.01694915 0.01587302 0.01562500 0.01754386
#> Trichechus     0.01666667   0.01666667 0.01694915 0.01694915 0.01754386
#> Mazama         0.01694915   0.01694915 0.01587302 0.01562500 0.01754386
#> Blastocerus    0.01694915   0.01694915 0.01587302 0.01562500 0.01754386
```

`Didelphis` is the most distant taxon from every other genus shown (0.01754 in each case), consistent with the early divergence of marsupials from placental mammals. Carnivores sit closer to one another: `Chrysocyon` is 0.0147 to 0.0149 from `Panthera`, `Pteronura`, and `Puma`, against 0.0159 or more from every non-carnivore, reflecting a more recent common ancestor within Carnivora.

---

## Phylogenetic tree with `ape`

`taxo_cluster()` performs hierarchical clustering on the distance matrix. Its `hclust` component can be converted by `ape` directly to a `phylo` tree for publication-quality visualisation.

```{r ape, fig.width = 7, fig.height = 5}
library(ape)

cl <- taxo_cluster(taxa_brazil)
tree <- ape::as.phylo(cl$hclust)

plot(tree, main = "Threatened Mammals of Brazil", cex = 0.85, tip.color = "gray20")
```

The `phylo` object works with the `ape` tree manipulation and plotting functions, including `nodelabels()` and `edgelabels()`, and can be exported to Newick format via `write.tree()`.

```{r ape-newick}
ape::write.tree(tree, file = "taxa_brazil.nwk")
```

---

## Taxonomic diversity with `vegan`

### Clarke and Warwick indices (`taxondive`)

`vegan::taxondive()` computes a family of taxonomic diversity indices (Clarke and Warwick 1998, 2001) from a community matrix and a taxonomic distance matrix. These indices capture not just species richness but how distantly related the species in a community are: a community of closely related species scores lower than one with the same richness but spanning multiple orders.

The function expects a community matrix (sites x species) and the `dist` object from `distance_matrix()`. Here we define three hypothetical communities, each combining two or three of the broad taxonomic groups.

```{r taxondive}
set.seed(123)
library(vegan)

# Presence-absence matrix: rows are communities, columns follow taxa_brazil
comm <- matrix(c(
  1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  # xenarthrans + marsupial
  0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,  # primates + carnivores
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1   # cetaceans + sirenian + ungulates
), nrow = 3, byrow = TRUE)
colnames(comm) <- taxa_brazil
rownames(comm) <- c("community_A", "community_B", "community_C")

vegan::taxondive(comm, mat)
#>                Species       Delta      Delta*     Lambda+      Delta+ S Delta+
#> community_A 3.0000e+00  1.7160e-02  1.7160e-02  2.9410e-07  1.7160e-02   0.0515
#> community_B 5.0000e+00  1.5905e-02  1.5905e-02  8.8325e-07  1.5905e-02   0.0795
#> community_C 5.0000e+00  1.5750e-02  1.5750e-02  1.1834e-06  1.5750e-02   0.0788
#> Expected               -9.9265e-02  1.6544e-02              1.6457e-02
```

The key indices are:

**Delta** ($\Delta$): taxonomic diversity, the mean taxonomic distance between two individuals drawn at random from the community, with same-species pairs counted as distance zero.

**Delta\*** ($\Delta^*$): taxonomic distinctness, the same abundance-weighted mean restricted to pairs of individuals from different species.

**Delta+** ($\Delta^+$): average taxonomic distinctness, the mean distance over all pairs of species, using presence-absence only; it is the index intended for comparing communities of different sizes. For presence-absence data, as here, $\Delta$, $\Delta^*$, and $\Delta^+$ coincide.

**Lambda+** ($\Lambda^+$): variance of the pairwise taxonomic distances around $\Delta^+$, measuring how evenly a community spans the taxonomic tree.

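As a check on these definitions, the values reported for `community_A` can be reproduced by hand. The sketch below assumes the standard Clarke and Warwick presence-absence formulas ($\Delta^+$ as the mean distance over species pairs, $\Lambda^+$ as the mean squared deviation around it) and copies the three relevant pairwise distances from the matrix printed earlier. The variable names are illustrative only and are not part of `taxodist` or `vegan`.

```{r delta-plus-by-hand}
# Pairwise distances among the three members of community_A,
# copied by hand from the distance matrix printed earlier
d_A <- c(
  Priodontes_Myrmecophaga = 0.01639344,
  Priodontes_Didelphis    = 0.01754386,
  Myrmecophaga_Didelphis  = 0.01754386
)

# Delta+: mean distance over all species pairs
delta_plus <- mean(d_A)

# Lambda+: mean squared deviation of the pairwise distances around Delta+
lambda_plus <- mean((d_A - delta_plus)^2)

# S Delta+: species richness times Delta+
s_delta_plus <- length(c("Priodontes", "Myrmecophaga", "Didelphis")) * delta_plus

signif(delta_plus, 4)
#> [1] 0.01716
signif(lambda_plus, 4)
#> [1] 2.941e-07
signif(s_delta_plus, 3)
#> [1] 0.0515
```

These agree with the `Delta+`, `Lambda+`, and `S Delta+` columns of the `community_A` row above. Because two of the three pairs involve `Didelphis` at the maximum distance, the mean is pulled upward.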
`community_A`, made up of xenarthrans and a marsupial, scores the highest $\Delta$ (0.01716), reflecting deeper divergences among its members. `community_B` (primates and carnivores) and `community_C` (cetaceans, a sirenian, and ungulates) score lower (0.01591 and 0.01575), as their members share more recent common ancestors within Placentalia.

---

## Mantel test with `vegan`

The Mantel test assesses the correlation between two distance matrices by permutation. A typical application is testing whether taxonomic distance correlates with geographic distance, a signature of phylogeographic structure.

Here we use simulated coordinates for illustration. In practice, replace `coords` with observed latitude and longitude values.

```{r mantel}
set.seed(42)
coords <- matrix(rnorm(30), ncol = 2)  # 15 taxa x 2 simulated coordinates
rownames(coords) <- taxa_brazil
geo_dist <- dist(coords)

vegan::mantel(mat, geo_dist)
#> Mantel statistic based on Pearson's product-moment correlation
#>
#> Call:
#> vegan::mantel(xdis = mat, ydis = geo_dist)
#>
#> Mantel statistic r: -0.0569
#>       Significance: 0.653
#>
#> Upper quantiles of permutations (null model):
#>   90%   95% 97.5%   99%
#> 0.188 0.237 0.272 0.336
#> Permutation: free
#> Number of permutations: 999
```

The non-significant result (r = -0.057, p = 0.653) is expected with random coordinates. For non-normal distance distributions, `method = "spearman"` is more robust than the default `method = "pearson"`.

```{r mantel-spearman}
vegan::mantel(mat, geo_dist, method = "spearman", permutations = 9999)
#> Mantel statistic based on Spearman's rank correlation rho
#>
#> Call:
#> vegan::mantel(xdis = mat, ydis = geo_dist, method = "spearman",
#>     permutations = 9999)
#>
#> Mantel statistic r: -0.07405
#>       Significance: 0.672
#>
#> Upper quantiles of permutations (null model):
#>   90%   95% 97.5%   99%
#> 0.189 0.244 0.293 0.353
#> Permutation: free
#> Number of permutations: 9999
```

---

## PERMANOVA with `vegan`

PERMANOVA (Anderson 2001) tests whether groups of taxa differ significantly in taxonomic distance space.
It partitions the total variation in the distance matrix into within-group and between-group components and assesses significance by permutation, with no assumption of multivariate normality.

```{r permanova}
# One group label per taxon, in the same order as taxa_brazil
groups <- c(
  "xenarthra", "xenarthra", "carnivore", "ungulate", "marsupial",
  "primate", "primate", "carnivore", "carnivore", "carnivore",
  "cetacean", "cetacean", "sirenian", "ungulate", "ungulate"
)

vegan::adonis2(mat ~ groups)
#> Permutation test for adonis under reduced model
#> Permutation: free
#> Number of permutations: 999
#>
#> vegan::adonis2(formula = mat ~ groups)
#>          Df   SumOfSqs      R2      F Pr(>F)
#> Model     6 0.00100049 0.52646 1.4823  0.001 ***
#> Residual  8 0.00089992 0.47354
#> Total    14 0.00190041 1.00000
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The taxonomic grouping explains $52.6\%$ of the total variation in the distance matrix ($R^2 = 0.526$, $p = 0.001$). This indicates that the broad mammalian groups occupy distinct regions of taxonomic distance space, consistent with their deep evolutionary divergences.

For pairwise comparisons between groups after a significant PERMANOVA, see `pairwise.adonis2()` in the `pairwiseAdonis` package.

---

## A complete workflow

The following script chains all steps from taxon names to multivariate analysis.

```{r workflow}
library(taxodist)
library(vegan)
library(ape)

# 1. define taxa
taxa_brazil <- c(
  "Priodontes", "Myrmecophaga", "Chrysocyon", "Tapirus", "Didelphis",
  "Leontopithecus", "Brachyteles", "Panthera", "Pteronura", "Puma",
  "Sotalia", "Pontoporia", "Trichechus", "Mazama", "Blastocerus"
)

# 2. compute distance matrix
mat <- distance_matrix(taxa_brazil)

# 3. hierarchical clustering and phylo plot
cl <- taxo_cluster(taxa_brazil)
tree <- ape::as.phylo(cl$hclust)
plot(tree, main = "Threatened Mammals of Brazil", cex = 0.85)

# 4. PCoA ordination
ord <- taxo_ordinate(mat)
plot(ord, main = "PCoA: Taxonomic Distance Space")

# 5. PERMANOVA
groups <- c(
  "xenarthra", "xenarthra", "carnivore", "ungulate", "marsupial",
  "primate", "primate", "carnivore", "carnivore", "carnivore",
  "cetacean", "cetacean", "sirenian", "ungulate", "ungulate"
)
vegan::adonis2(mat ~ groups)
```

---

## References

Anderson, M.J. (2001). A new method for non-parametric multivariate analysis of variance. *Austral Ecology*, 26, 32--46.

Clarke, K.R. and Warwick, R.M. (1998). A taxonomic distinctness index and its statistical properties. *Journal of Applied Ecology*, 35, 523--531.

Clarke, K.R. and Warwick, R.M. (2001). A further biodiversity index applicable to species lists: variation in taxonomic distinctness. *Marine Ecology Progress Series*, 216, 265--278.

Brands, S.J. (1989 onwards). *Systema Naturae 2000*. Amsterdam, The Netherlands. Retrieved from The Taxonomicon, .