---
title: "Ozkan pTO Method: Deng Entropy-Based Taxonomic Diversity"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Ozkan pTO Method: Deng Entropy-Based Taxonomic Diversity}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 5
)
```

## What Is the Ozkan pTO Method?

Ozkan (2018) introduced a novel approach to measuring taxonomic diversity using **Deng entropy** --- a generalization of Shannon entropy rooted in Dempster-Shafer evidence theory (Dempster, 1967; Shafer, 1976).

The key idea: at each level of the taxonomic hierarchy (genus, family, order, etc.), **Deng entropy** measures how evenly species are distributed across groups. The product of these level-wise entropies gives a single number that captures the **entire hierarchical diversity** of a community.

This approach produces **8 complementary indices** through a three-stage pipeline, each answering a slightly different question about the community.

```{r setup}
library(taxdiv)

community <- c(
  Quercus_coccifera = 25,
  Quercus_infectoria = 18,
  Pinus_brutia = 30,
  Pinus_nigra = 12,
  Juniperus_excelsa = 8,
  Juniperus_oxycedrus = 6,
  Arbutus_andrachne = 15,
  Styrax_officinalis = 4,
  Cercis_siliquastrum = 3,
  Olea_europaea = 10
)

tax_tree <- build_tax_tree(
  species = names(community),
  Genus = c("Quercus", "Quercus", "Pinus", "Pinus", "Juniperus",
            "Juniperus", "Arbutus", "Styrax", "Cercis", "Olea"),
  Family = c("Fagaceae", "Fagaceae", "Pinaceae", "Pinaceae", "Cupressaceae",
             "Cupressaceae", "Ericaceae", "Styracaceae", "Fabaceae", "Oleaceae"),
  Order = c("Fagales", "Fagales", "Pinales", "Pinales", "Pinales",
            "Pinales", "Ericales", "Ericales", "Fabales", "Lamiales")
)
```

## From Shannon to Deng: Why a New Entropy?

Shannon entropy treats each species as an independent event with probability $p_i$.
But in a taxonomic hierarchy, species are grouped --- two oak species share more information than an oak and a pine. Shannon cannot capture this grouping.

**Deng entropy** solves this through the concept of **focal elements** from evidence theory. At each taxonomic level, a group (e.g., "Family Fagaceae") acts as a focal element with a mass proportional to the species it contains. The entropy accounts for both the mass distribution and the **size of each focal element** (how many species it contains):

$$E_d = -\sum_{i=1}^{n} m(F_i) \log_2 \frac{m(F_i)}{2^{|F_i|} - 1}$$

where $m(F_i)$ is the mass of focal element $F_i$ and $|F_i|$ is the number of species it contains.

The term $2^{|F_i|} - 1$ accounts for all possible non-empty subsets of species within the group. A genus with 3 species has $2^3 - 1 = 7$ possible subcombinations, giving it more "evidential weight" than a single-species genus.

## Deng Entropy at Each Taxonomic Level

```{r deng_levels}
result <- ozkan_pto(community, tax_tree)

cat("Deng entropy by taxonomic level:\n\n")
for (i in seq_along(result$Ed_levels)) {
  level <- names(result$Ed_levels)[i]
  value <- result$Ed_levels[i]
  cat(sprintf(" %-10s Ed = %.4f\n", level, value))
}
```

**How to interpret:**

- **Species level**: Equals Shannon entropy when all species are equally weighted (special case where each focal element has size 1)
- **Genus level**: High when species are spread across many genera. Low when most species share one genus.
- **Family level**: High when genera span many families. Low when the community is taxonomically narrow at the family level.
- **Order level**: Similar pattern at the highest taxonomic rank.

A level with Deng entropy = 0 means **all species belong to a single group** at that level --- it contributes no taxonomic information.
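To make the formula concrete, here is a minimal base-R sketch that evaluates it by hand for the genus level of the example community. The helper `deng_entropy()` is hypothetical and not part of taxdiv, and the masses are assumed to be proportional to the number of species in each genus, as described above. `ozkan_pto()` may assign masses differently, so the value need not match `result$Ed_levels`.

```r
# Illustrative helper (not a taxdiv function): Deng entropy of a set of focal elements.
# mass: masses m(F_i), expected to sum to 1
# size: |F_i|, the number of species in each focal element
deng_entropy <- function(mass, size) {
  stopifnot(length(mass) == length(size), abs(sum(mass) - 1) < 1e-8)
  keep <- mass > 0  # focal elements with zero mass contribute nothing
  -sum(mass[keep] * log2(mass[keep] / (2^size[keep] - 1)))
}

# Genus membership of the ten species in the example community
genus <- c("Quercus", "Quercus", "Pinus", "Pinus", "Juniperus",
           "Juniperus", "Arbutus", "Styrax", "Cercis", "Olea")

size <- as.vector(table(genus))  # |F_i|: species per genus
mass <- size / sum(size)         # assumption: mass proportional to species count

ed_genus <- deng_entropy(mass, size)

# With singleton focal elements, 2^1 - 1 = 1, so the formula reduces to
# Shannon entropy (base 2) of the masses.
p <- c(0.5, 0.25, 0.25)
ed_singletons <- deng_entropy(p, rep(1, length(p)))
shannon <- -sum(p * log2(p))
```

Under these assumptions `ed_genus` works out to $0.4 \log_2 10 + 0.6 \log_2 15 \approx 3.67$: the four single-species genera contribute $0.1 \log_2 10$ each and the three two-species genera contribute $0.2 \log_2 15$ each. The singleton case gives 1.5 bits, identical to `shannon`.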
## The 8 Indices Explained

The Ozkan method produces 8 values organized in a 2 x 2 x 2 structure:

### Weighted vs Unweighted

- **Unweighted (u)**: Each taxonomic level contributes equally to the product
- **Weighted**: Higher taxonomic levels receive more weight (because resolving diversity at the order level is "more valuable" than at the genus level)

### With vs Without Species-Level Shannon

- **pTO**: Product of Deng entropies across taxonomic levels only (genus, family, order) --- pure taxonomic structure
- **pTO+**: Same product, but also includes the species-level Shannon entropy --- captures both abundance evenness and taxonomic structure

### All Levels vs Max-Informative Levels

- **Standard**: Uses all taxonomic levels
- **Max variants**: Uses only levels where Deng entropy > 0 (drops uninformative levels)

```{r eight_indices}
cat("=== All 8 Ozkan pTO indices ===\n\n")

cat("Standard (all levels):\n")
cat(" uTO =", round(result$uTO, 4), " (unweighted diversity)\n")
cat(" TO =", round(result$TO, 4), " (weighted diversity)\n")
cat(" uTO+ =", round(result$uTO_plus, 4), " (unweighted distance)\n")
cat(" TO+ =", round(result$TO_plus, 4), " (weighted distance)\n\n")

cat("Max-informative levels:\n")
cat(" uTO_max =", round(result$uTO_max, 4), " (unweighted, informative only)\n")
cat(" TO_max =", round(result$TO_max, 4), " (weighted, informative only)\n")
cat(" uTO+_max =", round(result$uTO_plus_max, 4), " (unweighted distance, informative only)\n")
cat(" TO+_max =", round(result$TO_plus_max, 4), " (weighted distance, informative only)\n")
```

### Which index to use?

| Question | Index |
|----------|-------|
| Pure taxonomic structure (no abundance) | **uTO** or **TO** |
| Taxonomic diversity + abundance evenness | **uTO+** or **TO+** |
| Are some taxonomic levels uninformative? | Use **_max** variants |
| Default recommendation for most studies | **TO+** (most complete) |

## The Three-Run Pipeline

### Run 1: Deterministic Calculation

Uses the full community as-is.
Computes all 8 indices directly.

```{r run1}
cat("Run 1 results:\n")
cat(" uTO+ =", round(result$uTO_plus, 4), "\n")
cat(" TO+ =", round(result$TO_plus, 4), "\n")
```

### Run 2: Stochastic Resampling (Slicing)

Species are removed one at a time, starting with the least abundant. After each removal, all indices are recalculated. This "slicing" procedure reveals two things:

1. **The maximum diversity achievable** from the community's species pool
2. **Each species' contribution** to overall diversity

```{r run2}
run2 <- ozkan_pto_resample(community, tax_tree, n_iter = 101, seed = 42)

cat("Run 1 (deterministic): uTO+ =", round(run2$uTO_plus_det, 4), "\n")
cat("Run 2 (stochastic max): uTO+ =", round(run2$uTO_plus_max, 4), "\n")
```

**Why does maximum > deterministic?** Because some species may be taxonomically redundant. If two species from the same genus are present, removing one can increase the ratio of between-group to within-group diversity. The species whose removal *increases* diversity is called an "unhappy" species --- it is taxonomically redundant in the community.

### Visualizing Run 2

```{r run2_plot, fig.width=9, fig.height=5, fig.alt="Iteration plot showing TO+ values across stochastic resampling iterations"}
plot_iteration(run2, component = "TO_plus",
               title = "Run 2: TO+ Across Iterations")
```

**How to read:**

- **Grey dots**: pTO value for each random species subset
- **Red line**: Deterministic value (Run 1 --- all species included)
- **Blue line**: Maximum value found (Run 2 result)

Points above the red line represent subcommunities more diverse than the full community --- evidence that some species are taxonomically redundant.

### Run 3: Max-Informative Level Variants

Some taxonomic levels carry no information. If all species belong to the same order, Deng entropy at the order level is zero --- including it in the product just drags the value down without adding insight.
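The effect of an uninformative level on a product can be sketched in base R. The entropy values below are made up for illustration, and the plain product is an assumption based on the description of pTO as a product of level-wise entropies; the package may combine or weight levels differently.

```r
# Hypothetical level-wise Deng entropies; the Order level is uninformative (0)
ed_levels <- c(Genus = 2.1, Family = 1.4, Order = 0)

# Plain product over all levels: under this assumption a single zero
# level pulls the whole product to zero
prod_all <- prod(ed_levels)

# Keep only informative levels (entropy > 0) before taking the product
informative <- ed_levels[ed_levels > 0]
prod_informative <- prod(informative)
```

Here `prod_all` is 0, while `prod_informative` keeps the Genus and Family contributions (2.1 x 1.4 = 2.94).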
Run 3 repeats the calculation using only levels where Deng entropy > 0:

```{r run3}
run3 <- ozkan_pto_sensitivity(community, tax_tree, run2, seed = 123)

cat("All levels: TO+ =", round(run3$TO_plus_max, 4), "\n")
cat("Informative only: TO+ =", round(result$TO_plus_max, 4), "\n")
```

### Full Pipeline in One Call

```{r full}
full <- ozkan_pto_full(community, tax_tree, n_iter = 101, seed = 42)

cat("Complete pipeline summary:\n\n")
cat("            uTO+       TO+       uTO        TO\n")
cat("Run 1:", sprintf("%9.4f %9.4f %9.4f %9.4f",
    full$run1$uTO_plus, full$run1$TO_plus,
    full$run1$uTO, full$run1$TO), "\n")
cat("Run 2:", sprintf("%9.4f %9.4f %9.4f %9.4f",
    full$run2$uTO_plus_max, full$run2$TO_plus_max,
    full$run2$uTO_max, full$run2$TO_max), "\n")
cat("Run 3:", sprintf("%9.4f %9.4f %9.4f %9.4f",
    full$run3$uTO_plus_max, full$run3$TO_plus_max,
    full$run3$uTO_max, full$run3$TO_max), "\n")
```

## Jackknife Leave-One-Out Analysis

The jackknife procedure removes each species one at a time and recalculates all indices. This directly measures each species' contribution:

```{r jackknife}
jk <- ozkan_pto_jackknife(community, tax_tree)

cat("Jackknife results (TO+ when each species is removed):\n\n")
jk_df <- jk$jackknife_results
for (i in seq_len(nrow(jk_df))) {
  direction <- ifelse(jk_df$TO_plus[i] > result$TO_plus,
                      "UNHAPPY", "happy")
  cat(sprintf(" Remove %-25s -> TO+ = %.4f [%s]\n",
              jk_df$species[i], jk_df$TO_plus[i], direction))
}

cat("\nHappy species:", jk$n_happy, "\n")
cat("Unhappy species:", jk$n_unhappy, "\n")
```

- **happy species**: Removing them *decreases* diversity (they contribute positively to taxonomic structure)
- **UNHAPPY species**: Removing them *increases* diversity (they are taxonomically redundant)

## Comparing Communities

```{r compare, fig.width=8, fig.height=8, fig.alt="Radar chart comparing diversity indices between diverse and degraded communities"}
degraded <- c(
  Quercus_coccifera = 40,
  Pinus_brutia = 35,
  Juniperus_oxycedrus = 10
)

communities <- list(
  "Intact (10 spp)" = community,
  "Degraded (3 spp)" = degraded
)

plot_radar(communities, tax_tree,
           title = "Intact vs Degraded Forest")
```

The radar chart reveals which diversity dimensions are most affected by degradation. If abundance-weighted indices (Shannon, Simpson, TO+) drop more than presence/absence indices (AvTD, uTO+), the community has lost evenness. If both drop equally, the community has lost taxonomic breadth.

## References

- Ozkan, K. (2018). A new proposed measure for estimating taxonomic diversity. *Turkish Journal of Forestry*, 19(4), 336-346.
- Deng, Y. (2016). Deng entropy. *Chaos, Solitons & Fractals*, 91, 549-553.
- Dempster, A.P. (1967). Upper and lower probabilities induced by a multivalued mapping. *The Annals of Mathematical Statistics*, 38(2), 325-339.
- Shafer, G. (1976). *A Mathematical Theory of Evidence*. Princeton University Press, Princeton, NJ.