Authors: Glen Satten (GSatten@emory.edu), Mo Li (mo.li@louisiana.edu), Ni Zhao (nzhao10@jhu.edu)
In this study, we introduce a novel statistical framework for differential abundance analysis of microbiome data, termed the Compositional Accelerated Failure Time (CAFT) model. The CAFT model handles zero read counts by treating them as observations censored below the detection limit, analogous to the censoring mechanisms used in survival analysis. This approach is inherently resistant to multiplicative bias, eliminates the need for pseudocounts, and addresses compositional bias through appropriately constructed score tests. To improve FDR control, we build on and extend Efron’s empirical null distribution.
You can install CAFT from GitHub:
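To make the censoring idea concrete, the following toy sketch shows one way zero counts can be represented as left-censored observations. It is an illustration only, not the package's internal implementation: the count values are made up, and the detection limit of one read per sample is an assumption chosen for simplicity.

```r
# Toy count table: 3 samples (rows) x 4 taxa (columns); values are hypothetical
counts <- matrix(c(10,  0, 5, 85,
                    0, 20, 0, 80,
                    3,  7, 0, 90),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("S", 1:3), paste0("T", 1:4)))
lib.size <- rowSums(counts)

# TRUE where the taxon was actually observed, FALSE where it is censored
observed <- counts > 0

# Assumption: a detection limit of one read, so a zero count only tells us
# that the relative abundance lies below 1 / library size.
# For censored cells, log.ra holds that upper bound rather than a measurement.
log.ra <- log(pmax(counts, 1) / lib.size)

observed
round(log.ra, 2)
```

Each cell is then described by a pair (value, censoring indicator), in the same way a survival time is paired with an event indicator, so no pseudocount has to be added to the zeros.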
# install.packages("remotes")
remotes::install_github("mli171/CAFT", build_vignettes = TRUE, dependencies = TRUE)
browseVignettes("CAFT")
The main function in the CAFT package is:
caft()
The example below applies ‘caft’ to a gut microbiome dataset from a study of adult colorectal cancer based on stool samples (Pasolli et al., 2017).
library(CAFT)
library(phyloseq)
data(Colon)
count.tab = t(as.data.frame(as.matrix(otu_table(Colon))))
sample.tab = as.data.frame(as.matrix(sample_data(Colon)))
tax.tab = as.data.frame(as.matrix(tax_table(Colon)))
dim(count.tab)
# remove samples with missing age
pNA = which(is.na(sample.tab$age))
if(length(pNA) > 0){
count.tab = count.tab[-pNA, ]
sample.tab = sample.tab[-pNA,]
}
# No missing values from gender
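## OTU presence filtering: keep taxa with a nonzero count in at least two samples
p_otu = which(colSums(count.tab > 0) > 1)
count.tab = count.tab[,p_otu]
tax.tab = tax.tab[p_otu,]
dim(count.tab)
## proportion of zero (censored) counts for each taxon, then the average over taxa
cens.prop = colMeans(count.tab == 0, na.rm = TRUE)
mean(cens.prop)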
Disease1 = Disease2 = rep(0, NROW(sample.tab)) # healthy
Disease1[sample.tab$disease == "CRC"] = 1
Disease2[sample.tab$disease == "adenoma"] = 1
Age = as.numeric(sample.tab$age)
Gender = as.numeric(factor(sample.tab$gender)) - 1
x.test = cbind(Disease1, Disease2)
x.adj = cbind(Age, Gender)
# serial run
res.CAFT = caft(otu.table=count.tab, x.test=x.test, x.adj=x.adj)
# same analysis with n.cores=4 for parallel computation
res.CAFT = caft(otu.table=count.tab, x.test=x.test, x.adj=x.adj, n.cores=4)
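The Disease1 and Disease2 columns above are indicator (dummy) variables for a three-level disease status with healthy as the reference group. As a small self-contained check, the sketch below uses a hypothetical label vector (not the Colon data) to show that this manual coding matches what base R's model.matrix() produces when "healthy" is set as the first factor level.

```r
# Toy disease labels (hypothetical; not taken from the Colon data)
disease <- c("healthy", "CRC", "adenoma", "CRC", "healthy")

# Manual indicator coding, mirroring the example above
Disease1 <- as.numeric(disease == "CRC")
Disease2 <- as.numeric(disease == "adenoma")
x.manual <- cbind(Disease1, Disease2)

# Treatment coding from model.matrix() with "healthy" as the reference level;
# the first column is the intercept, so it is dropped
disease.f <- factor(disease, levels = c("healthy", "CRC", "adenoma"))
x.mm <- model.matrix(~ disease.f)[, -1, drop = FALSE]

# The two matrices hold the same values; only the column names differ
all.equal(unname(x.manual), unname(x.mm), check.attributes = FALSE)
```

Either form gives one column per non-reference group, which is the layout passed as x.test in the example.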
If you use CAFT in your work, please cite:
Satten, G. A., Li, M., & Zhao, N. (2025). CAFT: A Compositional Log-Linear Model for Microbiome Data with Zero Cells. bioRxiv, 2025.11.26.690468. https://doi.org/10.1101/2025.11.26.690468
BibTeX:
@article{satten2025caft,
title = {CAFT: A Compositional Log-Linear Model for Microbiome Data with Zero Cells},
author = {Satten, Glen A. and Li, Mo and Zhao, Ni},
journal = {bioRxiv},
year = {2025},
doi = {10.1101/2025.11.26.690468},
note = {Preprint}
}

Pasolli E, Schiffer L, Manghi P, Renson A, Obenchain V, Truong D, Beghini F, Malik F, Ramos M, Dowd J, Huttenhower C, Morgan M, Segata N, Waldron L (2017). “Accessible, curated metagenomic data through ExperimentHub.” Nat. Methods, 14(11), 1023–1024. ISSN 1548-7091, 1548-7105, doi:10.1038/nmeth.4468.