This is a short, simplified version of the package tutorial. Only illustration and examples for the analysis of differential relative abundance using GAMLSS-BEZI and meta-analysis across studies using random effects models are shown. The metamicrobiomeR package includes the functions below. Full illustration, examples of all implemented functions, workflows and data are available at the package github repo.
Functions | Description |
---|---|
taxa.filter | Filter relative abundances of bacterial taxa or pathways using prevalence and abundance thresholds |
taxa.meansdn | Summarize mean, standard deviation of abundances and number of subjects by groups for all bacterial taxa or pathways |
taxa.mean.plot | Plot mean abundance by groups (from taxa.meansdn output) |
taxa.compare | Compare relative abundances of bacterial taxa at all levels using GAMLSS or linear/linear mixed effect models (LM) or linear/linear mixed effect models with arcsin squareroot transformation (LMAS) |
pathway.compare | Compare relative abundances of bacterial functional pathways at all levels using GAMLSS or LM or LMAS. Compare of log(absolute abundances) of bacterial functional pathways at all levels using LM |
taxcomtab.show | Display the results of relative abundance comparison (from taxa.compare or pathway.compare outputs) |
meta.taxa | Perform meta-analysis of relative abundance estimates of bacterial taxa or pathways (either from GAMLSS or LM or LMAS) across studies (from combined taxa.compare/pathway.compare outputs of all included studies) using random effect and fixed effect meta-analysis models |
metatab.show | Display meta-analysis results of bacterial taxa or pathway relative abundances (from meta.taxa output) |
meta.niceplot | Produce nice combined heatmap and forest plot for meta-analysis results of bacterial taxa and pathway relative abundances (from metatab.show output) |
read.multi | Read multiple files in a path to R |
alpha.compare | Calculate average alpha diversity indexes for a specific rarefaction depth, standardize and compare alpha diversity indexes between groups |
microbiomeage | Predict microbiome age using Random Forest model based on relative abundances of bacterial genera shared with the Bangladesh study |
data(taxtab6)
taxlist.rm<-taxa.filter(taxtab=taxtab6,percent.filter = 0.05, relabund.filter = 0.00005)
taxa.meansdn.rm<-taxa.meansdn(taxtab=taxtab6,sumvar="bf",groupvar="age.sample")
taxa.meansdn.rm<-taxa.meansdn.rm[taxa.meansdn.rm$bf!="No_BF",] #&taxa.meansdn.rm$age.sample<=6,
taxa.meansdn.rm$bf<-gdata::drop.levels(taxa.meansdn.rm$bf,reorder=FALSE)
#phylum
p.bf.l2<-taxa.mean.plot(tabmean=taxa.meansdn.rm,tax.lev="l2", comvar="bf", groupvar="age.sample",mean.filter=0.005, show.taxname="short")
p.bf.l2$p
# Note: running time is not long in regular laptop for both GAMLSS-BEZI analysis (~10s) and meta-analysis (~5s).
# However, to save running time, only taxonomies of one small phylum are selected for differential analysis example.
tab6<-as.data.frame(taxtab6)
tl<-colnames(taxtab6)[grep("k__bacteria.p__fusobacteria",colnames(taxtab6))]
taxacom.ex<-taxa.compare(taxtab=tab6[,c("personid","x.sampleid","bf","age.sample",tl)],propmed.rel="gamlss",comvar="bf",adjustvar="age.sample",
longitudinal="yes",p.adjust.method="fdr")
Show results:
#phylum
taxcomtab.show(taxcomtab=taxacom.ex,tax.select="none", showvar="bfNon_exclusiveBF", tax.lev="l2",readjust.p=TRUE,p.adjust.method="fdr",p.cutoff = 1)
#> id Estimate.bfNon_exclusiveBF ll ul
#> 1 k__bacteria.p__fusobacteria -0.06 -0.77 0.65
#> Pr(>|t|).bfNon_exclusiveBF pval.adjust.bfNon_exclusiveBF
#> 1 0.8696 0.8696
#genus
taxcomtab.show(taxcomtab=taxacom.ex,tax.select="none", showvar="bfNon_exclusiveBF", tax.lev="l6", readjust.p=TRUE,p.adjust.method="fdr",p.cutoff = 1)
#> id
#> 5 k__bacteria.p__fusobacteria.c__fusobacteriia.o__fusobacteriales.f__fusobacteriaceae.g__fusobacterium
#> Estimate.bfNon_exclusiveBF ll ul Pr(>|t|).bfNon_exclusiveBF
#> 5 -0.12 -0.95 0.72 0.7846
#> pval.adjust.bfNon_exclusiveBF
#> 5 0.7846
taxacom.lmas<-taxa.compare(taxtab=tab6[,c("personid","x.sampleid","bf","age.sample",tl)], propmed.rel="lm",transform="asin.sqrt",comvar="bf",adjustvar="age.sample", longitudinal="yes",p.adjust.method="fdr")
#phylum
taxcomtab.show(taxcomtab=taxacom.lmas,tax.select="none", showvar="bfNon_exclusiveBF", tax.lev="l2",readjust.p=TRUE,p.adjust.method="fdr",p.cutoff = 1, digit=5,p.digit=5)
#> id Estimate.bfNon_exclusiveBF ll ul
#> 1 k__bacteria.p__fusobacteria 0.00134 -0.00134 0.00402
#> Pr(>|t|).bfNon_exclusiveBF pval.adjust.bfNon_exclusiveBF
#> 1 0.32081 0.32081
#family
taxcomtab.show(taxcomtab=taxacom.lmas,tax.select="none", showvar="bfNon_exclusiveBF", tax.lev="l5",readjust.p=TRUE,p.adjust.method="fdr",p.cutoff = 1, digit=5,p.digit=5)
#> id
#> 4 k__bacteria.p__fusobacteria.c__fusobacteriia.o__fusobacteriales.f__fusobacteriaceae
#> Estimate.bfNon_exclusiveBF ll ul Pr(>|t|).bfNon_exclusiveBF
#> 4 0.00137 -0.00131 0.00405 0.31016
#> pval.adjust.bfNon_exclusiveBF
#> 4 0.31016
data(kegg.12)
data(covar.rm)
# Comparison of pathway relative abundances for level 1 only (to save running time)
path1<-pathway.compare(pathtab=list(kegg.12[[1]]),mapfile=covar.rm,sampleid="sampleid", pathsum="rel",stat.med="gamlss",comvar="gender",adjustvar=c("age.sample","bf"), longitudinal="yes",p.adjust.method="fdr",percent.filter=0.05,relabund.filter=0.00005)
Show results:
taxcomtab.show(taxcomtab=path1$l1, sumvar="path", tax.lev="l2",tax.select="none",showvar="genderMale", p.adjust.method="fdr",p.cutoff=1)
#> id Estimate.genderMale ll ul
#> 7 Organismal.Systems 0.04 0.02 0.06
#> 5 Metabolism 0.01 0.00 0.01
#> 4 Human.Diseases -0.02 -0.03 0.00
#> 8 Unclassified -0.01 -0.02 0.00
#> 2 Environmental.Information.Processing -0.01 -0.02 0.00
#> 3 Genetic.Information.Processing 0.01 0.00 0.01
#> 6 None -0.06 -0.16 0.05
#> 1 Cellular.Processes 0.00 -0.03 0.02
#> Pr(>|t|).genderMale pval.adjust.genderMale
#> 7 0.0000 0.0001
#> 5 0.0155 0.0619
#> 4 0.0273 0.0727
#> 8 0.0521 0.1043
#> 2 0.1705 0.2728
#> 3 0.2454 0.3271
#> 6 0.2877 0.3288
#> 1 0.8100 0.8100
GAMLSS-BEZI results from four studies (Bangladesh, Haiti, USA(CA_FL), USA(NC)) were used for analysis. Heatmap of log(odds ratio) (log(OR)) of relative abundances of gut bacterial taxa at different taxonomic levels between male vs. female infants for each study and pooled estimates (meta-analysis) across all studies with 95% confidence intervals (95% CI) (forest plot). All log(OR) estimates of each bacterial taxa from each study were from Generalized Additive Models for Location Scale and Shape (GAMLSS) with beta zero inflated family (BEZI) and were adjusted for feeding status and age of infants at sample collection. Pooled log(OR) estimates and 95% CI (forest plot) were from random effect meta-analysis models with inverse variance weighting and DerSimonian-Laird estimator for between-study variance based on the adjusted log(OR) estimates and corresponding standard errors of all included studies. Bacterial taxa with p-values for differential relative abundances <0.05 were denoted with * and those with p-values <0.0001 were denoted with **. Pooled log(OR) estimates with pooled p-values<0.05 are in red and those with false discovery rate (FDR) adjusted pooled p-values <0.1 are in triangle shape. Missing (unavailable) values are in white. USA: United States of America; CA: California; FL: Florida; NC: North Carolina.
# load saved GAMLSS-BEZI results of four studies for the comparison of bacterial taxa relative abundance between genders adjusted for breastfeeding and infant age at sample collection
data(tabsex4)
#select only taxonomies of a small phylum for meta-analysis example (to save running time)
tlm<-tabsex4$id[grep("k__bacteria.p__fusobacteria",tabsex4$id)]
# meta-analysis
metab.sex<-meta.taxa(taxcomdat=tabsex4[tabsex4$id %in% tlm,], summary.measure="RR", pool.var="id", studylab="study", backtransform=FALSE, percent.meta=0.5, p.adjust.method="fdr")
#show results by table and plot
#phylum
#table
metatab.show(metatab=metab.sex$random,com.pooled.tab=tabsex4[tabsex4$id %in% tlm,], tax.lev="l2",showvar="genderMale",p.cutoff.type="p", p.cutoff=1,display="table")
#> id estimate ll ul p p.adjust
#> 1 k__bacteria.p__fusobacteria 0.04 -0.28 0.35 0.8231 0.9242
#plot
metadat<-metatab.show(metatab=metab.sex$random,com.pooled.tab=tabsex4, tax.lev="l2",showvar="genderMale",p.cutoff.type="p", p.cutoff=1,display="data")
meta.niceplot(metadat=metadat,sumtype="taxa",level="main",p="p", p.adjust="p.adjust",phyla.col="rainbow",p.sig.heat="yes", heat.forest.width.ratio =c(1.5,1), leg.key.size=0.8, leg.text.size=10, heat.text.x.size=10, heat.text.x.angle=0, forest.axis.text.y=8,forest.axis.text.x=10, point.ratio = c(4,2),line.ratio = c(2,1))
Random effects meta-analysis models can also be generally applied to other microbiome measures such as microbial alpha diversity and microbiome age. To make the estimates for these positive continuous microbiome measures comparable across studies, these measures should be standardized to have a mean of 0 and standard deviation of 1 before between-group-comparison within each study. Random effects meta-analysis models can then be applied to pool the “comparable” estimates and their standard errors across studies. Meta-analysis results of these measures can be displayed as standard meta-analysis forest plots.
For each study, the alpha.compare function imports the outputs from “alpha_rarefaction.py” QIIME1 script and calculates mean alpha diversity for different indices for each sample based on a user defined rarefaction depth. Mean alpha diversity indexes are standardized to have a mean of 0 and standard deviation of 1 to make these measures comparable across studies. Standardized alpha diversity indexes are compared between groups adjusting for covariates using LM. Meta-analysis across studies is then done and the results are displayed as a standard meta-analysis forest plot.
data(alphadat)
data(covar.rm)
covar.rm$sampleid<-tolower(covar.rm$sampleid)
#comparison of standardized alpha diversity indexes between genders adjusting for breastfeeding and infant age at sample collection in infants <=6 months of age
alphacom<-alpha.compare(datlist=alphadat,depth=3,mapfile=covar.rm, mapsampleid="sampleid",comvar="gender",adjustvar=c("age.sample","bf"), longitudinal="yes",age.limit=6,standardize=TRUE)
alphacom$alphasum[,1:5]
#> id Estimate.genderMale Std. Error.genderMale t value.genderMale
#> 1 chao1 -0.06535198 0.1502276 -0.4350197
#> 2 observed_species -0.14283545 0.1363228 -1.0477741
#> 3 pd_whole_tree -0.13246866 0.1452980 -0.9117031
#> 4 shannon -0.44008064 0.1993104 -2.2080165
#> Pr(>|t|).genderMale
#> 1 0.66354814
#> 2 0.29474270
#> 3 0.36192504
#> 4 0.02724312
The results showed that alpha diversity (four commonly used indexes Shannon, Phylogenetic diversity whole tree, Observed species, Chao1) was not different between male and female infants <=6 months of age in the meta-analysis of the four included studies.
# load saved results of 4 studies
data(asum4)
asum4[,c(colnames(asum4)[1:5],"pop")]
#> id Estimate.genderMale Std. Error.genderMale
#> 1 chao1 0.065272209 0.07218733
#> 2 observed_species 0.065294129 0.06212808
#> 3 pd_whole_tree 0.031315730 0.05024410
#> 4 shannon -0.001275711 0.08208238
#> 5 chao1 -0.088622253 0.35281030
#> 6 observed_species -0.091470047 0.34530046
#> 7 pd_whole_tree 0.042568336 0.29523720
#> 8 shannon -0.009111759 0.29356564
#> 9 chao1 -0.045032383 0.13370075
#> 10 observed_species -0.065381830 0.11449459
#> 11 pd_whole_tree -0.159019660 0.10759336
#> 12 shannon -0.070534615 0.14337124
#> 13 chao1 0.496575022 0.59246819
#> 14 observed_species 0.233671733 0.52941632
#> 15 pd_whole_tree -0.010762231 0.73314245
#> 16 shannon 0.178306026 0.55527792
#> t value.genderMale Pr(>|t|).genderMale pop
#> 1 0.90420590 0.3658862 Bangladesh
#> 2 1.05095998 0.2932770 Bangladesh
#> 3 0.62327174 0.5331060 Bangladesh
#> 4 -0.01554184 0.9875999 Bangladesh
#> 5 -0.25118953 0.8029223 Haiti
#> 6 -0.26489987 0.7924139 Haiti
#> 7 0.14418351 0.8860620 Haiti
#> 8 -0.03103823 0.9753897 Haiti
#> 9 -0.33681473 0.7362566 USA(CA_FL)
#> 10 -0.57104735 0.5679675 USA(CA_FL)
#> 11 -1.47796914 0.1394160 USA(CA_FL)
#> 12 -0.49197185 0.6227392 USA(CA_FL)
#> 13 0.83814630 0.4019485 USA(UNC)
#> 14 0.44137615 0.6589407 USA(UNC)
#> 15 -0.01467959 0.9882878 USA(UNC)
#> 16 0.32111132 0.7481260 USA(UNC)
#Shannon index
shannon.sex <- meta::metagen(Estimate.genderMale, `Std. Error.genderMale`, studlab=pop,data=subset(asum4,id=="shannon"),sm="RD", backtransf=FALSE)
meta::forest(shannon.sex,smlab="Standardized \n diversity difference",sortvar=subset(asum4,id=="shannon")$pop,lwd=2)
shannon.sex
#> RD 95%-CI %W(fixed) %W(random)
#> Bangladesh -0.0013 [-0.1622; 0.1596] 70.0 70.0
#> Haiti -0.0091 [-0.5845; 0.5663] 5.5 5.5
#> USA(CA_FL) -0.0705 [-0.3515; 0.2105] 23.0 23.0
#> USA(UNC) 0.1783 [-0.9100; 1.2666] 1.5 1.5
#>
#> Number of studies combined: k = 4
#>
#> RD 95%-CI z p-value
#> Fixed effect model -0.0149 [-0.1495; 0.1198] -0.22 0.8288
#> Random effects model -0.0149 [-0.1495; 0.1198] -0.22 0.8288
#>
#> Quantifying heterogeneity:
#> tau^2 = 0 [0.0000; 0.0244]; tau = 0 [0.0000; 0.1561];
#> I^2 = 0.0% [0.0%; 0.0%]; H = 1.00 [1.00; 1.00]
#>
#> Test of heterogeneity:
#> Q d.f. p-value
#> 0.30 3 0.9601
#>
#> Details on meta-analytical method:
#> - Inverse variance method
#> - DerSimonian-Laird estimator for tau^2
#> - Jackson method for confidence interval of tau^2 and tau
cbind(study=shannon.sex$studlab,pval=shannon.sex$pval)
#> study pval
#> [1,] "Bangladesh" "0.987599902164408"
#> [2,] "Haiti" "0.9752390484969"
#> [3,] "USA(CA_FL)" "0.622739246734807"
#> [4,] "USA(UNC)" "0.748126030769876"
Random Forest (RF) modeling of gut microbiota maturity has been used to characterize development of the microbiome over chronological time. Adapting from the original approach of Subramanian et al, in the microbiomeage function, relative abundances of bacterial genera that were detected in the Bangladesh data and in the data of other studies to be included were regressed against infant chronological age using a RF model on a predefined training dataset of the Bangladesh study. This predefined training set includes 249 samples collected monthly from birth to 2 years of age from 11 Bangladeshi healthy singleton infants. The RF training model fit based on relative abundances of these shared bacterial genera was then used to predict infant age on the test data of the Bangladesh study and the data of each other study to be included. The predicted infant age based on relative abundances of these shared bacterial genera in each study is referred to as gut microbiota age.
In brief, the microbiomeage function get the shared genera list between the Bangladesh study and all other included studies, get the training and test sets from Bangladesh data based on the shared genera list, fit the train Random Forest model and predict microbiome age in the test set of Bangladesh data and data from all included studies, check for performance of the model based on the shared genera list on Bangladesh healthy cohort data, reproduce the findings of the Bangladesh malnutrition study.
As the data is large and the model takes time to run, please go to the package github repo for example codes and data.
All more detailed source code, example data, documentation and the manuscript describing the metamicrobiomeR package are available at [https://github.com/nhanhocu/metamicrobiomeR].