Help for package strollur

Type:

Package

Title:

Store and Transfer Amplicon Sequence Data

Version:

0.1.2

Date:

2026-06-09

Maintainer:

Pat Schloss <pschloss@umich.edu>

Description:

Stores the data associated with your amplicon sequence analysis. This includes nucleotide sequences, abundance, sample and treatment assignments, taxonomic classifications, asv, otu and phylotype clusters, metadata, trees and various reports. It is designed to facilitate data analysis across multiple R packages with utility functions to read / write from 'mothur', 'qiime2', 'dada2', and 'phyloseq'.

URL:

https://github.com/mothur/strollur, https://mothur.org/strollur/

BugReports:

https://github.com/mothur/strollur/issues

License:

GPL (≥ 3)

Imports:

Rcpp, cli, methods, microseq, R.utils, R6, waldo, readr, ape, dplyr, tidyr, yaml, rbiom (≥ 3.1.0), stats, utils

LinkingTo:

Rcpp, cli, Rcereal

Depends:

R (≥ 4.5.0)

Suggests:

knitr, rmarkdown, testthat (≥ 3.0.0), xml2, phyloseq, ggplot2, phylotypr, rhdf5, h5lite, pak

Config/testthat/edition:

Encoding:

UTF-8

VignetteBuilder:

knitr

Config/roxygen2/version:

8.0.0

RoxygenNote:

8.0.0

NeedsCompilation:

yes

Packaged:

2026-06-19 14:36:46 UTC; swestcot

Author:

Sarah Westcott [aut], Gregory Johnson [aut], Pat Schloss [cph, cre]

Repository:

CRAN

Date/Publication:

2026-06-24 09:00:02 UTC

strollur: Store and Transfer Amplicon Sequence Data

Description

Author(s)

Maintainer: Pat Schloss pschloss@umich.edu (ORCID) [copyright holder]

Authors:

Sarah Westcott swestcot@umich.edu (ORCID)
Gregory Johnson grejoh@umich.edu (ORCID)

Get the abundance data for sequences, bins, samples, and treatments in a strollur object

Description

Get the abundance data for sequences, bins, samples, and treatments in a strollur object

Usage

abundance(data, type = "sequence", bin_type = "otu", by_sample = FALSE)

Arguments

data

a strollur object

type

string containing the type of data you want the abundance for. Options include: "sequence", "bin", "sample" and "treatment". Default = "sequence".

bin_type

string containing the bin type you would like the abundance data for. Default = "otu".

by_sample

Boolean. When by_sample is TRUE, the abundance data will be parsed by sample. Default = FALSE.

Value

data.frame

Examples


miseq <- miseq_sop_example()

# To get the total abundance for each sequence
abundance(data = miseq, type = "sequence")

# To get the total abundance for each sequence parsed by sample
abundance(data = miseq, type = "sequence", by_sample = TRUE)

# To get the total abundance for each "otu" bin
abundance(data = miseq, type = "bin", bin_type = "otu")

# To get the total abundance for each "otu" bin parsed by sample
abundance(data = miseq, type = "bin", bin_type = "otu", by_sample = TRUE)

# To get the total abundance for each "asv" bin
abundance(data = miseq, type = "bin", bin_type = "asv")

# To get the total abundance for each "asv" bin parsed by sample
abundance(data = miseq, type = "bin", bin_type = "asv", by_sample = TRUE)

# To get the total abundance for each sample
abundance(data = miseq, type = "sample")

# To get the total abundance for each treatment
abundance(data = miseq, type = "treatment")

Add sequences, reports, metadata or resource references to a strollur object

Description

Add sequences, reports, metadata or resource references to a strollur object

Usage

add(
  data,
  table,
  type = "sequence",
  report_type = NULL,
  table_names = list(sequence_name = "sequence_name", sequence = "sequence", comment =
    "comment", reference_vendor = "vendor", reference_name = "name", reference_version =
    "version", reference_usage = "usage", reference_note = "note", reference_method_url =
    "method_url", reference_documentation_url = "documentation_url", reference_parameter
    = "parameter", reference_citation = "citation"),
  reference = NULL,
  verbose = TRUE
)

Arguments

data

a strollur object

table

a data.frame containing the data you wish to add.

type

a string containing the type of data. Options include: 'sequence', 'resource_reference', 'metadata' and 'report'. Default = "sequence".

report_type

a string containing the type of report you are adding. Options include: 'metadata' and custom reports.

table_names

named list used to indicate the names of the columns in the table. By default:

table_names <- list(sequence_name = "sequence_name", comment = "comment", sequence = "sequence", reference_name = "name", reference_vendor = "vendor", reference_version = "version", reference_usage = "usage", reference_note = "note", reference_documentation_url = "documentation_url", reference_method_url = "method_url", reference_parameter = "parameter", reference_citation = "citation")

In table_names, 'sequence_name' is a string containing the name of the column in 'table' that contains the sequence names. It is used when you are adding FASTA data. Default column name is 'sequence_name'.

In table_names, 'sequence' is a string containing the name of the column in 'table' that contains the sequence nucleotide strings. It is used when you are adding FASTA data. Default column name is 'sequence'.

In table_names, 'comment' is a string containing the name of the column in 'table' that contains the sequence comments. It is used when you are adding FASTA data. Default column name is 'comment'.

In table_names, 'reference_vendor' is a string containing the name of the column in 'table' that contains the reference vendor names. It is used when you are adding reference data. Default column name is 'vendor'.

In table_names, 'reference_name' is a string containing the name of the column in 'table' that contains the reference names. It is used when you are adding reference data. Default column name is 'name'.

In table_names, 'reference_version' is a string containing the name of the column in 'table' that contains the reference versions. Default column name is 'version'.

In table_names, 'reference_usage' is a string containing the name of the column in 'table' that contains the reference usages. Default column name is 'usage'.

In table_names, 'reference_note' is a string containing the name of the column in 'table' that contains the reference notes. Default column name is 'note'.

In table_names, 'reference_method_url' is a string containing the name of the column in 'table' that contains the reference method urls. Default column name is 'method_url'.

In table_names, 'reference_documentation_url' is a string containing the name of the column in 'table' that contains the reference documentation URLs. Default column name is 'documentation_url'.

In table_names, 'reference_parameter' is a string containing the name of the column in 'table' that contains the reference parameters. Default column name is 'parameter'.

In table_names, 'reference_citation' is a string containing the name of the column in 'table' that contains the reference citations. Default column name is 'citation'.

reference

a list created by the function new_reference(). Optional.

verbose

boolean indicating whether or not you want progress messages. Default = TRUE.

Value

an updated strollur object

Examples


# Create a new empty strollur object named 'example_dataset'
data <- new_dataset(dataset_name = "example_dataset")

# Read FASTA data into data.frame
fasta_data <- read_fasta(fasta = strollur_example("final.fasta.gz"))

# Add FASTA sequence data
add(data = data, table = fasta_data, type = "sequence")

# To add FASTA data with a resource reference

# Create a new empty strollur object named 'example_dataset'
data <- new_dataset(dataset_name = "example_dataset")

# Create a resource reference for the FASTA data
silva_resource <- new_reference(
  vendor = "SILVA", name =
    "silva.bacteria.fasta", version = "1.38.1",
  usage = "alignment of sequences",
  note = "reference trimmed to V4 region", method_url =
    "https://mothur.org/blog/2024/SILVA-v138_2-reference-files/",
  documentation_url = "https://mothur.org/wiki/silva_reference_files/"
)

# Add FASTA data with a resource reference

add(
  data,
  table = fasta_data,
  type = "sequence",
  reference = silva_resource
)

# Add contigs assembly report with a 'sequence_name' column named 'Name'

contigs_report <- readRDS(strollur_example("miseq_contigs_report.rds"))

add(
  data,
  table = contigs_report, type = "report",
  report_type = "contigs_report",
  table_names = list(sequence_name = "Name")
)

# To add metadata related to your study

metadata <- readRDS(strollur_example("miseq_metadata.rds"))

add(data, table = metadata, type = "metadata")
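
The argument list above names 'resource_reference' as a table type, but the examples only attach a single reference through new_reference(). As a minimal sketch, a table of several references could be built in base R using the default column names listed under table_names. The table contents reuse values from this manual, the object names `references`, `expected` and `missing_columns` are hypothetical, and the final add() call is an assumption about how such a table would be passed, so it is left as a non-run comment.

```r
# Hypothetical table of resource references. Column names follow the
# defaults listed under table_names (name, vendor, version, usage, ...).
references <- data.frame(
  name = c("silva.bacteria.fasta", "R package phylotypr"),
  vendor = c("SILVA", "Schloss Lab - University of Michigan"),
  version = c("1.38.1", "0.1.1"),
  usage = c("alignment of sequences", "classification of sequences"),
  note = c("reference trimmed to V4 region", ""),
  documentation_url = c(
    "https://mothur.org/wiki/silva_reference_files/",
    "https://mothur.org/phylotypr/"
  ),
  method_url = c(
    "https://mothur.org/blog/2024/SILVA-v138_2-reference-files/",
    "doi:10.1128/mra.01144-24"
  ),
  parameter = c("", "kmer_size = 8, num_bootstraps = 100, min_confidence = 80"),
  citation = c("", ""),
  stringsAsFactors = FALSE
)

# Check that every default column name is present before adding the table
expected <- c(
  "name", "vendor", "version", "usage", "note",
  "documentation_url", "method_url", "parameter", "citation"
)
missing_columns <- setdiff(expected, colnames(references))

## Not run (assumed usage):
# data <- new_dataset(dataset_name = "example_dataset")
# add(data, table = references, type = "resource_reference")
```

If a column uses a different name, the corresponding 'reference_*' entry in table_names would presumably point to it, in the same way the contigs report example maps 'sequence_name' to 'Name'.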

Assign sequence abundances, sequence classifications, bins, bin representative sequences, bin classifications or treatments to a strollur object

Description

Assign sequence abundances, sequence classifications, bins, bin representative sequences, bin classifications or treatments to a strollur object

Usage

assign(
  data,
  table,
  type = "bin",
  bin_type = "otu",
  table_names = list(sequence_name = "sequence_name", abundance = "abundance", sample =
    "sample", treatment = "treatment", taxonomy = "taxonomy", bin_name = "bin_name"),
  reference = NULL,
  verbose = TRUE
)

Arguments

data

a strollur object

table

a data.frame containing the data you wish to assign

type

a string containing the type of data. Options include: 'sequence_abundance', 'sequence_taxonomy', 'bin', 'bin_representative', 'bin_taxonomy' and 'treatment'. Default = "bin".

bin_type

string containing the bin type you would like to assign data to. Default = "otu".

table_names

named list used to indicate the names of the columns in the table. By default:

table_names <- list(sequence_name = "sequence_name", abundance = "abundance", sample = "sample", treatment = "treatment", taxonomy = "taxonomy", bin_name = "bin_name")

In table_names, 'sequence_name' is a string containing the name of the column in 'table' that contains the sequence names. Default column name is 'sequence_name'.

In table_names, 'abundance' is a string containing the name of the column in 'table' that contains the abundances. Default column name is 'abundance'.

In table_names, 'sample' is a string containing the name of the column in 'table' that contains the samples. Default column name is 'sample'.

In table_names, 'treatment' is a string containing the name of the column in 'table' that contains the treatment names. Default column name is 'treatment'.

In table_names, 'taxonomy' is a string containing the name of the column in 'table' that contains the classifications. Default column name is 'taxonomy'.

In table_names, 'bin_name' is a string containing the name of the column in 'table' that contains the bin names. Default column name is 'bin_name'.

reference

a list created by the function new_reference(). Optional.

verbose

boolean indicating whether or not you want progress messages. Default = TRUE.

Value

an updated strollur object

Examples


# Assign sequence classifications

# create a new empty strollur object named 'example_dataset'
data <- new_dataset(dataset_name = "example_dataset")

sequence_classifications <- read_mothur_taxonomy(strollur_example(
  "final.taxonomy.gz"
))

assign(
  data,
  table = sequence_classifications, type = "sequence_taxonomy"
)

# Assigning bins

# read mothur's otu list file into data.frame
otu_data <- read_mothur_list(list = strollur_example(
  "final.opti_mcc.list.gz"
))

# read mothur's asv list file into data.frame
asv_data <- read_mothur_list(list = strollur_example(
  "final.asv.list.gz"
))

# read mothur's phylotype list file into data.frame
phylo_data <- read_mothur_list(list = strollur_example(
  "final.tx.list.gz"
))

# read otu bin representative sequences into a data.frame
bin_reps <- readRDS(strollur_example("miseq_representative_sequences.rds"))

# assign 'otu' bins using sequence names
assign(data, table = otu_data, bin_type = "otu")

# assign 'asv' bins using sequence names
assign(data, table = asv_data, bin_type = "asv")

# assign 'phylotype' bins using sequence names
assign(data, table = phylo_data, bin_type = "phylotype")

# assign 'otu' bin representative sequences
assign(data, table = bin_reps, type = "bin_representative")

# To assign abundance only bins

# create a new empty strollur object named 'example_dataset'
data <- new_dataset(dataset_name = "example_dataset")

# read mothur's shared file
otu_data <- read_mothur_shared(strollur_example("final.opti_mcc.shared"))

# assign abundance only otus parsed by sample
assign(data, table = otu_data, bin_type = "otu")

# Assigning bin classifications

# read bin taxonomies
otu_data <- read_mothur_cons_taxonomy(strollur_example(
  "final.cons.taxonomy"
))

# assign otu consensus taxonomies
assign(
  data,
  table = otu_data,
  type = "bin_taxonomy", bin_type = "otu"
)

# Assign treatments

sample_assignments <- readRDS(strollur_example("miseq_sample_design.rds"))

assign(data, table = sample_assignments, type = "treatment")
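
The examples above all rely on the default column names. When a table uses other names, there are two plausible routes: describe the columns through table_names, or rename them to the defaults first. The sketch below illustrates both with a hypothetical design table. The column names 'Group' and 'Time' and the treatment labels are invented for illustration, and the assign() call is shown as a non-run comment because it has not been checked against the package.

```r
# Hypothetical design table whose columns do not use the default names
design <- data.frame(
  Group = c("F3D0", "F3D1"),
  Time = c("Early", "Early"),
  stringsAsFactors = FALSE
)

# Option 1: describe the columns through table_names
custom_names <- list(sample = "Group", treatment = "Time")

# Option 2: rename the columns to the defaults before assigning
renamed <- design
colnames(renamed)[colnames(renamed) == "Group"] <- "sample"
colnames(renamed)[colnames(renamed) == "Time"] <- "treatment"

## Not run (assumed usage):
# assign(data, table = design, type = "treatment", table_names = custom_names)
# assign(data, table = renamed, type = "treatment")
```

Passing table_names keeps the original table untouched, while renaming makes the table reusable with any call that expects the defaults.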

clear

Description

Clear data from a strollur object

Usage

clear(data)

Arguments

data

a strollur object

Value

an updated strollur object

Examples


data <- miseq_sop_example()
clear(data)

copy_dataset

Description

Create a new strollur object from an existing dataset.

Usage

copy_dataset(data)

Arguments

data

a strollur object

Value

a strollur object

Examples


miseq <- miseq_sop_example()

# to create a new dataset that is a copy of miseq

data <- copy_dataset(miseq)

Find the number of sequences, samples, treatments or bins of a given type in a strollur object

Description

Find the number of sequences, samples, treatments or bins of a given type in a strollur object

Usage

count(
  data,
  type = "sequence",
  bin_type = "otu",
  samples = NULL,
  distinct = FALSE
)

Arguments

data

a strollur object

type

string containing the type of data you want the number of. Options include: "sequence", "sample", "treatment", "bin", and "resource_reference". Default = "sequence".

bin_type

string containing the bin type you would like the number of bins for. Default = "otu".

samples

vector of strings. samples is only used when 'type' = "sequence" or 'type' = "bin". samples should contain the names of the samples you want the count for. Default = NULL.

distinct

Boolean. distinct is used when 'type' = "sequence" or 'type' = "bin". When 'type' = "sequence" and distinct is TRUE, the number of unique sequences is returned. When 'type' = "sequence" and distinct is FALSE, the total number of sequences is returned. This can also be combined with samples to find the number of unique sequences found ONLY in a given set of samples, or to find the number of unique sequences in a given set of samples that may also be present in other samples. When 'type' = "bin", you can set distinct = TRUE to return the number of bins that ONLY contain sequences from the given samples. When distinct is FALSE, the count returned includes bins with sequences from the given samples, but those bins may also contain sequences from other samples. Default = FALSE.

Value

double

Examples


miseq <- miseq_sop_example()

# To get the total number of sequences
count(data = miseq, type = "sequence")

# To get number of unique sequences
count(data = miseq, type = "sequence", distinct = TRUE)

# To get number of unique sequences from samples 'F3D0' and 'F3D1'
# Note these sequences will be present in both samples but may be
# present in other samples as well
count(data = miseq, type = "sequence", samples = c("F3D0", "F3D1"))

# To get number of unique sequences exclusive to samples 'F3D0' and 'F3D1'
# Note sequences are present in both samples and NOT present in any other
# samples.
count(
  data = miseq, type = "sequence", samples = c("F3D0", "F3D1"),
  distinct = TRUE
)

# To get the number of samples in the dataset
count(data = miseq, type = "sample")

# To get the number of treatments in the dataset
count(data = miseq, type = "treatment")

# To get the number of "otu" bins in the dataset
count(data = miseq, type = "bin", bin_type = "otu")

# To get the number of "asv" bins in the dataset
count(data = miseq, type = "bin", bin_type = "asv")

# To get the number of "phylotype" bins in the dataset
count(data = miseq, type = "bin", bin_type = "phylotype")

# To get number of "otu" bins from samples 'F3D0' and 'F3D1'
# Note these bins will have sequences from both samples but there may be
# other samples present as well
count(
  data = miseq,
  type = "bin", bin_type = "otu", samples = c("F3D0", "F3D1")
)

# To get number of "otu" bins unique to samples 'F3D0' and 'F3D1'
# Note these bins will have sequences from both samples and NO other samples
# will be present in the bins.
count(
  data = miseq, type = "bin", bin_type = "otu",
  samples = c("F3D0", "F3D1"), distinct = TRUE
)
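
The difference between shared and exclusive counts can be illustrated without the package. The base R sketch below follows the reading suggested by the example comments above: a sequence is "shared" when it occurs in every requested sample, and "exclusive" when it also occurs in no other sample. The toy table and all object names are hypothetical, and this is an interpretation of the documented behaviour rather than the package's implementation.

```r
# Hypothetical long-format abundance table (sequence, sample, abundance)
toy <- data.frame(
  sequence_name = c("seq1", "seq1", "seq1", "seq2", "seq2", "seq3", "seq4"),
  sample = c("F3D0", "F3D1", "F3D2", "F3D0", "F3D1", "F3D0", "F3D2"),
  abundance = c(10, 5, 2, 4, 4, 7, 1),
  stringsAsFactors = FALSE
)
wanted <- c("F3D0", "F3D1")

# Samples in which each sequence occurs
samples_by_sequence <- split(toy$sample, toy$sequence_name)

# Present in every requested sample (other samples allowed)
in_all_wanted <- vapply(
  samples_by_sequence,
  function(s) all(wanted %in% s),
  logical(1)
)

# Present in every requested sample and in no other sample
only_in_wanted <- vapply(
  samples_by_sequence,
  function(s) all(wanted %in% s) && all(s %in% wanted),
  logical(1)
)

sum(in_all_wanted)  # number of shared sequences
sum(only_in_wanted) # number of exclusive sequences
```

In this toy table, seq1 and seq2 occur in both requested samples, but only seq2 is restricted to them, since seq1 also occurs in F3D2.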

export_dataset

Description

Export all data from a strollur object.

Usage

export_dataset(data)

Arguments

data

a strollur object

Value

Rcpp::List, containing the data in the 'Dataset'

Examples


dataset <- new_dataset("my_dataset")
export_dataset(dataset)

get_bin_types

Description

Get bin table types of a strollur object

Usage

get_bin_types(data)

Arguments

data

a strollur object

Value

vector of strings

Examples


data <- miseq_sop_example()
get_bin_types(data)

has_sample

Description

Determine if a given sample is in a strollur object

Usage

has_sample(data, sample)

Arguments

data

a strollur object.

sample

a string containing the name of a sample.

Value

boolean indicating whether the dataset has a given sample

Examples


data <- miseq_sop_example()
has_sample(data, "F3D0")
has_sample(data, "not a valid sample")

has_sequence_strings

Description

Determine if a strollur object contains sequence nucleotide strings.

Usage

has_sequence_strings(data)

Arguments

data

a strollur object.

Value

boolean indicating whether the dataset has sequence nucleotide strings.

Examples


data <- miseq_sop_example()
has_sequence_strings(data)

Import strollur object from exported data.frame.

Description

The import_dataset function will create a strollur object from the exported table of a strollur object.

Usage

import_dataset(table)

Arguments

table

a table containing the data from a strollur object. You can create the table using 'export_dataset(data)'.

Value

a strollur object

Examples


miseq <- miseq_sop_example()
data <- import_dataset(export_dataset(miseq))
data

is_aligned

Description

Determine if a strollur object contains aligned sequences.

Usage

is_aligned(data)

Arguments

data

a strollur object

Value

Boolean

Examples


dataset <- miseq_sop_example()
is_aligned(dataset)

is_equal

Description

Determine if two strollur objects are equal.

Usage

is_equal(data, data2)

Arguments

data

a strollur object

data2

a strollur object

Value

a logical

Examples


miseq <- miseq_sop_example()

data <- copy_dataset(miseq)

is_equal(miseq, data)

Load strollur object from .rds file

Description

The load_dataset function will create a strollur object from an RDS file.

Usage

load_dataset(file)

Arguments

file

a string containing the .rds file name.

Value

a strollur object

Examples


data <- load_dataset(strollur_example("miseq_sop.rds"))
data

Example strollur object

Description

The miseq_sop_example function will create a 'strollur' object using the analysis files from the MiSeq_SOP example.

Usage

miseq_sop_example()

Value

A 'strollur' object

Examples


miseq <- miseq_sop_example()

Get the names of various data in a strollur object

Description

Get the names of the dataset, sequences, bins, samples, treatments, and reports in a strollur object

Usage

names(
  data,
  type = "sequence",
  bin_type = "otu",
  samples = NULL,
  distinct = FALSE
)

Arguments

data

a strollur object

type

string containing the type of data you would like. Options include: "dataset", "sequence", "bin", "sample", "treatment", "report". Default = "sequence".

bin_type

string containing the bin type you would like the names for. Default = "otu".

samples

vector of strings. samples is only used when 'type' = "sequence" or 'type' = "bin". samples should contain the names of the samples you want names for. Default = NULL.

distinct

Boolean. distinct is used when 'type' = "sequence" or 'type' = "bin" and the samples parameter is used. The distinct parameter controls which names are returned for the given set of samples. When distinct is TRUE, the names function will return the names that ONLY contain data from the given samples. When distinct is FALSE, the names returned contain data from the given samples, but may ALSO contain data from other samples. Default = FALSE.

Value

vector of strings, containing the names requested

Examples


miseq <- miseq_sop_example()

# To get the name of the dataset
names(data = miseq, type = "dataset")

# To get the names of the sequences
names(data = miseq, type = "sequence")

# To get the names of the sequences present in sample 'F3D0'
names(data = miseq, type = "sequence", samples = c("F3D0"))

# To get the names of the sequences unique to sample 'F3D0'
names(data = miseq, type = "sequence", samples = c("F3D0"), distinct = TRUE)

# To get the names of the samples
names(data = miseq, type = "sample")

# To get the names of the treatments
names(data = miseq, type = "treatment")

# To get the names of the bins
names(data = miseq, type = "bin")

# To get the names of the bins that are unique to 'F3D0'
names(data = miseq, type = "bin", samples = c("F3D0"), distinct = TRUE)

# To get the names of the bins that include sequences from 'F3D0'
names(data = miseq, type = "bin", samples = c("F3D0"), distinct = FALSE)

# To get the names of the reports
names(data = miseq, type = "report")

new_dataset

Description

Create a new strollur object

Usage

new_dataset(dataset_name = "")

Arguments

dataset_name

a string containing the dataset name. Default = "".

Value

a strollur object

Examples


data <- new_dataset()

# to create a new dataset named "soil", run the following:

data <- new_dataset(dataset_name = "soil")

new_reference

Description

Create a resource reference for your strollur object to aid in reproducibility.

Usage

new_reference(
  name,
  vendor = "",
  version = "",
  usage = "",
  note = "",
  documentation_url = "",
  method_url = "",
  parameter = "",
  citation = ""
)

Arguments

name

a string containing the name of the resource used. For example: 'silva.bacteria.fasta' or 'R package phylotypr'.

vendor

a string containing the name of the entity that created the original resource. For example: "Silva" or "Schloss Lab - University of Michigan". Default = "".

version

a string containing the version of the reference resource. For example: '1.38.1' or '0.1.1'. Default = "".

usage

a string containing the usage of the resource reference in your analysis. For example: 'alignment of sequences' or 'classification of sequences'. Default = "".

note

a string containing additional notes about the resource reference in your analysis. For example: 'alignment reference trimmed to V4 region' or 'classification of sequences using Bayesian method'. Default = "".

documentation_url

a string containing a web address where the reference may be downloaded or documentation may be found. Default = "".

method_url

a string containing any publications describing the methods used by the resource reference. For example: 'doi:10.1128/mra.01144-24'. Default = "".

parameter

a string containing any specific parameters used by the resource. For example: 'kmer_size = 8, num_bootstraps = 100, min_confidence = 80'. Default = "".

citation

a string containing the citation information for the resource reference. For example: "citation_key = "doi:10.1128/AEM.00062-07", author = "Qiong Wang and George M. Garrity and James M. Tiedje and James R. Cole", title = "Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy", journal = "Applied and Environmental Microbiology", volume = "73", number = "16", pages = "5261-5267", year = "2007", doi = "10.1128/AEM.00062-07"". Default = "".

Value

a list

Examples


silva_resource <- new_reference(
  vendor = "SILVA", name =
    "silva.bacteria.fasta", version = "1.38.1",
  usage = "alignment of sequences",
  note = "alignment reference trimmed to V4 region", documentation_url =
    "https://mothur.org/wiki/silva_reference_files/", method_url =
    "https://mothur.org/blog/2024/SILVA-v138_2-reference-files/"
)

phylotypr_resource <- new_reference(
  vendor = "Schloss Lab - University of Michigan",
  name = "R phylotypr package", version = "0.1.1",
  usage = "classification of sequences",
  note = "classification using Bayesian method",
  parameter = "kmer_size = 8, num_bootstraps = 100, min_confidence = 80",
  documentation_url = "https://mothur.org/phylotypr/",
  method_url = "doi:10.1128/mra.01144-24",
  citation = "@article{doi:10.1128/AEM.00062-07,
author = {Qiong Wang and George M. Garrity and James M. Tiedje and James R.
Cole}, title = {Naïve Bayesian Classifier for Rapid Assignment of rRNA
Sequences into the New Bacterial Taxonomy}, journal = {Applied and
Environmental Microbiology}, volume = {73}, number = {16}, pages =
{5261-5267}, year = {2007}, doi = {10.1128/AEM.00062-07}, URL =
{https://journals.asm.org/doi/abs/10.1128/aem.00062-07}, eprint =
{https://journals.asm.org/doi/pdf/10.1128/aem.00062-07}}"
)

Create a strollur object from dada2 outputs

Description

This function reads a dada2 sequence table and creates a 'strollur' object. The dada2 sequence table is a 2D matrix containing the abundance counts by sample for each ASV. The sample names are stored as row names and the sequence nucleotide strings are stored as column names.

To generate the dada2 sequence table from your own files you can follow this dada2 tutorial.

Usage

read_dada2(sequence_table, dataset_name = "")

Arguments

sequence_table

A dada2 sequence table

dataset_name

A string containing a name for your dataset.

Value

A 'strollur' object

References

Callahan,B.J., McMurdie,P.J., Rosen,M.J., Han,A.W., Johnson,A.J.A. and Holmes,S.P. (2016), DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods 13:581-583. <doi:10.1038/nmeth.3869>

Examples


seqtab <- readRDS(strollur_example("dada2.rds"))
dim(seqtab)

data <- read_dada2(sequence_table = seqtab, dataset_name = "dada2 example")
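
To make the layout described above concrete, the following base R sketch builds a small stand-in for a dada2 sequence table and reshapes it. The sample names, the short nucleotide strings and the counts are all invented for illustration. Real tables come from dada2 itself, and whether read_dada2 accepts a hand-built matrix like this one has not been verified, so that call is left as a non-run comment.

```r
# Hypothetical 2 x 3 sequence table: samples as row names, ASV nucleotide
# strings as column names, abundance counts as values
seqtab_toy <- matrix(
  c(
    12L, 0L, 3L,
    7L, 5L, 0L
  ),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("sampleA", "sampleB"),
    c("ACGTACGT", "ACGTTCGT", "TCGTACGA")
  )
)

dim(seqtab_toy)     # number of samples x number of ASVs
rowSums(seqtab_toy) # total abundance per sample
colSums(seqtab_toy) # total abundance per ASV

# Long format (one row per sample / ASV pair), dropping zero counts
long <- as.data.frame(as.table(seqtab_toy), stringsAsFactors = FALSE)
colnames(long) <- c("sample", "sequence", "abundance")
long <- long[long$abundance > 0, ]

## Not run (assumed usage):
# data <- read_dada2(sequence_table = seqtab_toy, dataset_name = "toy")
```

Checking dim() and the row names before import is a quick way to confirm that samples are in rows and sequences are in columns, rather than the transpose.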

read_fasta

Description

Read a FASTA formatted sequence file

Usage

read_fasta(fasta)

Arguments

fasta

FASTA file name (required)

Value

A data.frame containing the FASTA sequence data

Examples


fasta_data <- read_fasta(strollur_example("final.fasta.gz"))

# fasta_data is a data.frame.
# To access the names of the sequences in the file, run the following:

fasta_data$sequence_name

# To access the sequences in the file, run the following:

fasta_data$sequence

Create a strollur object from mothur outputs

Description

The read_mothur function reads various file types created by mothur, and creates a 'strollur' object.

To generate the various input files you can follow Pat's MiSeq example analysis.

Usage

read_mothur(
  fasta = NULL,
  count = NULL,
  taxonomy = NULL,
  otu_list = NULL,
  asv_list = NULL,
  phylo_list = NULL,
  design = NULL,
  cons_taxonomy = NULL,
  otu_shared = NULL,
  asv_shared = NULL,
  phylo_shared = NULL,
  sample_tree = NULL,
  sequence_tree = NULL,
  dataset_name = ""
)

Arguments

fasta

filename, a FASTA formatted file containing sequence strings.

count

filename, a mothur count file

taxonomy

filename, a mothur taxonomy file, created by classify.seqs

otu_list

filename, a mothur list file containing otu bin assignments. The otu_list file is created by cluster, cluster.split, and cluster.fit

asv_list

filename, a mothur list file containing asv bin assignments. The asv_list file is created by cluster using the 'unique' method.

phylo_list

filename, a mothur list file containing phylotype bin assignments. The phylo_list file is created by phylotype.

design

filename, a mothur design file

cons_taxonomy

filename, a mothur consensus taxonomy (cons.taxonomy) file. The cons_taxonomy file is created by classify.otu.

otu_shared

filename, a mothur shared file containing otu bin sample abundance assignments.

asv_shared

filename, a mothur shared file containing asv bin sample abundance assignments.

phylo_shared

filename, a mothur shared file containing phylotype bin sample abundance assignments.

sample_tree

filename, a tree that relates samples. The sample tree is created by tree.shared. We recommend running tree.shared with subsample = true, and using the 'ave.tre' output for best results.

sequence_tree

filename, a tree that relates sequences. The sequence tree is created by clearcut. We DO NOT recommend using sequence trees. With the ever growing size of modern datasets, sequence trees can be difficult or impossible to build without hitting a memory limit.

dataset_name

A string containing a name for your dataset.

Value

A strollur object

Note

consensus taxonomy: The 'strollur' object will generate consensus taxonomies for you based on the sequence taxonomy assignments. You only need to provide the ".cons.taxonomy" file if you are not providing sequence taxonomy assignments.

shared / rabund file: The 'strollur' object will generate shared and rabund data for you based on the otu assignments in the list file and the count data. You only need to provide the ".shared" file if you are not providing the list and count files.

References

Schloss,P.D., Westcott,S.L., Ryabin,T., Hall,J.R., Hartmann,M., Hollister,E.B., Lesniewski,R.A., Oakley,B.B., Parks,D.H., Robinson,C.J., Sahl,J.W., Stres,B., Thallinger,G.G., Van Horn,D.J. and Weber,C.F. (2009), Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology 75:7537-7541. <doi:10.1128/AEM.01541-09>

Examples

# For datasets including sequence data:

data <- read_mothur(
  fasta = strollur_example("final.fasta.gz"),
  count = strollur_example("final.count_table.gz"),
  taxonomy = strollur_example("final.taxonomy.gz"),
  design = strollur_example("mouse.time.design"),
  otu_list = strollur_example("final.opti_mcc.list.gz"),
  asv_list = strollur_example("final.asv.list.gz"),
  phylo_list = strollur_example("final.tx.list.gz"),
  sample_tree = strollur_example("final.opti_mcc.jclass.ave.tre"),
  dataset_name = "miseq_sop"
)

# For datasets with only otu data:

data <- read_mothur(
  otu_shared = strollur_example("final.opti_mcc.shared"),
  cons_taxonomy = strollur_example(
    "final.cons.taxonomy"
  ),
  design = strollur_example("mouse.time.design"),
  sample_tree = strollur_example("final.opti_mcc.jclass.ave.tre"),
  dataset_name = "miseq_sop"
)

read_mothur_cons_taxonomy

Description

Read a mothur formatted cons_taxonomy file

Usage

read_mothur_cons_taxonomy(taxonomy)

Arguments

taxonomy

file name, a mothur consensus taxonomy file. The cons_taxonomy file is created by classify.otu.

Value

A data.frame containing the bin names, bin abundances and bin taxonomies.

Examples


# You can add the otu assignments and bin taxonomies to your dataset
# using the following:

# read mothur's consensus taxonomy file into a data.frame
otu_data <- read_mothur_cons_taxonomy(strollur_example(
  "final.cons.taxonomy"
))

data <- new_dataset()

# assign abundance only 'otu' bins
assign(data = data, table = otu_data, type = "bin", bin_type = "otu")

# assign consensus taxonomies to 'otu' bins
assign(
  data = data, table = otu_data,
  type = "bin_taxonomy", bin_type = "otu"
)

read_mothur_count

Description

Read a mothur formatted count file

Usage

read_mothur_count(filename)

Arguments

filename

count file name (required)

Value

data.frame

Examples


# mothur count file
# Representative_Sequence     total   sample2	sample3	sample4
# seq1	1150	250	400	500
# seq2	115	25	40	50
# seq3	50	25	25	0
# seq4	4	0	0	4

# returns
# sequence_name   sample abundance
# <char>  <char>     <int>
#  1:   seq1 sample2       250
#  2:   seq1 sample3       400
#  3:   seq1 sample4       500
#  4:   seq2 sample2        25
#  5:   seq2 sample3        40
#  6:   seq2 sample4        50
#  7:   seq3 sample2        25
#  8:   seq3 sample3        25
#  9:   seq4 sample4         4

# read a count file with samples
sample_table <- read_mothur_count(strollur_example("final.count_table.gz"))

# You can add your sequence abundance data to your `strollur` object as
# follows:

# create a new empty `strollur` object
data <- new_dataset()

# assign sequence abundances parsed by sample
assign(data, table = sample_table, type = "sequence_abundance")

# print summary of data
data
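
The reshaping shown in the comments above, from a wide count table to one row per sequence and sample, can be sketched in base R. This is only an illustration of the transformation, not the package's implementation; the toy table is the one from the example, and dropping zero counts is inferred from the "returns" listing.

```r
# Toy count table matching the layout shown in the example above.
count_text <- paste(
  "Representative_Sequence\ttotal\tsample2\tsample3\tsample4",
  "seq1\t1150\t250\t400\t500",
  "seq2\t115\t25\t40\t50",
  "seq3\t50\t25\t25\t0",
  "seq4\t4\t0\t0\t4",
  sep = "\n"
)
wide <- read.delim(text = count_text, stringsAsFactors = FALSE)

# Every column other than the name and the total is a sample.
sample_cols <- setdiff(names(wide), c("Representative_Sequence", "total"))

# Stack the per-sample columns into one long table, sequence by sequence.
long <- data.frame(
  sequence_name = rep(wide$Representative_Sequence, each = length(sample_cols)),
  sample = rep(sample_cols, times = nrow(wide)),
  abundance = as.integer(t(as.matrix(wide[sample_cols]))),
  stringsAsFactors = FALSE
)

# Keep only the sequence / sample pairs that were actually observed.
long <- long[long$abundance > 0, ]
rownames(long) <- NULL
long
```

The result has nine rows, one for each non-zero cell of the toy table.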

read_mothur_list

Description

Read a mothur formatted list file

Usage

read_mothur_list(list)

Arguments

list

file name. The list file can be created using several of mothur's commands: cluster, cluster.split, cluster.fit and phylotype.

Value

A data.frame containing the sequence otu assignments

Examples


# You can add your otu assignments to your data set using the following:

# read mothur's list file into data.frame
otu_data <- read_mothur_list(strollur_example("final.opti_mcc.list.gz"))

# create a new empty `strollur` object
data <- new_dataset()

# assign sequences to 'otu' bins
assign(data = data, table = otu_data, type = "bin", bin_type = "otu")
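
For readers unfamiliar with the list file layout, the sketch below parses a tiny hand-written list file in base R. It assumes the usual mothur layout (a header row, then a row holding a label, the number of bins, and one comma-separated group of sequence names per bin). The output column names bin_name and sequence_name are assumptions taken from the default table_names of assign, and the code is not the package's parser.

```r
# A hypothetical two-line list file: header row, then one row of bins.
list_text <- paste(
  "label\tnumOtus\tOtu1\tOtu2\tOtu3",
  "0.03\t3\tseq1,seq4\tseq2\tseq3",
  sep = "\n"
)

# Split the text into lines, then each line into tab-separated fields.
fields <- strsplit(strsplit(list_text, "\n", fixed = TRUE)[[1]], "\t", fixed = TRUE)

# The first two fields are the label and the bin count; the rest are bins.
bin_names <- fields[[1]][-(1:2)]
members <- strsplit(fields[[2]][-(1:2)], ",", fixed = TRUE)

# One row per sequence, tagged with the bin it belongs to.
assignments <- data.frame(
  bin_name = rep(bin_names, times = lengths(members)),
  sequence_name = unlist(members),
  stringsAsFactors = FALSE
)
assignments
```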

read_mothur_rabund

Description

Read a mothur formatted rabund file

Usage

read_mothur_rabund(rabund)

Arguments

rabund

file name (required)

Value

A data.frame containing the otu bin abundances

Examples


# You can add your otu assignments to your data set using the following:

# read rabund file into data.frame
otu_data <- read_mothur_rabund(
  rabund =
    strollur_example("final.opti_mcc.rabund")
)

data <- new_dataset()

# assign abundance only 'otu' bins
assign(data = data, table = otu_data, type = "bin", bin_type = "otu")

read_mothur_shared

Description

Read a mothur formatted shared file

Usage

read_mothur_shared(shared)

Arguments

shared

file name (required)

Value

A data.frame containing the otu bin abundances parsed by sample

Examples


# You can add your otu assignments to your data set using the following:

# read mothur shared file into data.frame
otu_data <- read_mothur_shared(strollur_example("final.opti_mcc.shared"))

# create a new empty `strollur` object
data <- new_dataset()

# assign abundance only 'otu' bins parsed by sample
assign(data = data, table = otu_data, type = "bin", bin_type = "otu")

read_mothur_taxonomy

Description

Read a mothur formatted taxonomy file

Usage

read_mothur_taxonomy(taxonomy)

Arguments

taxonomy

file name, a mothur taxonomy file created by classify.seqs.

Value

A data.frame containing the sequence names and sequence taxonomies.

Examples


# You can add the sequences and their taxonomies to your data set
# using the following:

# read mothur's taxonomy file into a data.frame
classification_data <- read_mothur_taxonomy(strollur_example(
  "final.taxonomy.gz"
))

# create a new empty `strollur` object
data <- new_dataset()

# assign sequence classifications
assign(data = data, table = classification_data, type = "sequence_taxonomy")
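
A mothur taxonomy string lists the taxa from the highest level down, separated by semicolons, and usually carries a confidence score in parentheses after each taxon. The base R sketch below splits one such string into levels. The example string is made up for illustration, and the code does not claim to mirror how read_mothur_taxonomy stores its result.

```r
# A made-up classification in mothur's "Taxon(confidence);" style.
taxonomy <- paste0(
  "Bacteria(100);Firmicutes(100);Clostridia(98);",
  "Clostridiales(98);Lachnospiraceae(95);Blautia(80);"
)

# Split on ';' (R drops the empty piece after the trailing separator).
taxa <- strsplit(taxonomy, ";", fixed = TRUE)[[1]]

# Separate each taxon name from its confidence score.
levels_table <- data.frame(
  level = seq_along(taxa),
  taxon = sub("\\([0-9]+\\)$", "", taxa),
  confidence = as.integer(sub("^.*\\(([0-9]+)\\)$", "\\1", taxa)),
  stringsAsFactors = FALSE
)
levels_table
```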

Create a strollur object from a phyloseq object

Description

The 'read_phyloseq()' function reads a phyloseq object created by the phyloseq package (https://www.bioconductor.org/packages/release/bioc/html/phyloseq.html) and converts it into a strollur object.

Usage

read_phyloseq(phyloseq_object, treatment_column_name = NULL, dataset_name = "")

Arguments

phyloseq_object

the phyloseq object that is returned when using any read function in the phyloseq package. It must be of class "phyloseq".

treatment_column_name

the column name inside your phyloseq object within your sample data that is used to describe treatments. It must be a character. Defaults to NULL.

dataset_name

A string containing a name for your dataset.

Value

a strollur object.

References

McMurdie,P.J. and Holmes,S. (2013), phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data. PLoS ONE 8:e61217. <doi:10.1371/journal.pone.0061217>

Examples

miseq <- miseq_sop_example()

if (requireNamespace("phyloseq", quietly = TRUE)) {
  phylo_obj <- write_phyloseq(miseq)
  miseq_re_read <- read_phyloseq(phylo_obj)
} else {
  message(paste(
    "To use this functionality you have to install the",
    "phyloseq package."
  ))
}

Create a strollur object from qiime2 outputs

Description

The read_qiime2 function reads various types of .qza files created by qiime2, and creates a 'strollur' object.

Usage

read_qiime2(
  qza,
  metadata = NULL,
  dataset_name = "",
  dir_path = NULL,
  remove_unpacked_artifacts = TRUE
)

Arguments

qza

vector of filenames, .qza files containing your data from qiime2.

metadata

filename, a .tsv file containing metadata

dataset_name

A string containing a name for your dataset.

dir_path

a string containing the name of directory where the artifacts files should be unpacked. Default = current working directory.

remove_unpacked_artifacts

boolean, When TRUE, the unpacked artifacts and temporary directories will be removed. Default = TRUE.

Value

A 'strollur' object

References

Bolyen,E., Rideout,J.R., Dillon,M.R. et al. (2019), Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37:852-857. <doi:10.1038/s41587-019-0209-9>

Examples


# Using the example files from moving-pictures, we add FASTA data, assign
# taxonomy and abundance for features, and add a newick tree and
# metadata.

qza_files <- c(
  strollur_example("rep_seqs.qza"),
  strollur_example("table.qza"),
  strollur_example("taxonomy.qza"),
  strollur_example("rooted-tree.qza")
)

if (requireNamespace("h5lite", quietly = TRUE)) {
  data <- read_qiime2(
    qza = qza_files,
    metadata = strollur_example("sample_metadata.tsv"),
    dataset_name = "qiime2_moving_pictures"
  )
  data
} else {
  message(paste(
    "To use this functionality you have to install the",
    "h5lite package."
  ))
}
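
The dir_path and remove_unpacked_artifacts arguments can be combined to inspect what was unpacked from the artifacts. The sketch below uses only the arguments documented above. The directory name qza_unpacked is an arbitrary choice, and creating the directory beforehand is a precaution, since the help page does not say whether read_qiime2 creates it.

```r
# Unpack into a scratch directory instead of the working directory.
unpack_dir <- file.path(tempdir(), "qza_unpacked")
dir.create(unpack_dir, showWarnings = FALSE, recursive = TRUE)

if (requireNamespace("strollur", quietly = TRUE) &&
  requireNamespace("h5lite", quietly = TRUE)) {
  data <- strollur::read_qiime2(
    qza = c(
      strollur::strollur_example("rep_seqs.qza"),
      strollur::strollur_example("table.qza")
    ),
    dataset_name = "qiime2_moving_pictures",
    dir_path = unpack_dir,
    remove_unpacked_artifacts = FALSE
  )

  # The unpacked files are kept because remove_unpacked_artifacts = FALSE.
  print(list.files(unpack_dir, recursive = TRUE))
}
```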

read_qiime2_feature_table

Description

Read a qiime2 qza containing bin data

Usage

read_qiime2_feature_table(
  qza,
  dir_path = NULL,
  remove_unpacked_artifacts = TRUE
)

Arguments

qza

file name, a qiime2 .qza file containing bin data.

dir_path

a string containing the name of directory where the artifacts files should be unpacked. Default = current working directory.

remove_unpacked_artifacts

boolean, When TRUE, the artifact's temporary directories will be removed after processing. Default = TRUE.

Value

A list containing the artifact. The parsed table is available as the list's data element.

Examples


if (requireNamespace("h5lite", quietly = TRUE)) {
  artifact <- read_qiime2_feature_table(strollur_example("table.qza"))

  # access the bin assignment table

  artifact$data

  # to create a `strollur` object with your data

  data <- new_dataset("my_data")

  assign(data = data, table = artifact$data, type = "bin")
  data
} else {
  message(paste(
    "To use this functionality you have to install the",
    "h5lite package."
  ))
}

read_qiime2_metadata

Description

Read a qiime2 .tsv table containing metadata.

Usage

read_qiime2_metadata(metadata)

Arguments

metadata

file name, a qiime2 .tsv file containing metadata about your analysis.

Value

A data.frame containing metadata

Examples


metadata <- read_qiime2_metadata(strollur_example(
  "sample_metadata.tsv"
))
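
QIIME 2 metadata files are tab-separated and may carry a second line beginning with #q2:types that declares each column as categorical or numeric. The base R sketch below shows one way to read such a file while skipping that directive line. The toy table is invented, and the code is an illustration rather than a description of what read_qiime2_metadata does internally.

```r
# A small, invented metadata table with the optional '#q2:types' line.
metadata_text <- paste(
  "sample-id\tbody-site\tdays-since-experiment-start",
  "#q2:types\tcategorical\tnumeric",
  "L1S8\tgut\t0",
  "L2S155\tleft palm\t84",
  sep = "\n"
)

# Drop the directive line so that it is not read as a sample.
lines <- strsplit(metadata_text, "\n", fixed = TRUE)[[1]]
lines <- lines[!grepl("^#q2:types", lines)]

# check.names = FALSE keeps the hyphenated column names unchanged.
toy_metadata <- read.delim(
  text = paste(lines, collapse = "\n"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
toy_metadata
```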

read_qiime2_taxonomy

Description

Read a qiime2 qza containing taxonomy data

Usage

read_qiime2_taxonomy(qza, dir_path = NULL, remove_unpacked_artifacts = TRUE)

Arguments

qza

file name, a qiime2 .qza file containing taxonomy data.

dir_path

a string containing the name of directory where the artifacts files should be unpacked. Default = current working directory.

remove_unpacked_artifacts

boolean, When TRUE, the artifact's temporary directories will be removed after processing. Default = TRUE.

Value

A list containing the artifact. The parsed table is available as the list's data element.

Examples


artifact <- read_qiime2_taxonomy(strollur_example(
  "taxonomy.qza"
))

# access the taxonomy table

artifact$data

remove_file

Description

Remove file, if it exists

Usage

remove_file(filename)

Arguments

filename

String containing name of file to remove

Get a data.frame containing the given report in a strollur object

Description

Get a data.frame containing the requested report from a strollur object. Available reports include FASTA data, sequence reports, sequence_bin_assignments, sequence_taxonomy, bin_taxonomy, bin_representatives, sample_assignments, metadata, references, sequence_scrap, and bin_scrap.

Usage

report(data, type = "sequence", bin_type = "otu")

Arguments

data

a strollur object

type

string containing the type of report you would like. Options include: "fasta", "sequence", "sequence_bin_assignment", "sequence_taxonomy", "bin_taxonomy", "bin_representative", "sample_assignment", "metadata", "resource_reference", "sequence_scrap", "bin_scrap". If you have added custom reports for alignment, contigs_assembly or chimeras, you can get those as well. Default = "sequence".

bin_type

string containing the bin type you would like a bin_taxonomy report for. Default = "otu".

Value

data.frame

Examples


miseq <- miseq_sop_example()

# To get the FASTA data

report(data = miseq, type = "fasta") |> head(n = 5)

# To get a report about the FASTA data

report(data = miseq, type = "sequence") |> head(n = 5)

# To get the sequence bin assignments

report(data = miseq, type = "sequence_bin_assignment", bin_type = "otu") |>
  head(n = 5)

# To get the sample treatment assignments

report(data = miseq, type = "sample_assignment")

# To get a report about sequence classifications

report(data = miseq, type = "sequence_taxonomy") |> head(n = 10)

# To get a report about bin classifications for 'otu' data

report(data = miseq, type = "bin_taxonomy", bin_type = "otu") |> head(n = 10)

# To get the 'otu' bin representative sequences

report(
  data = miseq, type = "bin_representative",
  bin_type = "otu"
) |> head(n = 5)

# To get a report about the sequences removed during your analysis:

report(data = miseq, type = "sequence_scrap")

# To get a report about the "otu" bins removed during your analysis:

report(data = miseq, type = "bin_scrap", bin_type = "otu")

# To get the metadata associated with your data:

metadata <- report(data = miseq, type = "metadata")

# To get the resource references associated with your data:

references <- report(data = miseq, type = "resource_reference")

# To get our custom report containing the contigs assembly data:

report(data = miseq, type = "contigs_report") |> head(n = 10)

save_dataset

Description

The save_dataset function will save the strollur object to file.

Usage

save_dataset(data, file)

Arguments

data

a strollur object

file

a string containing the file name.

Value

A file containing the 'strollur' object

Examples


data <- read_mothur(
  fasta = strollur_example("final.fasta.gz"),
  count = strollur_example("final.count_table.gz"),
  taxonomy = strollur_example("final.taxonomy.gz"),
  design = strollur_example("mouse.time.design"),
  otu_list = strollur_example("final.opti_mcc.list.gz"),
  dataset_name = "miseq_sop"
)

file_name <- file.path(tempdir(), "miseq_sop.rds")
save_dataset(data, file = file_name)
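
A saved file can be read back with load_dataset, which the examples elsewhere in this manual call with a file name. The following round trip is a sketch that only runs when the package is installed; the file name miseq_sop_roundtrip.rds is arbitrary.

```r
# Write to the session's temporary directory to avoid cluttering the project.
file_name <- file.path(tempdir(), "miseq_sop_roundtrip.rds")

if (requireNamespace("strollur", quietly = TRUE)) {
  miseq <- strollur::miseq_sop_example()

  # Save the object, then restore it from disk.
  strollur::save_dataset(miseq, file = file_name)
  restored <- strollur::load_dataset(file_name)

  # Printing shows the summary of the restored object.
  print(restored)
}
```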

sort_dataframe

Description

Sort dataframe

Usage

sort_dataframe(data, order, named_col)

Arguments

data

the data.frame to be sorted

order

vector containing the order desired

named_col

name of column in data.frame to match order

Value

sorted data.frame

Examples

# sort results alphabetically

miseq <- miseq_sop_example()

sequence_names <- names(miseq)

fasta <- report(miseq, type = "fasta")

sorted_fasta <- sort_dataframe(
  fasta,
  order = sort(sequence_names), named_col = "sequence_names"
)
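
Reordering the rows of a data.frame so that one column follows a given vector can be done in base R with match(). The sketch below illustrates the idea on a toy table. It is not the source of sort_dataframe, and the column names are assumptions.

```r
# Toy FASTA-like table in arbitrary order.
toy_fasta <- data.frame(
  sequence_name = c("seq3", "seq1", "seq2"),
  sequence = c("GGC", "ATG", "TTA"),
  stringsAsFactors = FALSE
)

# The order we want the rows to appear in.
desired_order <- sort(toy_fasta$sequence_name)

# match() returns, for each desired name, the row where it currently lives.
sorted <- toy_fasta[match(desired_order, toy_fasta$sequence_name), , drop = FALSE]
rownames(sorted) <- NULL
sorted
```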

The 'strollur' object stores the data associated with your amplicon sequence analysis.

Description

'strollur' is an R6 class that stores nucleotide sequences, abundance, sample and treatment assignments, taxonomic classifications, asv / otu clusters and various reports. It is designed to facilitate data analysis across multiple R packages.

Public fields

data: Rcpp::XPtr<Dataset> pointer to the 'Dataset' C++ class. This gives package developers an easy access point to the underlying C++ code with additional functionality.
raw: Rcpp::RawVector containing the serialized data of the 'Dataset' C++ class. This allows the load and save functions to work with the class.
sequence_tree: a tree that relates sequences to each other
sample_tree: a tree that relates samples to each other

Methods

Public methods

strollur$new()
strollur$print()
strollur$abundance()
strollur$add()
strollur$add_sample_tree()
strollur$add_sequence_tree()
strollur$assign()
strollur$clear()
strollur$count()
strollur$get_bin_types()
strollur$get_sample_tree()
strollur$get_sequence_tree()
strollur$get_version()
strollur$is_equal()
strollur$names()
strollur$report()
strollur$summary()
strollur$clone()

`strollur$new()`

Create a new strollur dataset

Usage

strollur$new(name = "", dataset = NULL)

Arguments

name: String, name of dataset (optional)
dataset: a 'strollur' object.

Returns

A new 'strollur' object.

Examples

# to create an empty strollur object, run the following:

data <- new_dataset("soil")

`strollur$print()`

Print summary of 'strollur' object

Usage

strollur$print()

Returns

No return value, called for side effects.

Examples

miseq <- load_dataset(strollur_example("miseq_sop.rds"))
miseq

`strollur$abundance()`

Get the abundance data for sequences, bins, samples, and treatments.

Usage

strollur$abundance(type = "sequence", bin_type = "otu", by_sample = FALSE)

Arguments

type: string containing the type of data you want the number of. Options include: "sequence", "bin", "sample" and "treatment". Default = "sequence".
bin_type: string containing the bin type you would like the abundance data for. Default = "otu".
by_sample: Boolean. When by_sample is TRUE, the abundance data will be parsed by sample. Default = FALSE.

Returns

data.frame

Examples

miseq <- load_dataset(strollur_example("miseq_sop.rds"))

# To get the total abundance for each sequence
miseq$abundance(type = "sequence") |> head(n = 5)

# To get the total abundance for each sequence parsed by sample
miseq$abundance(type = "sequence", by_sample = TRUE) |> head(n = 5)

# To get the total abundance for each "otu" bin
miseq$abundance(type = "bin", bin_type = "otu") |> head(n = 5)

# To get the total abundance for each "otu" bin parsed by sample
miseq$abundance(type = "bin", bin_type = "otu", by_sample = TRUE) |>
  head(n = 5)

# To get the total abundance for each "asv" bin
miseq$abundance(type = "bin", bin_type = "asv") |> head(n = 5)

# To get the total abundance for each "asv" bin parsed by sample
miseq$abundance(type = "bin", bin_type = "asv", by_sample = TRUE) |>
  head(n = 5)

# To get the total abundance for each sample
miseq$abundance(type = "sample") |> head(n = 5)

# To get the total abundance for each treatment
miseq$abundance(type = "treatment")
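
The relationship between the by_sample = TRUE and by_sample = FALSE results can be pictured with base R alone: the totals are the per-sample abundances summed over samples. The sketch below uses a toy table whose column names (sequence_name, sample, abundance) are assumptions, and it is not the package's implementation.

```r
# Toy abundances parsed by sample.
by_sample <- data.frame(
  sequence_name = c("seq1", "seq1", "seq1", "seq2", "seq2", "seq3"),
  sample = c("sample2", "sample3", "sample4", "sample2", "sample3", "sample2"),
  abundance = c(250L, 400L, 500L, 25L, 40L, 25L),
  stringsAsFactors = FALSE
)

# Collapse over samples to get one total per sequence.
totals <- aggregate(abundance ~ sequence_name, data = by_sample, FUN = sum)

# Collapse over sequences to get one total per sample.
sample_totals <- aggregate(abundance ~ sample, data = by_sample, FUN = sum)

totals
sample_totals
```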

`strollur$add()`

Add sequences, reports, metadata or resource references

Usage

strollur$add(
  table,
  type = "sequence",
  report_type = NULL,
  table_names = list(
    sequence_name = "sequence_name", sequence = "sequence", comment = "comment",
    reference_vendor = "vendor", reference_name = "name",
    reference_version = "version", reference_usage = "usage",
    reference_note = "note", reference_method_url = "method_url",
    reference_documentation_url = "documentation_url",
    reference_parameter = "parameter", reference_citation = "citation"
  ),
  reference = NULL,
  verbose = TRUE
)

Arguments

table

a data.frame containing the data you wish to add.

type

a string containing the type of data. Options include: 'sequence', 'resource_reference', 'metadata' and 'report'.

report_type

a string containing the type of report you are adding. Options include: 'metadata' and custom reports.

table_names

named list used to indicate the names of the columns in the table. By default:

In table_names, 'comment' is a string containing the name of the column in 'table' that contains the sequence comments. It is used when you are adding FASTA data. Default column name is 'comment'.

In table_names, 'reference_vendor' is a string containing the name of the column in 'table' that contains the reference vendor names. It is used when you are adding reference data. Default column name is 'vendor'.

In table_names, 'reference_name' is a string containing the name of the column in 'table' that contains the reference names. It is used when you are adding reference data. Default column name is 'name'.

In table_names, 'reference_version' is a string containing the name of the column in 'table' that contains the reference versions. Default column name is 'version'.

In table_names, 'reference_usage' is a string containing the name of the column in 'table' that contains the reference usages. Default column name is 'usage'.

In table_names, 'reference_note' is a string containing the name of the column in 'table' that contains the reference notes. Default column name is 'note'.

In table_names, 'reference_method_url' is a string containing the name of the column in 'table' that contains the reference method urls. Default column name is 'method_url'.

In table_names, 'reference_documentation_url' is a string containing the name of the column in 'table' that contains the reference urls. Default column name is 'documentation_url'.

In table_names, 'reference_parameter' is a string containing the name of the column in 'table' that contains the reference parameters. Default column name is 'parameter'.

In table_names, 'reference_citation' is a string containing the name of the column in 'table' that contains the reference citations. Default column name is 'citation'.

reference

a list created by the function new_reference(). Optional.

verbose

boolean indicating whether or not you want progress messages. Default = TRUE.

Returns

Updated 'strollur' object - invisible(self)

Examples

fasta_data <- read_fasta(fasta = strollur_example("final.fasta.gz"))
contigs_report <- readRDS(strollur_example("miseq_contigs_report.rds"))

# Create a new empty `strollur` object named 'example_dataset'
data <- new_dataset(dataset_name = "example_dataset")

data$add(table = fasta_data, type = "sequence")
data$add(
  table = contigs_report, type = "report",
  report_type = "contigs_report",
  table_names = list(sequence_name = "Name")
)

# To add metadata related to your study

metadata <- readRDS(strollur_example("miseq_metadata.rds"))

data$add(table = metadata, type = "metadata")
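
The table_names argument maps the package's standard column names to whatever the columns are called in your table. Conceptually this is a renaming step, which the base R sketch below imitates with a hypothetical helper, standardize_columns. The helper and the toy table are illustrations only and are not part of the package.

```r
# Hypothetical helper: rename the columns of `table` to the standard names
# given by the keys of `table_names`.
standardize_columns <- function(table, table_names) {
  for (standard in names(table_names)) {
    hit <- names(table) == table_names[[standard]]
    names(table)[hit] <- standard
  }
  table
}

# A toy report whose sequence names live in a column called 'Name'.
contigs <- data.frame(
  Name = c("seq1", "seq2"),
  Length = c(253L, 252L),
  stringsAsFactors = FALSE
)

standardized <- standardize_columns(contigs, list(sequence_name = "Name"))
names(standardized)
```

Columns that are not mentioned in the mapping, such as Length here, keep their original names.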

`strollur$add_sample_tree()`

Add phylo tree relating the samples in your dataset

Usage

strollur$add_sample_tree(tree)

Arguments

tree: a phylo tree object created by ape::read.tree.

Returns

Updated 'strollur' object

Examples

data <- new_dataset("my_dataset")

df <- read_mothur_shared(strollur_example("final.opti_mcc.shared"))
assign(data = data, table = df, type = "bin", bin_type = "otu")

tree <- ape::read.tree(strollur_example(
  "final.opti_mcc.jclass.ave.tre"
))

data$add_sample_tree(tree)

`strollur$add_sequence_tree()`

Add phylo tree relating the sequences in your dataset

Usage

strollur$add_sequence_tree(tree)

Arguments

tree: a phylo tree object created by ape::read.tree.

Returns

Updated 'strollur' object

Examples

data <- new_dataset("my_dataset")
tree <- ape::read.tree(strollur_example("final.phylip.tre.gz"))
data$add_sequence_tree(tree)

`strollur$assign()`

Assign sequence abundances, sequence classifications, bins, bin representative sequences, bin classifications or treatments.

Usage

strollur$assign(
  table,
  type = "bin",
  bin_type = "otu",
  table_names = list(
    sequence_name = "sequence_name", abundance = "abundance",
    sample = "sample", treatment = "treatment", taxonomy = "taxonomy",
    bin_name = "bin_name"
  ),
  reference = NULL,
  verbose = TRUE
)

Arguments

table

a data.frame containing the data you wish to assign

type

a string containing the type of data. Options include: 'sequence_abundance', 'sequence_taxonomy', 'bin', 'bin_representative', 'bin_taxonomy' and 'treatment'. Default = "bin".

bin_type