R/da_simple_statistic.R
run_simple_stat.Rd
Perform simple statistical analysis of metagenomic profiles. This function
is a wrapper of run_test_two_groups and run_test_multiple_groups.
run_simple_stat(
  ps,
  group,
  taxa_rank = "all",
  transform = c("identity", "log10", "log10p", "SquareRoot", "CubicRoot", "logit"),
  norm = "TSS",
  norm_para = list(),
  method = c("welch.test", "t.test", "white.test", "anova", "kruskal"),
  p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"),
  pvalue_cutoff = 0.05,
  diff_mean_cutoff = NULL,
  ratio_cutoff = NULL,
  eta_squared_cutoff = NULL,
  conf_level = 0.95,
  nperm = 1000,
  ...
)
ps: a phyloseq::phyloseq object.
group: character, the name of the variable that defines the groups to compare.
taxa_rank: character, the taxonomic rank on which to perform the
differential analysis. Should be one of phyloseq::rank_names(phyloseq);
"all", which summarizes the taxa by the top taxa ranks
(summarize_taxa(ps, level = rank_names(ps)[1])); or "none", which performs
the differential analysis on the original taxa (taxa_names(phyloseq), e.g.,
OTU or ASV).
transform: character, the method used to transform the microbial
abundance. See transform_abundances() for more details. The options include:
"identity", return the original data without any transformation (default).
"log10", the transformation is log10(object), and if the data contains zeros the transformation is log10(1 + object).
"log10p", the transformation is log10(1 + object).
"SquareRoot", the square root transformation.
"CubicRoot", the cubic root transformation.
"logit", the zero-inflated logit transformation (does not work well for microbiome data).
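As a rough illustration of what these options compute, the following base R sketch applies the documented formulas to a small vector. The helper `transform_sketch` is hypothetical and is not the package's transform_abundances(); the "logit" option is left out because its exact definition is not given here.

```r
# Hypothetical helper mirroring the documented formulas; NOT the package's
# transform_abundances() implementation.
transform_sketch <- function(x, transform = "identity") {
  switch(transform,
    identity = x,
    # documented behaviour: fall back to log10(1 + x) when zeros are present
    log10 = if (any(x == 0)) log10(1 + x) else log10(x),
    log10p = log10(1 + x),
    SquareRoot = sqrt(x),
    CubicRoot = x^(1 / 3),
    stop("unsupported transform: ", transform)
  )
}

abund <- c(0, 9, 99, 999)
transform_sketch(abund, "log10")       # zeros present, so log10(1 + abund)
transform_sketch(abund, "SquareRoot")
```

Note that with "log10" the result depends on whether any zero is present in the data, so two data sets transformed with the same option are not necessarily on the same scale; "log10p" avoids that ambiguity.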
norm: the method used to normalize the microbial abundance data. See
normalize() for more details. Options include:
"none": do not normalize.
"rarefy": random subsampling of counts to the smallest library size in the data set.
"TSS": total sum scaling, also referred to as "relative abundance"; the abundances are normalized by dividing by the corresponding sample library size.
"TMM": trimmed mean of m-values. First, a sample is chosen as reference. The scaling factor is then derived using a weighted trimmed mean over the differences of the log-transformed gene-count fold-change between the sample and the reference.
"RLE": relative log expression. RLE uses a pseudo-reference calculated using the geometric mean of the gene-specific abundances over all samples. The scaling factors are then calculated as the median of the gene count ratios between the samples and the reference.
"CSS": cumulative sum scaling, calculates scaling factors as the cumulative sum of gene abundances up to a data-derived threshold.
"CLR": centered log-ratio normalization.
"CPM": per-sample normalization of the sum of the values to 1e+06.
norm_para: list of arguments passed to the specific normalization method.
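The two simplest normalization options, "TSS" and "CPM", can be sketched in base R as follows. This is only an illustration of the arithmetic described above, not the package's normalize() code; the toy matrix and its orientation (taxa in rows, samples in columns) are assumptions.

```r
# Toy count matrix: taxa in rows, samples in columns (orientation is an
# assumption made for this sketch).
counts <- matrix(
  c(10, 30, 60,
    5, 5, 90),
  nrow = 3,
  dimnames = list(c("taxon_a", "taxon_b", "taxon_c"), c("s1", "s2"))
)
lib_size <- colSums(counts)   # library size of each sample

# TSS: divide every count by the library size of its sample
tss <- sweep(counts, 2, lib_size, "/")
# CPM: same scaling, but each sample sums to 1e6 instead of 1
cpm <- tss * 1e6

colSums(tss)
colSums(cpm)
```

Both results differ only by a constant factor, so tests that are invariant to rescaling give the same answer for either; the choice mainly affects how the abundances read.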
method: test method. Options include "welch.test", "t.test" and "white.test" for two-group comparisons, and "anova" and "kruskal" for multiple-group comparisons.
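To make the distinction between these tests concrete, here is a base R sketch that runs the standard counterparts from the stats package on simulated abundances of a single taxon. The data and variable names are made up, and whether run_simple_stat() calls these exact functions internally is an assumption; White's test has no base R equivalent and is not shown.

```r
set.seed(1)
# Simulated abundances of one taxon in three hypothetical groups
df <- data.frame(
  abund = c(rnorm(10, mean = 5), rnorm(10, mean = 6), rnorm(10, mean = 8)),
  grp = factor(rep(c("A", "B", "C"), each = 10))
)
df2 <- droplevels(df[df$grp %in% c("A", "B"), ])   # keep two groups only

# Two groups: Welch's t-test (unequal variances) vs. Student's t-test
welch <- t.test(abund ~ grp, data = df2, var.equal = FALSE)
student <- t.test(abund ~ grp, data = df2, var.equal = TRUE)

# More than two groups: one-way ANOVA and Kruskal-Wallis rank sum test
anova_tab <- summary(aov(abund ~ grp, data = df))[[1]]
kw <- kruskal.test(abund ~ grp, data = df)

c(welch = welch$p.value, student = student$p.value, kruskal = kw$p.value)
```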
p_adjust: method for multiple test correction, default "none". For more
details see stats::p.adjust.
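The effect of the correction can be checked directly with stats::p.adjust on a small, made-up vector of p-values:

```r
p <- c(0.01, 0.02, 0.03, 0.04, 0.05)   # made-up raw p-values

p.adjust(p, method = "none")         # unchanged
p.adjust(p, method = "bonferroni")   # each p-value multiplied by 5, capped at 1
p.adjust(p, method = "BH")           # Benjamini-Hochberg

# In stats::p.adjust, "fdr" is an alias for "BH"
identical(p.adjust(p, "fdr"), p.adjust(p, "BH"))
```

With the default "none", every taxon is judged against pvalue_cutoff using its raw p-value, so some correction is usually worth considering when many taxa are tested at once.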
pvalue_cutoff: numeric, p-value cutoff, default 0.05.
diff_mean_cutoff, ratio_cutoff: only used for two-group comparisons,
cutoffs for the difference in means and for the ratio; default NULL, which
means no effect size filter.
eta_squared_cutoff: only used for multiple-group comparisons, numeric,
cutoff of the effect size (eta squared); default NULL, which means no effect
size filter.
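Eta squared is conventionally defined as the between-group sum of squares divided by the total sum of squares. The sketch below computes it from a one-way ANOVA table in base R on simulated data; it illustrates the textbook definition and is not taken from the package source.

```r
set.seed(42)
# Simulated abundances of one taxon in three hypothetical groups
dat <- data.frame(
  abund = c(rnorm(8, mean = 1), rnorm(8, mean = 2), rnorm(8, mean = 4)),
  grp = factor(rep(c("A", "B", "C"), each = 8))
)

aov_tab <- summary(aov(abund ~ grp, data = dat))[[1]]
ss <- aov_tab[["Sum Sq"]]   # first entry: between groups; second: residuals

# Share of the total variation explained by group membership (between 0 and 1)
eta_squared <- ss[1] / sum(ss)
eta_squared
```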
conf_level: only used for two-group comparisons, numeric, confidence level of the interval.
nperm: integer, only used for two-group comparisons, number of permutations for the estimation of White's non-parametric t-test.
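The role of a permutation count can be illustrated with a generic permutation test on the t statistic. This is a simplified sketch on simulated data, assuming a two-sided test on the Welch statistic; it is not the package's implementation of White's test, which may differ in its details.

```r
set.seed(123)
x <- rnorm(12, mean = 0)     # simulated abundances, group 1
y <- rnorm(12, mean = 1.5)   # simulated abundances, group 2
nperm <- 1000

# Welch t statistic for two samples
t_stat <- function(a, b) unname(t.test(a, b)$statistic)
observed <- t_stat(x, y)

pooled <- c(x, y)
perm_stats <- replicate(nperm, {
  idx <- sample(length(pooled), length(x))   # random relabelling of samples
  t_stat(pooled[idx], pooled[-idx])
})

# Two-sided permutation p-value: share of permuted statistics at least as
# extreme as the observed one
p_perm <- mean(abs(perm_stats) >= abs(observed))
p_perm
```

The p-value computed this way can only take values in steps of 1 / nperm, so a larger nperm gives a finer resolution at the cost of run time.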
...: only used for two-group comparisons, extra arguments passed to
t.test() or fisher.test().
a microbiomeMarker object.
data(enterotypes_arumugam)
ps <- phyloseq::subset_samples(
  enterotypes_arumugam,
  Enterotype %in% c("Enterotype 3", "Enterotype 2")
)
run_simple_stat(ps, group = "Enterotype")
#> microbiomeMarker-class inherited from phyloseq-class
#> normalization method: [ TSS ]
#> microbiome marker identity method: [ welch.test ]
#> marker_table() Marker Table: [ 16 microbiome markers with 5 variables ]
#> otu_table() OTU Table: [ 235 taxa and 24 samples ]
#> sample_data() Sample Data: [ 24 samples by 9 sample variables ]
#> tax_table() Taxonomy Table: [ 235 taxa by 1 taxonomic ranks ]