Differential expression analysis based on the Negative Binomial distribution using DESeq2.

run_deseq2(
  ps,
  group,
  confounders = character(0),
  contrast = NULL,
  taxa_rank = "all",
  norm = "RLE",
  norm_para = list(),
  transform = c("identity", "log10", "log10p", "SquareRoot", "CubicRoot", "logit"),
  fitType = c("parametric", "local", "mean", "glmGamPoi"),
  sfType = "poscounts",
  betaPrior = FALSE,
  modelMatrixType,
  useT = FALSE,
  minmu = ifelse(fitType == "glmGamPoi", 1e-06, 0.5),
  p_adjust = c("none", "fdr", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY"),
  pvalue_cutoff = 0.05,
  ...
)

Arguments

ps

a phyloseq::phyloseq object.

group

character, the variable to set the group, must be one of the var of the sample metadata.

confounders

character vector, the confounding variables to be adjusted. default character(0), indicating no confounding variable.

contrast

this parameter only used for two groups comparison while there are multiple groups. For more please see the following details.

taxa_rank

character to specify taxonomic rank to perform differential analysis on. Should be one of phyloseq::rank_names(phyloseq), or "all" means to summarize the taxa by the top taxa ranks (summarize_taxa(ps, level = rank_names(ps)[1])), or "none" means perform differential analysis on the original taxa (taxa_names(phyloseq), e.g., OTU or ASV).

norm

the methods used to normalize the microbial abundance data. See normalize() for more details. Options include:

  • "none": do not normalize.

  • "rarefy": random subsampling counts to the smallest library size in the data set.

  • "TMM": trimmed mean of m-values. First, a sample is chosen as reference. The scaling factor is then derived using a weighted trimmed mean over the differences of the log-transformed gene-count fold-change between the sample and the reference.

  • "RLE", relative log expression, RLE uses a pseudo-reference calculated using the geometric mean of the gene-specific abundances over all samples. The scaling factors are then calculated as the median of the gene counts ratios between the samples and the reference.

  • "CSS": cumulative sum scaling, calculates scaling factors as the cumulative sum of gene abundances up to a data-derived threshold.

norm_para

arguments passed to specific normalization methods. Most users will not need to pass any additional arguments here.

transform

character, the methods used to transform the microbial abundance. See transform_abundances() for more details. The options include:

  • "identity", return the original data without any transformation (default).

  • "log10", the transformation is log10(object), and if the data contains zeros the transformation is log10(1 + object).

  • "log10p", the transformation is log10(1 + object).

  • "SquareRoot", the transformation is Square Root.

  • "CubicRoot", the transformation is Cubic Root.

  • "logit", the transformation is Zero-inflated Logit Transformation (Does not work well for microbiome data).

fitType, sfType, betaPrior, modelMatrixType, useT, minmu

these seven parameters are inherited form DESeq2::DESeq().

  • fitType, either "parametric", "local", "mean", or "glmGamPoi" for the type of fitting of dispersions to the mean intensity.

  • sfType, either "ratio", "poscounts", or "iterate" for the type of size factor estimation. We recommend to use "poscounts".

  • betaPrior, whether or not to put a zero-mean normal prior on the non-intercept coefficients.

  • modelMatrixType, either "standard" or "expanded", which describe how the model matrix,

  • useT, logical, where Wald statistics are assumed to follow a standard Normal.

  • minmu, lower bound on the estimated count for fitting gene-wise dispersion.

For more details, see DESeq2::DESeq(). Most users will not need to set this arguments (just use the defaults).

p_adjust

method for multiple test correction, default none, for more details see stats::p.adjust.

pvalue_cutoff

pvalue_cutoff numeric, p value cutoff, default 0.05.

...

extra parameters passed to DESeq2::DESeq().

Value

a microbiomeMarker object.

Details

Note: DESeq2 requires the input is raw counts (un-normalized counts), as only the counts values allow assessing the measurement precision correctly. For more details see the vignette of DESeq2 (vignette("DESeq2")).

Thus, this function only supports "none", "rarefy", "RLE", "CSS", and "TMM" normalization methods. We strongly recommend using the "RLE" method (default normalization method in the DESeq2 package). The other normalization methods are used for expert users and comparisons among different normalization methods.

For two groups comparison, this function utilizes the Wald test (defined by DESeq2::nbinomWaldTest()) for hypothesis testing. A Wald test statistic is computed along with a probability (p-value) that a test statistic at least as extreme as the observed value were selected at random. contrasts are used to specify which two groups to compare. The order of the names determines the direction of fold change that is reported.

Likelihood ratio test (LRT) is used to identify the genes that significantly changed across all the different levels for multiple groups comparisons. The LRT identified the significant features by comparing the full model to the reduced model. It is testing whether a feature removed in the reduced model explains a significant variation in the data.

contrast must be a two length character or NULL (default). It is only required to set manually for two groups comparison when there are multiple groups. The order determines the direction of comparison, the first element is used to specify the reference group (control). This means that, the first element is the denominator for the fold change, and the second element is used as baseline (numerator for fold change). Otherwise, users do required to concern this parameter (set as default NULL), and if there are two groups, the first level of groups will set as the reference group; if there are multiple groups, it will perform an ANOVA-like testing to find markers which difference in any of the groups.

References

Love, Michael I., Wolfgang Huber, and Simon Anders. "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2." Genome biology 15.12 (2014): 1-21.

Examples

data(enterotypes_arumugam)
ps <- phyloseq::subset_samples(
    enterotypes_arumugam,
    Enterotype %in% c("Enterotype 3", "Enterotype 2")) %>%
    phyloseq::subset_taxa(Phylum %in% c("Firmicutes"))
run_deseq2(ps, group = "Enterotype")
#> Warning: Some counts are non-integers, they are rounded to integers.
#> Raw count is recommended for reliable results for deseq2 method.
#> converting counts to integer mode
#> microbiomeMarker-class inherited from phyloseq-class
#> normalization method:              [ RLE ]
#> microbiome marker identity method: [ DESeq2: Wald ]
#> marker_table() Marker Table:       [ 6 microbiome markers with 5 variables ]
#> otu_table()    OTU Table:          [ 78 taxa and  24 samples ]
#> sample_data()  Sample Data:        [ 24 samples by  10 sample variables ]
#> tax_table()    Taxonomy Table:     [ 78 taxa by 1 taxonomic ranks ]