This function offers different methods to impute missing values in data.

impute_abundance(
   object,
   level = c(NULL, "Kingdom", "Phylum", "Class",
           "Order", "Family", "Genus",
           "Species", "Strain", "unique"),
   group,
   ZerosAsNA = FALSE,
   RemoveNA = TRUE,
   cutoff = 20,
   method = c("none", "LOD", "half_min", "median",
       "mean", "min", "knn", "rf",
       "global_mean", "svd", "QRILC"),
   LOD = NULL,
   knum = 10)

Arguments

object

(Required). a phyloseq::phyloseq or SummarizedExperiment::SummarizedExperiment object.

level

(Optional). character. Summarization level (from rank_names(pseq), default: NULL).

group

(Required). character. Name of the grouping variable used to evaluate missing values within each group.

ZerosAsNA

(Optional). logical. Whether zeros in the data are treated as missing values (default: FALSE).

RemoveNA

(Optional). logical. Whether to remove features whose percentage of missing values exceeds the selected cutoff in every group (default: TRUE).

cutoff

(Optional). numeric. Percentage of missing values allowed in each group (default: 20). If at least one group has fewer missing values than the cutoff, the feature is not removed.
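
The filtering rule described for RemoveNA and cutoff can be sketched in base R as follows. This is only an illustration under stated assumptions, not the function's internal code: the matrix `abund`, the vector `grp`, and the helper `keep_by_cutoff()` are hypothetical names, and treating a group that sits exactly at the cutoff as passing is an assumption.

```r
# Hypothetical toy data: 3 features (rows) x 6 samples (columns).
abund <- matrix(
  c(5, NA, 7, NA, NA, NA,   # feature1: 1/3 missing in A, 3/3 missing in B
    NA, NA, 3, NA, NA, 2,   # feature2: 2/3 missing in A, 2/3 missing in B
    1, 2, 3, 4, 5, 6),      # feature3: no missing values
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("feature", 1:3), paste0("s", 1:6))
)
grp <- c("A", "A", "A", "B", "B", "B")

# Hypothetical helper: TRUE for features that pass the cutoff in any group.
keep_by_cutoff <- function(x, group, cutoff = 20) {
  # Percentage of NA per feature within each group (features x groups).
  na_pct <- do.call(cbind, lapply(
    split(seq_len(ncol(x)), group),
    function(idx) rowMeans(is.na(x[, idx, drop = FALSE])) * 100
  ))
  # Assumption: a feature is kept when at least one group is at or
  # below the cutoff; it is dropped only if every group exceeds it.
  apply(na_pct <= cutoff, 1, any)
}

keep <- keep_by_cutoff(abund, grp, cutoff = 40)
abund[keep, , drop = FALSE]  # feature2 is dropped; feature1 and feature3 remain
```

Here feature1 is retained even though all of its values are missing in group B, because group A stays under the cutoff.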

method

(Optional). character. Imputation method (default: "none"). Options are:

  • "none": all missing values are replaced by zero.

  • "LOD": a specific limit of detection provided by the user.

  • "half_min": half of the minimum value across samples, excluding zeros.

  • "median": median value across samples, excluding zeros.

  • "mean": mean value across samples, excluding zeros.

  • "min": minimum value across samples, excluding zeros.

  • "knn": imputation from the k-nearest neighbor samples.

  • "rf": nonparametric missing value imputation using Random Forest.

  • "global_mean": values drawn from a normal distribution whose mean is down-shifted from the sample mean and whose standard deviation is a fraction of the standard deviation of the sample distribution.

  • "svd": missing value imputation based on singular value decomposition.

  • "QRILC": missing value imputation based on quantile regression.
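
To make the simple statistic-based options more concrete, here is a minimal base-R sketch of "half_min", "median", "mean", and "min". It assumes the statistic is computed per feature (row) from that feature's non-missing, non-zero values across samples; the helper `impute_simple()` and the matrix `x` are hypothetical, and the package's own implementation may differ in detail.

```r
# Hypothetical helper: replace NA in each feature (row) with a statistic
# computed from that feature's non-missing, non-zero values.
impute_simple <- function(x, method = c("half_min", "median", "mean", "min")) {
  method <- match.arg(method)
  stat <- switch(method,
    half_min = function(v) min(v) / 2,
    median   = stats::median,
    mean     = mean,
    min      = min
  )
  t(apply(x, 1, function(v) {
    observed <- v[!is.na(v) & v != 0]
    # Leave the feature unchanged when it has no usable observations.
    if (length(observed) > 0) v[is.na(v)] <- stat(observed)
    v
  }))
}

# Toy data: 2 features x 4 samples.
x <- matrix(c(4, NA, 8, 2,
              NA, 10, 0, 30),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("f1", "f2"), paste0("s", 1:4)))

impute_simple(x, "half_min")  # f1: NA -> 1; f2: NA -> 5
impute_simple(x, "median")    # f1: NA -> 4; f2: NA -> 20
```

Note that the zero in f2 is excluded when computing the statistic but is not itself replaced; in this sketch only NA entries are imputed.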

LOD

(Optional). Numeric. limit of detection (default: NULL).

knum

(Optional). Numeric. Number of neighbors to be used in the imputation (default: 10).

Value

A phyloseq::phyloseq or SummarizedExperiment::SummarizedExperiment object with cleaned data.

References

Armitage, E. G., Godzien, J., Alonso‐Herranz, V., López‐Gonzálvez, Á., & Barbas, C. (2015). Missing value imputation strategies for metabolomics data. Electrophoresis, 36(24), 3050-3060.

Author

Created by Pol Castellano-Escuder; Modified by Hua Zou (12/02/2022 Shenzhen China)

Examples


if (FALSE) {
# phyloseq object
data("Zeybel_2022_gut")
impute_abundance(
  Zeybel_2022_gut,
  level = "Phylum",
  group = "LiverFatClass",
  ZerosAsNA = TRUE,
  RemoveNA = TRUE,
  cutoff = 20,
  method = "knn")

# SummarizedExperiment object
data("Zeybel_2022_protein")
impute_abundance(
  Zeybel_2022_protein,
  group = "LiverFatClass",
  ZerosAsNA = TRUE,
  RemoveNA = TRUE,
  cutoff = 20,
  method = "knn")
}