This function imputes missing values in abundance data and offers several imputation methods.
impute_abundance(
  object,
  level = c(NULL, "Kingdom", "Phylum", "Class",
            "Order", "Family", "Genus",
            "Species", "Strain", "unique"),
  group,
  ZerosAsNA = FALSE,
  RemoveNA = TRUE,
  cutoff = 20,
  method = c("none", "LOD", "half_min", "median",
             "mean", "min", "knn", "rf",
             "global_mean", "svd", "QRILC"),
  LOD = NULL,
  knum = 10)
object: (Required). A phyloseq::phyloseq or SummarizedExperiment::SummarizedExperiment object.
level: (Optional). character. Taxonomic level at which features are summarized (one of rank_names(pseq); default: NULL).
group: (Required). character. Name of the sample grouping variable used to assess missing values within each group.
ZerosAsNA: (Optional). logical. Whether zeros in the data are treated as missing values (default: FALSE).
RemoveNA: (Optional). logical. Whether to remove features whose percentage of missing values exceeds cutoff in every group (default: TRUE).
cutoff: (Optional). numeric. Maximum percentage of missing values allowed within a group (default: 20). A feature is retained if at least one group has fewer missing values than the cutoff.
method: (Optional). character. Imputation method (default: "none"). Options are:
"none": all missing values are replaced by zero.
"LOD": missing values are replaced by a user-specified limit of detection (see LOD).
"half_min": half of the minimum nonzero value across samples.
"median": median of the nonzero values across samples.
"mean": mean of the nonzero values across samples.
"min": minimum nonzero value across samples.
"knn": k-nearest-neighbor imputation based on neighboring samples (see knum).
"rf": nonparametric imputation using random forests.
"global_mean": random draws from a normal distribution whose mean is down-shifted from the sample mean and whose standard deviation is a fraction of the sample standard deviation.
"svd": imputation based on singular value decomposition.
"QRILC": imputation based on quantile regression (Quantile Regression Imputation of Left-Censored data).
LOD: (Optional). numeric. Limit of detection used when method = "LOD" (default: NULL).
knum: (Optional). numeric. Number of neighbors used when method = "knn" (default: 10).
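The missing-value preprocessing controlled by ZerosAsNA, RemoveNA and cutoff can be sketched in base R as follows. This is an illustration under assumptions, not the package's internal code. The data are assumed to be a plain numeric matrix with features in rows and samples in columns, and filter_by_group() is a hypothetical helper. A feature whose missing percentage exactly equals the cutoff is assumed to be kept, because the descriptions above do not settle that boundary.

```r
# Hypothetical helper mimicking ZerosAsNA / RemoveNA / cutoff on a plain
# numeric matrix (features in rows, samples in columns: an assumed layout).
filter_by_group <- function(x, grp, cutoff = 20, zeros_as_na = FALSE) {
  # ZerosAsNA: recode zeros as missing values
  if (zeros_as_na) x[!is.na(x) & x == 0] <- NA
  # Percentage of missing values for each feature within each group
  na_pct <- sapply(split(seq_len(ncol(x)), grp), function(idx) {
    100 * rowMeans(is.na(x[, idx, drop = FALSE]))
  })
  na_pct <- matrix(na_pct, nrow = nrow(x))  # stay a matrix even for 1 feature
  # RemoveNA: keep a feature if at least one group is within the cutoff
  keep <- rowSums(na_pct <= cutoff) > 0
  x[keep, , drop = FALSE]
}

# Toy data: 4 features x 6 samples, two groups of three samples
x <- matrix(c(1, 0, 2, 3, 0, 1,
              0, 0, 0, 5, 6, 7,
              0, 0, 0, 0, 0, 9,
              4, 5, 6, 7, 8, 9),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("F", 1:4), paste0("S", 1:6)))
grp <- rep(c("A", "B"), each = 3)

res <- filter_by_group(x, grp, cutoff = 20, zeros_as_na = TRUE)
rownames(res)
```

With cutoff = 20, F1 is 33% missing in both groups and F3 is 100% and 67% missing, so both are dropped. F2 is kept because group B has no missing values, which leaves F2 and F4.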
Returns a phyloseq::phyloseq or SummarizedExperiment::SummarizedExperiment object with missing values imputed (and, when RemoveNA = TRUE, high-missingness features removed).
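The fill step for the single-value replacement methods ("none", "LOD", "half_min", "min", "mean", "median") and for the down-shifted normal draw behind "global_mean" can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the package's implementation. Features are assumed to be rows, LOD is assumed to be one number shared by all features, and impute_replace() and impute_downshift() are hypothetical helpers. The shift (1.8 standard deviations) and width (0.3) are assumed illustrative values, and a down-shifted normal draw is typically applied to log-transformed intensities.

```r
# Hypothetical helpers; features are assumed to be rows, samples columns.

# Fill each feature's missing values with one number computed from that feature.
impute_replace <- function(x,
                           method = c("none", "LOD", "half_min", "min", "mean", "median"),
                           LOD = NULL) {
  method <- match.arg(method)
  if (method == "LOD") {
    # assumes a single LOD value shared by all features
    if (is.null(LOD) || length(LOD) != 1) stop("method = \"LOD\" needs a single numeric LOD")
    x[is.na(x)] <- LOD
    return(x)
  }
  for (i in seq_len(nrow(x))) {
    miss <- is.na(x[i, ])
    if (!any(miss)) next
    obs <- x[i, !miss]
    obs <- obs[obs != 0]  # zeros are excluded, as in the method descriptions
    fill <- if (method == "none" || length(obs) == 0) {
      0                   # "none", or no nonzero values (assumed fallback)
    } else {
      switch(method,
             half_min = min(obs) / 2,
             min      = min(obs),
             mean     = mean(obs),
             median   = stats::median(obs))
    }
    x[i, miss] <- fill
  }
  x
}

# Draw replacements from a normal distribution shifted below each sample's
# mean, with a narrowed standard deviation (shift and width are assumptions).
impute_downshift <- function(x, shift = 1.8, width = 0.3) {
  for (j in seq_len(ncol(x))) {
    miss <- is.na(x[, j])
    obs <- x[!miss, j]
    if (!any(miss) || length(obs) < 2) next  # sd() needs >= 2 observed values
    s <- stats::sd(obs)
    x[miss, j] <- stats::rnorm(sum(miss), mean = mean(obs) - shift * s,
                               sd = width * s)
  }
  x
}

m <- matrix(c(2, NA, 4,
              NA, 0, 6), nrow = 2, byrow = TRUE)
impute_replace(m, "half_min")  # row 1 gets 1 (half of 2), row 2 gets 3 (half of 6)
```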
Armitage, E. G., Godzien, J., Alonso‐Herranz, V., López‐Gonzálvez, Á., & Barbas, C. (2015). Missing value imputation strategies for metabolomics data. Electrophoresis, 36(24), 3050-3060.
if (FALSE) {
  # phyloseq object
  data("Zeybel_2022_gut")
  impute_abundance(
    Zeybel_2022_gut,
    level = "Phylum",
    group = "LiverFatClass",
    ZerosAsNA = TRUE,
    RemoveNA = TRUE,
    cutoff = 20,
    method = "knn")
  # SummarizedExperiment object
  data("Zeybel_2022_protein")
  impute_abundance(
    Zeybel_2022_protein,
    group = "LiverFatClass",
    ZerosAsNA = TRUE,
    RemoveNA = TRUE,
    cutoff = 20,
    method = "knn")
}
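The "knn" method fills a missing value from the most similar samples, and knum sets how many samples are used. The base-R sketch below shows one common way to do this, under assumptions. The distance between two samples is the root-mean-square difference over the features both have observed. Each missing entry is then replaced by the mean of that feature in the k nearest samples that observed it. impute_knn_samples() is a hypothetical helper and the distance choice is an assumption; the package may delegate to a different kNN implementation.

```r
# Hypothetical sample-based kNN imputation (features in rows, samples in columns).
impute_knn_samples <- function(x, k = 10) {
  out <- x
  n <- ncol(x)
  for (j in seq_len(n)) {
    miss <- which(is.na(x[, j]))
    if (length(miss) == 0) next
    # RMS distance from sample j to every other sample over shared observed features
    d <- vapply(seq_len(n), function(i) {
      if (i == j) return(Inf)
      shared <- !is.na(x[, j]) & !is.na(x[, i])
      if (!any(shared)) return(Inf)
      sqrt(mean((x[shared, j] - x[shared, i])^2))
    }, numeric(1))
    for (f in miss) {
      # candidate donors: samples that observed feature f and are comparable to j
      donors <- which(!is.na(x[f, ]) & is.finite(d))
      if (length(donors) == 0) next  # leave NA if no donor exists
      nn <- donors[order(d[donors])][seq_len(min(k, length(donors)))]
      out[f, j] <- mean(x[f, nn])
    }
  }
  out
}

m <- cbind(S1 = c(1, 1, 1), S2 = c(1.1, 1.1, 1.2),
           S3 = c(10, 10, 10), S4 = c(NA, 1, 1))
impute_knn_samples(m, k = 1)[1, "S4"]
```

For sample S4 the closest sample is S1, which matches it exactly on the two shared features. With k = 1 the missing value becomes 1. With k = 2 it becomes the mean of S1 and S2, which is 1.05.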