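ggstatsplot: ggplot2-based plots with statistical details

Introduction

  In a typical exploratory data analysis workflow, data visualization and statistical modeling are two separate phases: visualization informs modeling, and the modeling results in turn suggest different visualizations. The central idea of the ggstatsplot package is to combine these two phases into one, in the form of graphics with statistical details, which makes data exploration simpler and faster.

Functions

ggstatsplot provides a number of plotting functions, mainly covering violin plots, histograms and bar charts, scatterplots, and pie (percentage) charts.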

Function       | Plot                  | Description
ggbetweenstats | violin plots          | for comparisons between groups/conditions
ggwithinstats  | violin plots          | for comparisons within groups/conditions
gghistostats   | histograms            | for the distribution of a numeric variable
ggdotplotstats | dot plots/charts      | for the distribution of a labeled numeric variable
ggpiestats     | pie charts            | for categorical data
ggbarstats     | bar charts            | for categorical data
ggscatterstats | scatterplots          | for correlations between two variables
ggcorrmat      | correlation matrices  | for correlations between multiple variables
ggcoefstats    | dot-and-whisker plots | for regression models and meta-analysis

Statistical methods

The package supports parametric, nonparametric, robust, and Bayesian versions of t-tests/ANOVA, correlation analyses, contingency table analysis, meta-analysis, and regression analyses.

Functions                    | Description                                       | Parametric | Non-parametric | Robust | Bayes Factor
ggbetweenstats               | Between group/condition comparisons               | Yes        | Yes            | Yes    | Yes
ggwithinstats                | Within group/condition comparisons                | Yes        | Yes            | Yes    | Yes
gghistostats, ggdotplotstats | Distribution of a numeric variable                | Yes        | Yes            | Yes    | Yes
ggcorrmat                    | Correlation matrix                                | Yes        | Yes            | Yes    | Yes
ggscatterstats               | Correlation between two variables                 | Yes        | Yes            | Yes    | Yes
ggpiestats, ggbarstats       | Association between categorical variables         | Yes        | NA             | NA     | Yes
ggpiestats, ggbarstats       | Equal proportions for categorical variable levels | Yes        | NA             | NA     | Yes
ggcoefstats                  | Regression model coefficients                     | Yes        | Yes            | Yes    | Yes
ggcoefstats                  | Random-effects meta-analysis                      | Yes        | No             | Yes    | Yes
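
To make the Parametric and Non-parametric columns more concrete, here is a minimal base R sketch that does not call ggstatsplot at all. It runs one test of each kind on the built-in ToothGrowth data. The pairing is an assumption made for illustration: Welch's t-test stands in for the parametric two-group comparison and the Wilcoxon rank-sum (Mann-Whitney) test for the non-parametric one; the package documentation is the place to check which tests and effect sizes it actually reports.

```r
# Base R only: compare a parametric and a non-parametric two-group test.
data(ToothGrowth)

# Parametric: Welch's t-test (unequal variances is the t.test default)
parametric <- t.test(len ~ supp, data = ToothGrowth)

# Non-parametric: Wilcoxon rank-sum test; exact = FALSE avoids the
# warning about exact p-values when the data contain ties
nonparametric <- wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE)

# Collect both results side by side
results <- data.frame(
  approach = c("parametric", "non-parametric"),
  test = c(parametric$method, nonparametric$method),
  p.value = c(parametric$p.value, nonparametric$p.value)
)
print(results)
```

The two approaches answer slightly different questions (difference in means versus a shift in distribution), so their p-values need not agree.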

[Figure: the statistical details displayed on a ggstatsplot plot]

Installation

  • On Windows, install the released version directly from CRAN;
# released version from CRAN
install.packages("ggstatsplot")

# development version from GitHub (requires the remotes package)
remotes::install_github(
  repo = "IndrajeetPatil/ggstatsplot", # package path on GitHub
  dependencies = TRUE, # installs packages which ggstatsplot depends on
  upgrade = "always" # updates any out-of-date dependencies
)
  • On Linux, a number of system dependencies have to be installed first (tested on CentOS 8);
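# Ubuntu/Debian (these are apt package names)
sudo apt-get install libmpfr-dev libmpfr-doc libmpfr4 libmpfr4-dbg

# CentOS/RHEL: the yum counterpart of libmpfr-dev is assumed to be mpfr-devel
sudo yum install mpfr-devel

# then, inside R, install via BiocManager
BiocManager::install("ggstatsplot")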

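Examples

ggbetweenstats

This function creates violin plots, box plots, or a mix of the two, and includes the results of statistical tests between groups or conditions in the plot.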

set.seed(123)
library(ggplot2)
library(ggstatsplot)

# plot
ggbetweenstats(
  data = ToothGrowth,
  x = supp,
  y = len,
  notch = TRUE, # show notched box plot
  mean.ci = TRUE, # whether to display confidence interval for means
  k = 3, # number of decimal places for statistical results
  outlier.tagging = TRUE, # whether outliers need to be tagged
  outlier.label = dose, # variable to be used for the outlier tag
  xlab = "Supplement type", # label for the x-axis variable
  ylab = "Tooth length", # label for the y-axis variable
  title = "The Effect of Vitamin C on Tooth Growth", # title text for the plot
  ggtheme = ggthemes::theme_fivethirtyeight(), # choosing a different theme
  ggstatsplot.layer = FALSE, # turn off `ggstatsplot` theme layer
  package = "wesanderson", # package from which color palette is to be taken
  palette = "Darjeeling1" # choosing a different color palette
)

# add group facet
# for reproducibility
set.seed(123)
ggstatsplot::grouped_ggbetweenstats(
  data = dplyr::filter(
    .data = ggstatsplot::movies_long,
    genre %in% c("Action", "Action Comedy", "Action Drama", "Comedy")
  ),
  x = mpaa,
  y = length,
  grouping.var = genre, # grouping variable
  ggsignif.args = list(textsize = 4, tip_length = 0.01),
  p.adjust.method = "bonferroni", # method for adjusting p-values for multiple comparisons
  # adding new components to `ggstatsplot` default
  ggplot.component = list(ggplot2::scale_y_continuous(sec.axis = ggplot2::dup_axis())),
  k = 3,
  title.prefix = "Movie genre",
  caption = substitute(paste(italic("Source"), ": IMDb (Internet Movie Database)")),
  palette = "default_jama",
  package = "ggsci",
  plotgrid.args = list(nrow = 2),
  title.text = "Differences in movie length by mpaa ratings for different genres"
)
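
The grouped example sets p.adjust.method = "bonferroni". As a rough illustration of what that adjustment does, the base R sketch below runs pairwise t-tests across the three dose levels of ToothGrowth with and without the correction. The dataset and the use of pairwise.t.test (which pools the standard deviation by default) are choices made for this sketch; the pairwise tests ggstatsplot runs internally may differ.

```r
# Base R only: what a Bonferroni adjustment does to pairwise p-values.
data(ToothGrowth)
dose <- factor(ToothGrowth$dose)

# Same pairwise comparisons, without and with the adjustment
raw <- pairwise.t.test(ToothGrowth$len, dose, p.adjust.method = "none")
adjusted <- pairwise.t.test(ToothGrowth$len, dose, p.adjust.method = "bonferroni")

# Three dose levels give three pairwise comparisons, so each raw p-value
# is multiplied by 3 and capped at 1.
n_comparisons <- sum(!is.na(raw$p.value))
manual <- pmin(1, raw$p.value * n_comparisons)

print(adjusted$p.value)
print(manual)
```

The two printed matrices should match, which shows that Bonferroni is simply a multiplication by the number of comparisons.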

ggscatterstats

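This function creates a scatterplot that shows the distribution of each variable along its axis and includes statistical details.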

# for reproducibility
set.seed(123)

# plot
ggstatsplot::grouped_ggscatterstats(
  data = dplyr::filter(
    .data = ggstatsplot::movies_long,
    genre %in% c("Action", "Action Comedy", "Action Drama", "Comedy")
  ),
  x = rating,
  y = length,
  grouping.var = genre, # grouping variable
  label.var = title, # variable used to label points
  label.expression = length > 200, # label only movies longer than 200 minutes
  xfill = "#E69F00",
  yfill = "#8b3058",
  xlab = "IMDB rating",
  title.prefix = "Movie genre",
  ggtheme = ggplot2::theme_grey(),
  ggplot.component = list(
    ggplot2::scale_x_continuous(breaks = seq(2, 9, 1), limits = c(2, 9))
  ),
  plotgrid.args = list(nrow = 2),
  title.text = "Relationship between movie length and IMDB rating for different genres"
)
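
For a sense of the kind of numbers a correlation plot annotates, the following base R sketch computes a Pearson correlation with its confidence interval and p-value. It uses the built-in mtcars data as a stand-in (movies_long ships with ggstatsplot, so it is avoided here to keep the example self-contained), and it is not a reproduction of the exact statistics ggscatterstats prints.

```r
# Base R only: Pearson correlation with a confidence interval.
# mtcars is a stand-in dataset, not the movies_long data used above.
data(mtcars)
fit <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")

# Gather the pieces a plot subtitle would typically summarize
summary_row <- data.frame(
  r = unname(fit$estimate),
  conf.low = fit$conf.int[1],
  conf.high = fit$conf.int[2],
  p.value = fit$p.value,
  n = nrow(mtcars)
)
print(summary_row)
```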

ggpiestats

This function creates a pie chart for categorical or nominal variables and includes the results of a contingency table analysis in the subtitle of the plot: Pearson's chi-squared test for a between-subjects design, or McNemar's chi-squared test for a within-subjects design.

# for reproducibility
set.seed(123)

# plot
ggstatsplot::ggpiestats(
  data = data.frame(
    "before" = c("Approve", "Approve", "Disapprove", "Disapprove"),
    "after" = c("Approve", "Disapprove", "Approve", "Disapprove"),
    counts = c(794, 150, 86, 570),
    check.names = FALSE
  ),
  x = before,
  y = after,
  counts = counts,
  title = "Survey results before and after the intervention",
  label = "both",
  paired = TRUE, # within-subjects design
  package = "wesanderson",
  palette = "Royal1"
)
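
Because paired = TRUE requests a within-subjects analysis, the test involved is McNemar's. The sketch below runs it in base R on the same survey counts, without ggstatsplot. It uses the default continuity correction of mcnemar.test; whether ggpiestats applies that correction is not stated here, so the statistic in the plot subtitle may differ.

```r
# Base R only: McNemar's test on the same paired survey counts.
survey <- data.frame(
  before = c("Approve", "Approve", "Disapprove", "Disapprove"),
  after = c("Approve", "Disapprove", "Approve", "Disapprove"),
  counts = c(794, 150, 86, 570)
)

# Cross-tabulate the counts into a 2 x 2 before/after table
tab <- xtabs(counts ~ before + after, data = survey)
print(tab)

# Only the discordant cells (150 and 86) drive the test statistic
fit <- mcnemar.test(tab)
print(fit)
```

The concordant cells (794 and 570) do not enter the statistic, which is why McNemar's test suits before/after designs where the interest is in who changed their answer.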

References

  1. ggstatsplot tutorial
