eXtreme Gradient Boosting (XGBoost)

Basics

Decision Trees

Boosting builds an ensemble of decision trees. A decision tree partitions a population according to some input feature; even a single split on one feature constitutes a simple decision tree. Split quality is usually judged by information gain (ID3), information gain ratio (C4.5), or the Gini index (CART); the core idea is that the purer each resulting partition, the better the split.
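To make the purity criteria concrete, here is a compact summary in standard notation (these textbook definitions are my addition, not spelled out in the original). For a node with class proportions $p_k$:

$$\mathrm{Gini} = 1 - \sum_{k} p_k^2, \qquad H = -\sum_{k} p_k \log_2 p_k$$

and the information gain of splitting a node $D$ into children $D_1, \dots, D_m$ is

$$\mathrm{Gain}(D) = H(D) - \sum_{i=1}^{m} \frac{|D_i|}{|D|}\, H(D_i).$$

C4.5 divides this gain by the split information $-\sum_i \frac{|D_i|}{|D|}\log_2\frac{|D_i|}{|D|}$ to obtain the gain ratio, which penalizes splits with many branches.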

Regression Trees

A classification tree outputs a class label for each sample, whereas a regression tree outputs a numeric value, which can be evaluated with metrics such as mean squared error or log error.

Decision trees used in data mining and machine learning come in two main types:

  1. Classification tree analysis, where the predicted outcome is the class the data belongs to (for example, whether or not to watch a particular movie)

  2. Regression tree analysis, where the predicted outcome can be considered a real number (for example, the price of a house, or a patient's length of stay in a hospital)

The term Classification And Regression Tree (CART) analysis, first introduced by Breiman et al., is an umbrella term covering both of the above.
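As a minimal illustration of the two tree types (my own sketch, not part of the original workflow; it uses the rpart package, which ships with R, and the built-in iris and mtcars data):

# Classification tree: the outcome Species is a class label
library(rpart)
clf_tree <- rpart(Species ~ ., data = iris, method = "class")

# Regression tree: the outcome mpg is a real number
reg_tree <- rpart(mpg ~ ., data = mtcars, method = "anova")

clf_tree
reg_tree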

Gradient Boosting Decision Tree (GBDT)

The principle of GBDT is simple: the prediction is the sum of the outputs of all the weak learners, and each new weak learner is fit to the gradient of the loss function with respect to the current prediction, i.e. the residual (this gradient/residual is the error between the predicted and true values).
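In symbols (standard gradient boosting notation, added here for precision): the model is additive, $F_m(x) = F_{m-1}(x) + \nu\, h_m(x)$, where each $h_m$ is fit to the pseudo-residuals

$$r_{im} = -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F = F_{m-1}}.$$

For squared-error loss $L = \tfrac{1}{2}\big(y - F(x)\big)^2$ this negative gradient is exactly the ordinary residual $y_i - F_{m-1}(x_i)$, which is why "gradient" and "residual" are used interchangeably above.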

eXtreme Gradient Boosting (XGBoost)

XGBoost is essentially a variant of GBDT; the biggest difference between XGBoost and GBDT lies in how the objective function is defined.
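Concretely (the standard formulation from the XGBoost paper, added here for reference): at round $t$ XGBoost minimizes a regularized objective

$$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\big(y_i,\, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t), \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2,$$

where $T$ is the number of leaves and $w_j$ the leaf weights, and it approximates the loss with a second-order Taylor expansion,

$$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n} \Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t),$$

with $g_i$ and $h_i$ the first and second derivatives of the loss. The explicit regularization term $\Omega$ and the use of second-order information are the main departures from plain GBDT. The table below lists the gradient-boosting models available through caret and their tuning parameters.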

| Model | method Value | Type | Libraries | Tuning Parameters |
|---|---|---|---|---|
| eXtreme Gradient Boosting | xgbDART | Classification, Regression | xgboost, plyr | nrounds, max_depth, eta, gamma, subsample, colsample_bytree, rate_drop, skip_drop, min_child_weight |
| eXtreme Gradient Boosting | xgbLinear | Classification, Regression | xgboost | nrounds, lambda, alpha, eta |
| eXtreme Gradient Boosting | xgbTree | Classification, Regression | xgboost, plyr | nrounds, max_depth, eta, gamma, colsample_bytree, min_child_weight, subsample |
| Gradient Boosting Machines | gbm_h2o | Classification, Regression | h2o | ntrees, max_depth, min_rows, learn_rate, col_sample_rate |
| Stochastic Gradient Boosting | gbm | Classification, Regression | gbm, plyr | n.trees, interaction.depth, shrinkage, n.minobsinnode |

Loading the Data

library(tidyverse)
library(ISLR)
library(caret)
library(pROC)

ml_data <- College
ml_data %>%
  glimpse()

# 70/30 split, stratified on the outcome Private
set.seed(123)
index <- createDataPartition(ml_data$Private, p = 0.7, list = FALSE)
train_data <- ml_data[index, ]
test_data <- ml_data[-index, ]
Rows: 777
Columns: 18
$ Private <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, Yes, …
$ Apps <dbl> 1660, 2186, 1428, 417, 193, 587, 353, 1899, 1038, 582, 1732, 2652, 1179, 1267, 494, 1420, 4302, 1216, 11…
$ Accept <dbl> 1232, 1924, 1097, 349, 146, 479, 340, 1720, 839, 498, 1425, 1900, 780, 1080, 313, 1093, 992, 908, 704, 2…
$ Enroll <dbl> 721, 512, 336, 137, 55, 158, 103, 489, 227, 172, 472, 484, 290, 385, 157, 220, 418, 423, 322, 1016, 252,…
$ Top10perc <dbl> 23, 16, 22, 60, 16, 38, 17, 37, 30, 21, 37, 44, 38, 44, 23, 9, 83, 19, 14, 24, 25, 20, 20, 24, 46, 12, 2…
$ Top25perc <dbl> 52, 29, 50, 89, 44, 62, 45, 68, 63, 44, 75, 77, 64, 73, 46, 22, 96, 40, 23, 54, 44, 63, 51, 49, 74, 52, …
$ F.Undergrad <dbl> 2885, 2683, 1036, 510, 249, 678, 416, 1594, 973, 799, 1830, 1707, 1130, 1306, 1317, 1018, 1593, 1819, 15…
$ P.Undergrad <dbl> 537, 1227, 99, 63, 869, 41, 230, 32, 306, 78, 110, 44, 638, 28, 1235, 287, 5, 281, 326, 1512, 23, 1035, …
$ Outstate <dbl> 7440, 12280, 11250, 12960, 7560, 13500, 13290, 13868, 15595, 10468, 16548, 17080, 9690, 12572, 8352, 870…
$ Room.Board <dbl> 3300, 6450, 3750, 5450, 4120, 3335, 5720, 4826, 4400, 3380, 5406, 4440, 4785, 4552, 3640, 4780, 5300, 35…
$ Books <dbl> 450, 750, 400, 450, 800, 500, 500, 450, 300, 660, 500, 400, 600, 400, 650, 450, 660, 550, 900, 500, 400,…
$ Personal <dbl> 2200, 1500, 1165, 875, 1500, 675, 1500, 850, 500, 1800, 600, 600, 1000, 400, 2449, 1400, 1598, 1100, 132…
$ PhD <dbl> 70, 29, 53, 92, 76, 67, 90, 89, 79, 40, 82, 73, 60, 79, 36, 78, 93, 48, 62, 60, 69, 83, 55, 88, 79, 57, …
$ Terminal <dbl> 78, 30, 66, 97, 72, 73, 93, 100, 84, 41, 88, 91, 84, 87, 69, 84, 98, 61, 66, 62, 82, 96, 65, 93, 88, 60,…
$ S.F.Ratio <dbl> 18.1, 12.2, 12.9, 7.7, 11.9, 9.4, 11.5, 13.7, 11.3, 11.5, 11.3, 9.9, 13.3, 15.3, 11.1, 14.7, 8.4, 12.1, …
$ perc.alumni <dbl> 12, 16, 30, 37, 2, 11, 26, 37, 23, 15, 31, 41, 21, 32, 26, 19, 63, 14, 18, 5, 35, 14, 25, 5, 24, 5, 30, …
$ Expend <dbl> 7041, 10527, 8735, 19016, 10922, 9727, 8861, 11487, 11644, 8991, 10932, 11711, 7940, 9305, 8127, 7355, 2…
$ Grad.Rate <dbl> 60, 56, 54, 59, 15, 55, 63, 73, 80, 52, 73, 76, 74, 68, 55, 69, 100, 59, 46, 34, 48, 70, 65, 48, 54, 48,…
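createDataPartition samples within each level of the outcome, so the Private ratio should be close in both subsets; a quick check (my addition, not in the original post):

prop.table(table(train_data$Private))
prop.table(table(test_data$Private))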

Basic Model Building with the xgboost Package

xgboost requires a numeric predictor matrix, and the label must also be numeric; the factor outcome Private is therefore recoded to 0/1.
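as.numeric() on a factor returns the level index (starting at 1), so subtracting 1 maps the first level to 0. A quick sanity check of the 0/1 encoding used below (my addition):

levels(ml_data$Private)                  # "No"  "Yes"
table(as.numeric(ml_data$Private) - 1,   # rows: numeric label
      ml_data$Private)                   # confirms No -> 0, Yes -> 1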

library(xgboost)
# Note: 'prediction' is an xgb.cv argument; xgboost() simply passes it through
# to the parameter list (visible in the printed params below).
xgboost_model <- xgboost(data = as.matrix(train_data[, -1]),
                         label = as.numeric(train_data$Private) - 1,
                         max_depth = 3,
                         objective = "binary:logistic",
                         nrounds = 10,
                         verbose = FALSE,
                         prediction = TRUE,
                         eval_metric = "logloss")
xgboost_model
##### xgb.Booster
raw: 14.3 Kb
call:
xgb.train(params = params, data = dtrain, nrounds = nrounds,
watchlist = watchlist, verbose = verbose, print_every_n = print_every_n,
early_stopping_rounds = early_stopping_rounds, maximize = maximize,
save_period = save_period, save_name = save_name, xgb_model = xgb_model,
callbacks = callbacks, max_depth = 3, objective = "binary:logistic",
prediction = TRUE, eval_metric = "logloss")
params (as set within xgb.train):
max_depth = "3", objective = "binary:logistic", prediction = "TRUE", eval_metric = "logloss", validate_parameters = "TRUE"
xgb.attributes:
niter
callbacks:
cb.evaluation.log()
# of features: 17
niter: 10
nfeatures : 17
evaluation_log:
  • Prediction summary
predict(xgboost_model,
        as.matrix(test_data[, -1])) %>%
  as_tibble() %>%
  mutate(prediction = round(value),
         label = as.numeric(test_data$Private) - 1) %>%
  count(prediction, label)
# A tibble: 4 x 3
  prediction label     n
       <dbl> <dbl> <int>
1          0     0    52
2          0     1     6
3          1     0    11
4          1     1   163
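From these counts the test-set accuracy can be recomputed by hand: 52 + 163 of the 232 test colleges are classified correctly.

(52 + 163) / (52 + 6 + 11 + 163)   # = 215 / 232, approximately 0.9267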
  • ROC
# For objective = "binary:logistic", predict() already returns probabilities;
# the type = "prob" argument is ignored by predict.xgb.Booster.
pred_prob <- predict(xgboost_model, as.matrix(test_data[, -1]))
rocobj <- roc(as.numeric(test_data$Private) - 1, pred_prob)
auc <- round(auc(rocobj), 4)

ggroc(rocobj, color = "red", linetype = 1, size = 1, alpha = 1, legacy.axes = TRUE) +
  geom_abline(intercept = 0, slope = 1, color = "grey", size = 1, linetype = 1) +
  labs(x = "False Positive Rate (1 - Specificity)",
       y = "True Positive Rate (Sensitivity or Recall)") +
  annotate("text", x = .75, y = .25, label = paste("AUC =", auc),
           size = 5, family = "serif") +
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 1)) +
  theme_bw() +
  theme(panel.background = element_rect(fill = 'transparent'),
        axis.ticks.length = unit(0.4, "lines"),
        axis.ticks = element_line(color = 'black'),
        axis.line = element_line(size = .5, colour = "black"),
        axis.title = element_text(colour = 'black', size = 12, face = "bold"),
        axis.text = element_text(colour = 'black', size = 10, face = "bold"),
        text = element_text(size = 8, color = "black", family = "serif"))

Parameter-Tuned Modeling with the xgboost Package (Part 1)

# Build the DMatrix objects
dtrain <- xgb.DMatrix(as.matrix(train_data[, -1]),
                      label = as.numeric(train_data$Private) - 1)
dtest <- xgb.DMatrix(as.matrix(test_data[, -1]),
                     label = as.numeric(test_data$Private) - 1)
# Define the parameters
params <- list(max_depth = 3,
               objective = "binary:logistic",
               silent = 0)
# Define the evaluation sets (watchlist)
watchlist <- list(train = dtrain, eval = dtest)

# Train the model
bst_model <- xgb.train(params = params,
                       data = dtrain,
                       nrounds = 10,
                       watchlist = watchlist,
                       verbose = FALSE,
                       prediction = TRUE,
                       eval_metric = "logloss")
bst_model
##### xgb.Booster
raw: 14.3 Kb
call:
xgb.train(params = params, data = dtrain, nrounds = 10, watchlist = watchlist,
verbose = FALSE, prediction = TRUE, eval_metric = "logloss")
params (as set within xgb.train):
max_depth = "3", objective = "binary:logistic", silent = "0", prediction = "TRUE", eval_metric = "logloss", validate_parameters = "TRUE"
xgb.attributes:
niter
callbacks:
cb.evaluation.log()
# of features: 17
niter: 10
nfeatures : 17
evaluation_log:
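Because a watchlist was supplied, xgb.train keeps a per-round record of the logloss on both sets (via the cb.evaluation.log() callback shown above); it can be inspected directly, e.g.:

bst_model$evaluation_log
tail(bst_model$evaluation_log, 3)   # train and eval logloss for the last rounds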
  • Prediction summary
predict(bst_model,
        as.matrix(test_data[, -1])) %>%
  as_tibble() %>%
  mutate(prediction = round(value),
         label = as.numeric(test_data$Private) - 1) %>%
  count(prediction, label)
# A tibble: 4 x 3
  prediction label     n
       <dbl> <dbl> <int>
1          0     0    52
2          0     1     6
3          1     0    11
4          1     1   163
  • ROC
pred_prob <- predict(bst_model, as.matrix(test_data[, -1]))
rocobj <- roc(as.numeric(test_data$Private) - 1, pred_prob)
auc <- round(auc(rocobj), 4)

ggroc(rocobj, color = "red", linetype = 1, size = 1, alpha = 1, legacy.axes = TRUE) +
  geom_abline(intercept = 0, slope = 1, color = "grey", size = 1, linetype = 1) +
  labs(x = "False Positive Rate (1 - Specificity)",
       y = "True Positive Rate (Sensitivity or Recall)") +
  annotate("text", x = .75, y = .25, label = paste("AUC =", auc),
           size = 5, family = "serif") +
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 1)) +
  theme_bw() +
  theme(panel.background = element_rect(fill = 'transparent'),
        axis.ticks.length = unit(0.4, "lines"),
        axis.ticks = element_line(color = 'black'),
        axis.line = element_line(size = .5, colour = "black"),
        axis.title = element_text(colour = 'black', size = 12, face = "bold"),
        axis.text = element_text(colour = 'black', size = 10, face = "bold"),
        text = element_text(size = 8, color = "black", family = "serif"))

Parameter-Tuned Modeling with the xgboost Package (Part 2)

# Cross-validated tuning
cv_model <- xgb.cv(params = params,
                   data = dtrain,
                   nrounds = 100,
                   watchlist = watchlist,
                   nfold = 5,
                   verbose = FALSE,
                   prediction = TRUE,
                   eval_metric = "logloss") # prediction of cv folds
# Find the best nrounds
cv_model$evaluation_log %>%
  filter(test_logloss_mean == min(test_logloss_mean))
min_logloss <- min(cv_model$evaluation_log[, test_logloss_mean])
min_logloss_index <- which.min(cv_model$evaluation_log[, test_logloss_mean])
# Train the final model with the best nrounds
nround <- min_logloss_index
best_param <- params
bst_model_cv <- xgb.train(data = dtrain, params = best_param, nrounds = nround, nthread = 6)
   iter train_logloss_mean train_logloss_std test_logloss_mean test_logloss_std
1:   22           0.042874       0.006273908         0.1763722       0.04919096
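Rather than scanning the log for the minimum afterwards, xgb.cv can also stop on its own once the test metric stops improving, via its early_stopping_rounds argument. A sketch with the same parameters (the exact best iteration may differ from the table above because the CV folds are random):

cv_model_es <- xgb.cv(params = params,
                      data = dtrain,
                      nrounds = 100,
                      nfold = 5,
                      verbose = FALSE,
                      eval_metric = "logloss",
                      early_stopping_rounds = 10)  # stop after 10 rounds without improvement
cv_model_es$best_iteration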
  • ROC
pred_prob <- predict(bst_model_cv, as.matrix(test_data[, -1]))
rocobj <- roc(as.numeric(test_data$Private) - 1, pred_prob)
auc <- round(auc(rocobj), 4)

ggroc(rocobj, color = "red", linetype = 1, size = 1, alpha = 1, legacy.axes = TRUE) +
  geom_abline(intercept = 0, slope = 1, color = "grey", size = 1, linetype = 1) +
  labs(x = "False Positive Rate (1 - Specificity)",
       y = "True Positive Rate (Sensitivity or Recall)") +
  annotate("text", x = .75, y = .25, label = paste("AUC =", auc),
           size = 5, family = "serif") +
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 1)) +
  theme_bw() +
  theme(panel.background = element_rect(fill = 'transparent'),
        axis.ticks.length = unit(0.4, "lines"),
        axis.ticks = element_line(color = 'black'),
        axis.line = element_line(size = .5, colour = "black"),
        axis.title = element_text(colour = 'black', size = 12, face = "bold"),
        axis.text = element_text(colour = 'black', size = 10, face = "bold"),
        text = element_text(size = 8, color = "black", family = "serif"))

Modeling with the caret Package

# Training and test sets
X_train <- xgb.DMatrix(as.matrix(train_data %>% select(-Private)))
y_train <- train_data$Private
X_test <- xgb.DMatrix(as.matrix(test_data %>% select(-Private)))
y_test <- test_data$Private

# Resampling scheme
xgb_trcontrol <- trainControl(method = "cv",
                              number = 5,
                              allowParallel = TRUE,
                              verboseIter = FALSE,
                              returnData = FALSE)
# Tuning grid
xgbGrid <- expand.grid(nrounds = c(100, 200),
                       max_depth = c(10, 15, 20, 25),
                       colsample_bytree = seq(0.5, 0.9, length.out = 5),
                       eta = 0.1,
                       gamma = 0,
                       min_child_weight = 1,
                       subsample = 1)
# Train
set.seed(123)
xgb_model_caret <- train(X_train, y_train,
                         trControl = xgb_trcontrol,
                         tuneGrid = xgbGrid,
                         method = "xgbTree")
xgb_model_caret$bestTune
   nrounds max_depth eta gamma colsample_bytree min_child_weight subsample
39     100        25 0.1     0              0.9                1         1
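For scale: the grid spans 2 nrounds × 4 max_depth × 5 colsample_bytree = 40 candidate settings, each evaluated with 5-fold CV, i.e. 200 model fits in total; bestTune reports row 39 of that grid as the winner.

nrow(xgbGrid)   # 40 parameter combinations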
  • Prediction results
pred <- predict(xgb_model_caret, newdata=X_test)
print(confusionMatrix(pred, y_test))
Confusion Matrix and Statistics

          Reference
Prediction  No Yes
       No   51   7
       Yes  12 162

               Accuracy : 0.9181
                 95% CI : (0.8751, 0.95)
    No Information Rate : 0.7284
    P-Value [Acc > NIR] : 3.803e-13

                  Kappa : 0.7877

 Mcnemar's Test P-Value : 0.3588

            Sensitivity : 0.8095
            Specificity : 0.9586
         Pos Pred Value : 0.8793
         Neg Pred Value : 0.9310
             Prevalence : 0.2716
         Detection Rate : 0.2198
   Detection Prevalence : 0.2500
      Balanced Accuracy : 0.8841

       'Positive' Class : No
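With 'No' as the positive class, the headline statistics can be reproduced directly from the table (an arithmetic check, added here):

51 / (51 + 12)      # Sensitivity = 0.8095 (true No / all actual No)
162 / (162 + 7)     # Specificity = 0.9586 (true Yes / all actual Yes)
(51 + 162) / 232    # Accuracy    = 0.9181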
  • ROC
pred_prob_caret <- predict(xgb_model_caret, newdata = X_test, type = "prob")[, "Yes"]
rocobj <- roc(y_test, pred_prob_caret)
auc <- round(auc(rocobj), 4)
ggroc(rocobj, color = "red", linetype = 1, size = 1, alpha = 1, legacy.axes = TRUE) +
  geom_abline(intercept = 0, slope = 1, color = "grey", size = 1, linetype = 1) +
  labs(x = "False Positive Rate (1 - Specificity)",
       y = "True Positive Rate (Sensitivity or Recall)") +
  annotate("text", x = .75, y = .25, label = paste("AUC =", auc),
           size = 5, family = "serif") +
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 1)) +
  theme_bw() +
  theme(panel.background = element_rect(fill = 'transparent'),
        axis.ticks.length = unit(0.4, "lines"),
        axis.ticks = element_line(color = 'black'),
        axis.line = element_line(size = .5, colour = "black"),
        axis.title = element_text(colour = 'black', size = 12, face = "bold"),
        axis.text = element_text(colour = 'black', size = 10, face = "bold"),
        text = element_text(size = 8, color = "black", family = "serif"))

Combining the ROC Curves

The ggroc function accepts a list of roc objects, which makes it possible to overlay the ROC curves of all the models above in one plot for direct visual comparison. The plot shows that the same algorithm can reach different accuracy under different parameter settings, again illustrating the importance of parameter tuning in machine learning.


rocobj1 <- roc(as.numeric(test_data$Private) - 1,
               predict(xgboost_model, as.matrix(test_data[, -1])))
auc1 <- round(auc(rocobj1), 4)

rocobj2 <- roc(as.numeric(test_data$Private) - 1,
               predict(bst_model, as.matrix(test_data[, -1])))
auc2 <- round(auc(rocobj2), 4)

rocobj3 <- roc(as.numeric(test_data$Private) - 1,
               predict(bst_model_cv, as.matrix(test_data[, -1])))
auc3 <- round(auc(rocobj3), 4)

rocobj4 <- roc(y_test, predict(xgb_model_caret, newdata = X_test, type = "prob")[, "Yes"])
auc4 <- round(auc(rocobj4), 4)


rocobj_list <- list(xgboost = rocobj1,
                    best = rocobj2,
                    best_cv = rocobj3,
                    xgb_caret = rocobj4)

ggroc(rocobj_list, linetype = 1, size = 1, alpha = 1, legacy.axes = TRUE) +
  geom_abline(intercept = 0, slope = 1, color = "grey", size = 1, linetype = 1) +
  labs(x = "False Positive Rate (1 - Specificity)",
       y = "True Positive Rate (Sensitivity or Recall)") +
  annotate("text", x = .75, y = .25, label = paste("AUC =", auc1, "(xgboost_model)"),
           size = 5, family = "serif") +
  annotate("text", x = .75, y = .20, label = paste("AUC =", auc2, "(bst_model)"),
           size = 5, family = "serif") +
  annotate("text", x = .75, y = .15, label = paste("AUC =", auc3, "(bst_model_cv)"),
           size = 5, family = "serif") +
  annotate("text", x = .75, y = .10, label = paste("AUC =", auc4, "(xgb_model_caret)"),
           size = 5, family = "serif") +
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 1)) +
  scale_colour_manual(values = c("red", "blue", "black", "green")) +
  theme_bw() +
  theme(panel.background = element_rect(fill = 'transparent'),
        axis.ticks.length = unit(0.4, "lines"),
        axis.ticks = element_line(color = 'black'),
        axis.line = element_line(size = .5, colour = "black"),
        axis.title = element_text(colour = 'black', size = 12, face = "bold"),
        axis.text = element_text(colour = 'black', size = 10, face = "bold"),
        text = element_text(size = 8, color = "black", family = "serif"))

Summary: tuning the algorithm's parameters can improve model accuracy; when doing machine learning, a tuned workflow is strongly recommended.

Addendum

Besides the approaches above, xgboost can also be used through the h2o R package; I will cover that in a future update.

R Information

sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ISLR_1.2 forcats_0.5.0 stringr_1.4.0 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2
[7] tidyverse_1.3.0 xgboost_1.3.1.1 mlbench_2.1-1 survminer_0.4.8 ggpubr_0.4.0 survcomp_1.40.0
[13] prodlim_2019.11.13 survival_3.2-7 caretEnsemble_2.0.1 pROC_1.16.2 caret_6.0-86 ggplot2_3.3.3
[19] lattice_0.20-41 data.table_1.13.6 tibble_3.0.4 dplyr_1.0.2

loaded via a namespace (and not attached):
[1] readxl_1.3.1 backports_1.2.0 plyr_1.8.6 splines_4.0.3 digest_0.6.27
[6] SuppDists_1.1-9.5 foreach_1.5.1 htmltools_0.5.0 fansi_0.4.1 magrittr_1.5
[11] openxlsx_4.2.3 recipes_0.1.15 modelr_0.1.8 gower_0.2.2 colorspace_2.0-0
[16] rvest_0.3.6 haven_2.3.1 xfun_0.19 crayon_1.3.4 jsonlite_1.7.1
[21] libcoin_1.0-7 zoo_1.8-8 iterators_1.0.13 glue_1.4.2 gtable_0.3.0
[26] ipred_0.9-9 questionr_0.7.3 car_3.0-10 kernlab_0.9-29 abind_1.4-5
[31] scales_1.1.1 mvtnorm_1.1-1 DBI_1.1.0 rstatix_0.6.0 miniUI_0.1.1.1
[36] Rcpp_1.0.5 xtable_1.8-4 Cubist_0.2.3 foreign_0.8-80 km.ci_0.5-2
[41] Formula_1.2-4 stats4_4.0.3 lava_1.6.8.1 httr_1.4.2 ellipsis_0.3.1
[46] pkgconfig_2.0.3 farver_2.0.3 nnet_7.3-14 dbplyr_2.0.0 utf8_1.1.4
[51] tidyselect_1.1.0 labeling_0.4.2 rlang_0.4.8 reshape2_1.4.4 later_1.1.0.1
[56] munsell_0.5.0 cellranger_1.1.0 tools_4.0.3 cli_2.1.0 generics_0.1.0
[61] broom_0.7.3 evaluate_0.14 fastmap_1.0.1 yaml_2.2.1 bootstrap_2019.6
[66] ModelMetrics_1.2.2.2 knitr_1.30 fs_1.5.0 zip_2.1.1 survMisc_0.5.5
[71] caTools_1.18.0 randomForest_4.6-14 pbapply_1.4-3 nlme_3.1-150 mime_0.9
[76] xml2_1.3.2 compiler_4.0.3 rstudioapi_0.12 curl_4.3 e1071_1.7-4
[81] ggsignif_0.6.0 reprex_0.3.0 klaR_0.6-15 stringi_1.5.3 highr_0.8
[86] Matrix_1.2-18 gbm_2.1.8 ggsci_2.9 survivalROC_1.0.3 KMsurv_0.1-5
[91] vctrs_0.3.4 pillar_1.4.6 lifecycle_0.2.0 combinat_0.0-8 cowplot_1.1.1
[96] bitops_1.0-6 httpuv_1.5.4 R6_2.5.0 promises_1.1.1 KernSmooth_2.23-18
[101] gridExtra_2.3 C50_0.1.3.1 rio_0.5.16 codetools_0.2-18 MASS_7.3-53
[106] assertthat_0.2.1 withr_2.3.0 parallel_4.0.3 hms_0.5.3 grid_4.0.3
[111] rpart_4.1-15 labelled_2.7.0 timeDate_3043.102 class_7.3-17 rmarkdown_2.5
[116] inum_1.0-1 carData_3.0-4 partykit_1.2-11 shiny_1.5.0 lubridate_1.7.9
[121] rmeta_3.0

References

  1. XGBoost Explained in Detail (XGBoost详解)
  2. What Is XGBoost (什么是XGBoost)
  3. An Accessible Guide to xgboost, the Kaggle Powerhouse (通俗理解kaggle比赛大杀器xgboost)
  4. The optimal parameters for xgb_train via xgb_cv

If any of the referenced articles raises copyright concerns, please contact me. Thank you.


------------- The End. Thanks for reading -------------