
This is a question about running a machine-learning model in a production environment: specifically, which parameters to use in actual operation.

How should I choose which parameters to adopt when running XGBoost in actual operation?

I am using XGBClassifier for binary (0/1) classification and tuning the hyperparameters with Optuna (a sketch of this setup is shown below).
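
For reference, here is a minimal sketch of the tuning setup I described. `X` and `y` are placeholders for my actual feature matrix and 0/1 labels, and the search ranges are my assumptions; the parameter names match the ones listed below.

```python
import optuna
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# X, y: placeholders for the actual feature matrix and 0/1 labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

def objective(trial):
    # Search space covering the parameters listed below; the ranges are assumptions
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 10),
        'learning_rate': trial.suggest_float('learning_rate', 0.001, 0.1, log=True),
        'scale_pos_weight': trial.suggest_int('scale_pos_weight', 1, 50),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0, step=0.1),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1.0, step=0.1),
    }
    model = XGBClassifier(**params)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)
print(study.best_params)
```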

Parameter reference

* Parameters tuned by Optuna
Parameters when the A data is split 80:20 and 95:5
80:20
{'n_estimators': 477, 'max_depth': 8, 'min_child_weight': 6, 'learning_rate': 0.004, 'scale_pos_weight': 1, 'subsample': 0.8, 'colsample_bytree': 0.8}
95:5
{'n_estimators': 595, 'max_depth': 5, 'min_child_weight': 8, 'learning_rate': 0.003, 'scale_pos_weight': 1, 'subsample': 0.9, 'colsample_bytree': 0.6}
Parameters when the B data is split 80:20 and 95:5
80:20
{'n_estimators': 854, 'max_depth': 13, 'min_child_weight': 2, 'learning_rate': 0.009, 'scale_pos_weight': 49, 'subsample': 0.8, 'colsample_bytree': 0.8}
95:5
{'n_estimators': 528, 'max_depth': 8, 'min_child_weight': 1, 'learning_rate': 0.008, 'scale_pos_weight': 46, 'subsample': 0.8, 'colsample_bytree': 0.8}
Parameters when the C data is split 80:20 and 95:5
80:20
{'n_estimators': 361, 'max_depth': 12, 'min_child_weight': 4, 'learning_rate': 0.005, 'scale_pos_weight': 1, 'subsample': 0.9, 'colsample_bytree': 0.8}
95:5
{'n_estimators': 469, 'max_depth': 12, 'min_child_weight': 6, 'learning_rate': 0.005, 'scale_pos_weight': 1, 'subsample': 0.9, 'colsample_bytree': 0.7}
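
As a concrete example, this is how one of the parameter sets above (A data, 80:20) would be passed to the classifier; `X_train` and `y_train` are placeholders for my training data.

```python
from xgboost import XGBClassifier

# Parameter set found by Optuna for the A data with the 80:20 split (from the list above)
best_params = {
    'n_estimators': 477, 'max_depth': 8, 'min_child_weight': 6,
    'learning_rate': 0.004, 'scale_pos_weight': 1,
    'subsample': 0.8, 'colsample_bytree': 0.8,
}

model = XGBClassifier(**best_params)
model.fit(X_train, y_train)  # X_train, y_train: placeholder training data
```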
Question details

When comparing the 80:20 and 95:5 splits, should I adopt the parameter set that achieves the higher accuracy?

I feel that 95:5 is better because the model can learn from more training data, but with 80:20 the evaluation uses more test data, so I wonder whether that makes the score more stable... (the two splits I am comparing are sketched below).
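
For illustration, this is roughly how I produce the two splits being compared; a minimal sketch assuming scikit-learn's train_test_split with a fixed random_state and stratification (both my assumptions).

```python
from sklearn.model_selection import train_test_split

# 80:20 split: fewer training samples, but the score is computed on more test data
X_tr80, X_te20, y_tr80, y_te20 = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y)

# 95:5 split: more training data, but the score rests on only 5% of the samples
X_tr95, X_te05, y_tr95, y_te05 = train_test_split(
    X, y, test_size=0.05, random_state=0, stratify=y)
```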

I would appreciate any advice on how to choose and adjust the parameters for actual operation.

▼ Environment etc.
Windows 10
Python 3.7
Machine learning: XGBoost