I am using random forests in Python with scikit-learn. I trained RandomForestRegressor with bootstrap = False, but bootstrap = True turned out to be more accurate. In machine learning it is generally accepted that the more training samples there are, the more accurate the model, so is it possible for accuracy to decrease as it did here? I would also appreciate any references that cover this. Thank you.
Answer # 1
I think there are three points of misunderstanding here.
The first point.
I don't think it is accurate to call a model trained with bootstrap = False a random forest. Bootstrap sampling is one of the defining features that the random forest method uses to improve accuracy. Just because scikit-learn lets you set bootstrap = False does not mean the result can still be called a "random forest". It also follows (I explain this a little more carefully below) that it is only natural for bootstrap = True to be more accurate, because bootstrap sampling is one of the "devices" built into a random forest.
The second point.
I don't think it is accurate to interpret bootstrap = True as reducing the number of training samples. Each individual decision tree is fitted to randomly sampled training data, but taken together as an ensemble, the forest ends up using all of the original training data.
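To make the second point concrete, here is a minimal sketch that simulates only the sampling step. It assumes each tree draws as many rows as the training set contains, with replacement, which is the default behaviour when bootstrap = True. The function name `bootstrap_coverage` and the sizes used are hypothetical, chosen for illustration; this is plain standard-library sampling, not scikit-learn's internal code.

```python
import random


def bootstrap_coverage(n_samples=1000, n_trees=100, seed=0):
    """Simulate bootstrap sampling for a forest (illustrative helper, not a scikit-learn API).

    Returns the average fraction of distinct training rows seen by one tree
    and the fraction of rows seen by at least one tree in the ensemble.
    """
    rng = random.Random(seed)
    seen_by_ensemble = set()
    per_tree_fractions = []
    for _ in range(n_trees):
        # One bootstrap sample: n_samples draws with replacement.
        drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
        per_tree_fractions.append(len(drawn) / n_samples)
        seen_by_ensemble |= drawn
    mean_per_tree = sum(per_tree_fractions) / n_trees
    ensemble_fraction = len(seen_by_ensemble) / n_samples
    return mean_per_tree, ensemble_fraction


per_tree, ensemble = bootstrap_coverage()
print(f"distinct rows per tree (mean): {per_tree:.3f}")
print(f"rows used by the ensemble:     {ensemble:.3f}")
```

Each tree sees roughly 63% of the distinct rows (about 1 - 1/e), while the ensemble as a whole covers essentially the entire training set, so the data is reweighted per tree rather than discarded.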
The third point.
It is also not accurate to say that the number of training samples alone determines the accuracy of machine learning. Accuracy is the result of many factors, such as the feature selection method, preprocessing, and the algorithm. In the second point I noted that the training samples have not decreased; apart from that, a random forest obtains a relatively smooth model by ensembling the individual decision trees. As a result, generalization performance, that is, accuracy at inference time on unseen data, improves.
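The smoothing effect can be checked with a small experiment. The sketch below is an assumption-laden illustration, not the questioner's setup: it uses a synthetic noisy sine curve with a single feature, and the helper name `compare_bootstrap`, the sample sizes, and the noise level are all made up for the example. With one feature and bootstrap = False, every tree is fitted to exactly the same data, so the forest behaves like a single fully grown decision tree.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error


def compare_bootstrap(seed=0, n_train=300, n_test=1000, noise=0.3):
    """Fit forests with and without bootstrap on synthetic data (illustrative helper)."""
    rng = np.random.default_rng(seed)
    # Synthetic 1-D regression problem: y = sin(x) + Gaussian noise.
    X_train = rng.uniform(0, 6, size=(n_train, 1))
    y_train = np.sin(X_train[:, 0]) + rng.normal(0, noise, n_train)
    X_test = rng.uniform(0, 6, size=(n_test, 1))
    y_test = np.sin(X_test[:, 0]) + rng.normal(0, noise, n_test)

    results = {}
    for bootstrap in (True, False):
        model = RandomForestRegressor(
            n_estimators=100, bootstrap=bootstrap, random_state=seed
        )
        model.fit(X_train, y_train)
        # Test-set mean squared error, keyed by the bootstrap setting.
        results[bootstrap] = mean_squared_error(y_test, model.predict(X_test))
    return results


results = compare_bootstrap()
print(f"test MSE, bootstrap=True:  {results[True]:.4f}")
print(f"test MSE, bootstrap=False: {results[False]:.4f}")
```

On noisy data like this, the bootstrap = True forest should show the lower test error, because averaging trees fitted to different resamples cancels part of the noise each tree memorizes. The size of the gap depends on the data, so it is worth rerunning with your own dataset and several seeds before drawing conclusions.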
Reference: Random Forest