I am using a random forest in Python with scikit-learn. I trained a RandomForestRegressor with bootstrap = False, but bootstrap = True turned out to be more accurate. In machine learning it is generally understood that the more training samples there are, the better the accuracy of the model, so is it possible for accuracy to decrease like it did this time? I would also appreciate it if you could point me to references on this. Thank you.

  • Answer # 1

    I think there are three points of misunderstanding here.

    The first point.

    It is not accurate to call a model trained with bootstrap = False a random forest. Bootstrap sampling is one of the defining features of the random forest method, used precisely to improve accuracy. Just because scikit-learn gives you the option to set bootstrap = False does not mean the result can still be called a "random forest". From this you can also see (I will make this more precise below) that "bootstrap = True is more accurate" is only to be expected, because bootstrap sampling is exactly one of the "ingenuities" of random forests.
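The comparison the question describes can be reproduced with a minimal sketch. The synthetic data and all parameter choices below are illustrative assumptions, not taken from the question; the point is only the mechanics of toggling `bootstrap` on `RandomForestRegressor`:

```python
# Sketch: compare bootstrap=True vs bootstrap=False under cross-validation.
# Data and hyperparameters are illustrative assumptions.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Noisy synthetic regression task, so overfitting is possible.
X, y = make_regression(n_samples=400, n_features=10, noise=15.0, random_state=0)

mean_r2 = {}
for bootstrap in (True, False):
    model = RandomForestRegressor(n_estimators=100, bootstrap=bootstrap,
                                  random_state=0)
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    mean_r2[bootstrap] = scores.mean()
    print(f"bootstrap={bootstrap}: mean CV R^2 = {mean_r2[bootstrap]:.3f}")
```

On noisy data, bootstrap sampling typically (though not on every dataset) gives the better cross-validated score, for the reasons discussed in this answer.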

    The second point.

    It is not accurate to interpret bootstrap = True as reducing the number of "training samples". Each individual decision tree is trained on randomly resampled training data, but in the end, the ensemble as a whole uses all of the original training data.
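This point can be checked numerically. The sketch below simulates bootstrap sampling directly (it does not use scikit-learn internals; sizes are arbitrary assumptions): each tree's bootstrap sample contains only about 63.2% of the unique training samples, yet the union over a modest number of trees covers essentially the whole training set.

```python
# Sketch: bootstrap sampling per tree vs coverage by the whole ensemble.
# n and n_trees are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 1000          # training-set size
n_trees = 100     # number of trees in the ensemble

unique_fracs = []
seen = set()
for _ in range(n_trees):
    idx = rng.integers(0, n, size=n)          # sample n points WITH replacement
    unique_fracs.append(len(set(idx)) / n)    # fraction of unique samples seen
    seen.update(idx.tolist())                 # samples used by the ensemble so far

print(f"unique samples per tree: ~{np.mean(unique_fracs):.1%}")  # about 1 - 1/e
print(f"samples used by the ensemble: {len(seen)}/{n}")
```

The ~63.2% figure is the well-known limit 1 - 1/e for a size-n bootstrap sample; the remaining ~36.8% per tree are the "out-of-bag" samples, which random forests reuse for error estimation rather than discard.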

    The third point.

    It is also not accurate to say that the number of training samples alone determines the accuracy of a machine learning model. Accuracy is the result of many factors, such as the feature selection method, preprocessing, and the choice of algorithm. In the second point I noted that the training data is not actually reduced; beyond that, a random forest obtains a comparatively smooth model by ensembling the individual decision trees. As a result, generalization performance, i.e. accuracy at inference time on unknown data, is improved.
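The smoothing effect of the ensemble can be seen by comparing a single fully grown decision tree against an averaged forest on held-out data. Again, the dataset and parameters below are illustrative assumptions:

```python
# Sketch: a single deep tree overfits noisy data; averaging many trees smooths
# the prediction and improves held-out accuracy. Data is synthetic.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=600, n_features=8, noise=20.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

tree = DecisionTreeRegressor(random_state=1).fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_tr, y_tr)

tree_r2 = r2_score(y_te, tree.predict(X_te))
forest_r2 = r2_score(y_te, forest.predict(X_te))
print(f"single tree R^2: {tree_r2:.3f}")
print(f"forest R^2:      {forest_r2:.3f}")
```

The forest's held-out R^2 is typically well above the single tree's, which is the generalization improvement described above.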

    Reference: Random Forest