
I am working on deep learning and have some questions.

1. Since the optimal weight parameters are the ones that minimize loss and val_loss, if val_loss fluctuates wildly as shown in the figure below, is it better to pick out the parameters at an epoch where val_loss happens to drop?

loss and val_loss are defined as follows:
loss: MSE between the training data labels and the predictions
val_loss: MSE between the test data labels and the predictions

2. Why does val_loss fluctuate so much?
It seems strange, because in the deep learning I have done so far, loss and val_loss have both generally decreased with only minor fluctuations.

  • Answer # 1

      

    1. Since the optimal weight parameters are the ones that minimize loss and val_loss, if val_loss fluctuates wildly as shown in the figure below, is it better to pick out the parameters at an epoch where val_loss happens to drop?

    Whether that is better depends on the situation; sometimes it helps and sometimes it doesn't. But it is better to deal with the fluctuation itself first.

    I think the model will not be useful unless you estimate the required performance in advance and train it to meet that performance stably.

      

    2. Why does val_loss fluctuate so much?

    If there is little validation data, the influence of chance hits and misses becomes larger. Beyond that, the fluctuation can also occur when the validation data does not follow the same distribution as the training data.

    If the model simply cannot learn well, you need to review the model itself. Tuning the optimizer can sometimes help, but the problem may also be one that the model cannot solve in the first place.
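As a rough illustration of the "little validation data" point, here is a small pure-Python simulation (not from the question) showing that an MSE measured on a small validation set scatters far more from run to run than one measured on a large set:

```python
import random

random.seed(0)

def mse_on_sample(n, noise=1.0):
    """MSE of a zero predictor on n noisy targets drawn from N(0, noise^2)."""
    errors = [random.gauss(0.0, noise) for _ in range(n)]
    return sum(e * e for e in errors) / n

# Measure "val_loss" several times on small vs. large validation sets:
small = [mse_on_sample(10) for _ in range(5)]
large = [mse_on_sample(10_000) for _ in range(5)]

spread_small = max(small) - min(small)
spread_large = max(large) - min(large)
# The small-set estimates scatter far more than the large-set ones,
# so val_loss on a tiny validation set naturally looks "wild".
```

The spread of the MSE estimate shrinks roughly as 1/sqrt(n), so a validation set that is too small can make val_loss jump around even when the model itself is fine.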

  • Answer # 2

    1, I think that the smaller the val_loss, the more reliable the model is, so yes, it is better to take out the parameters at the point where val_loss drops.
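A minimal sketch of "take out the parameters when val_loss drops" (the per-epoch weight snapshots and loss values below are made up for illustration):

```python
def best_checkpoint(val_losses, params_per_epoch):
    """Return the epoch index and parameters with the smallest val_loss."""
    best_epoch = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_epoch, params_per_epoch[best_epoch]

# Fluctuating val_loss over five epochs (made-up numbers):
val_losses = [0.9, 0.5, 1.2, 0.4, 1.1]
params = ["w0", "w1", "w2", "w3", "w4"]  # stand-ins for weight snapshots
epoch, best = best_checkpoint(val_losses, params)  # -> epoch 3, params "w3"
```

In practice this is what a "save best only" model checkpoint does: keep the weights from the epoch with the lowest val_loss seen so far, instead of the weights from the final epoch.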

    2, I can't say for sure without seeing the code, but it is probably a learning-rate problem.
    Increasing the learning rate speeds up learning but increases the risk of divergence.
    Decreasing the learning rate improves convergence at the expense of learning speed.
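The divergence/convergence trade-off can be seen with plain gradient descent on the toy function f(x) = x² (gradient 2x), a stand-in for the questioner's unseen model:

```python
def gradient_descent(lr, steps=50, x0=1.0):
    """Minimize f(x) = x^2 with a fixed learning rate; return final |x|."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # gradient of x^2 is 2x
    return abs(x)

stable = gradient_descent(lr=0.1)    # x shrinks by a factor 0.8 each step
diverged = gradient_descent(lr=1.5)  # x is multiplied by -2 each step
```

Each step multiplies x by (1 - 2*lr), so for this quadratic any lr above 1.0 makes the iterates oscillate and grow. A too-large learning rate in a real network behaves analogously, which shows up as loss and val_loss jumping around instead of decreasing smoothly.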

  • Answer # 3

    1. If val_loss fluctuates greatly, parameter optimization is not working in the first place, so it makes little sense to extract the parameters at a point where val_loss happens to be small.
