I am studying deep learning and have some questions.
1. Since the optimal weight parameters are the ones that minimize loss and val_loss, if val_loss fluctuates wildly as shown in the figure below, is it better to pick out the parameters at a point where val_loss happens to drop?
loss and val_loss are defined as follows.
loss: MSE between the training-data answers and the predictions
val_loss: MSE between the test-data answers and the predictions
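The two definitions above amount to the same mean-squared-error formula applied to two different data splits; a minimal NumPy sketch (the target and prediction arrays are hypothetical):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, as used by Keras' default 'mse' loss/metric."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

# Hypothetical targets and predictions, for illustration only.
y_train_true = np.array([1.0, 2.0, 3.0])
y_train_pred = np.array([1.1, 1.9, 3.2])
y_val_true = np.array([4.0, 5.0])
y_val_pred = np.array([3.5, 5.5])

loss = mse(y_train_true, y_train_pred)   # training MSE  -> "loss"
val_loss = mse(y_val_true, y_val_pred)   # held-out MSE  -> "val_loss"
```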
2. Why does val_loss fluctuate so wildly?
It seems strange, because in the deep learning I have done so far, loss and val_loss have always decreased steadily with only minor fluctuations.

Answer # 1

Answer # 2
1. The smaller val_loss is, the more reliable the model, so I think it is better to take out the parameters at the point where val_loss drops.
2. I can't say for certain without the code, but it is probably a learning-rate problem.
Increasing the learning rate speeds up learning, but increases the risk of divergence.
Decreasing the learning rate makes convergence more stable, at the expense of learning speed.
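The learning-rate trade-off above can be seen on a toy problem. Minimizing f(x) = x^2 with plain gradient descent, each step multiplies x by (1 - 2*lr), so a small learning rate converges while a too-large one diverges (a sketch, not the asker's actual setup):

```python
def gradient_descent(lr, steps=30, x0=1.0):
    """Minimize f(x) = x^2 (gradient 2x) with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x   # each step scales x by (1 - 2*lr)
    return x ** 2         # final loss

loss_small = gradient_descent(lr=0.1)   # |1 - 2*lr| = 0.8 < 1: converges
loss_large = gradient_descent(lr=1.1)   # |1 - 2*lr| = 1.2 > 1: diverges
```

The same mechanism in a real network shows up as a loss curve that oscillates or blows up instead of descending smoothly.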
Answer # 3
1. If val_loss fluctuates greatly, parameter optimization is not working in the first place, so it makes little sense to extract the parameters at a point where val_loss happens to be small.
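For reference, "keep the parameters at the lowest val_loss seen so far" is the same idea as Keras' ModelCheckpoint callback with save_best_only=True. A pure-Python sketch of that logic (the val_loss history and weight labels are hypothetical):

```python
# Hypothetical per-epoch validation losses and weight snapshots.
val_loss_history = [0.9, 0.5, 1.4, 0.3, 1.1]
params_history = ["w_epoch0", "w_epoch1", "w_epoch2", "w_epoch3", "w_epoch4"]

best_val_loss = float("inf")
best_params = None
for vl, w in zip(val_loss_history, params_history):
    if vl < best_val_loss:              # only snapshot on improvement
        best_val_loss, best_params = vl, w
# best_params now holds the weights from the epoch with the lowest val_loss
```

But as this answer points out, when val_loss swings wildly that "best" checkpoint may simply be a fluke, so stabilizing training comes first.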
This depends on the situation; sometimes it is better, sometimes worse. But it is better to deal with the wild fluctuation first.
Unless you estimate the required performance in advance and train until that performance is met stably, the model will not be useful.
If there is little validation data, luck plays a larger role, so the fluctuation range grows. Another possibility is that the validation data does not follow the same distribution as the training data.
If the model is simply failing to learn, you need to review the model itself. The optimizer can sometimes mitigate the problem, but it may also be a problem that the optimizer cannot solve in the first place.
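The point about a small validation set can be simulated: if each sample's squared error is a random draw (here, hypothetically, Exponential with mean 1, so the true expected val_loss is 1.0 either way), the val_loss estimate from a small set scatters far more between evaluations than one from a large set:

```python
import numpy as np

rng = np.random.default_rng(0)

def val_loss_estimates(n_val, trials=200):
    """Simulate repeated val_loss (mean squared error) estimates.

    Assumes, hypothetically, per-sample squared errors ~ Exponential(1),
    so the true expected val_loss is 1.0 regardless of set size.
    """
    return np.array([rng.exponential(1.0, n_val).mean() for _ in range(trials)])

spread_small = val_loss_estimates(n_val=10).std()    # noisy estimates
spread_large = val_loss_estimates(n_val=1000).std()  # stable estimates
# spread_small is roughly 10x spread_large: a small validation set makes
# val_loss jump around even when the model itself is perfectly stable.
```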