### Manually training a model with Nesterov Momentum and RMSProp in Python

There are ready-made TensorFlow methods that implement this and other functionality. However, I am trying to see how it can be done by hand for a logistic regression model (classification). I use the Iris dataset, keeping only 2 classes.

I have implemented training with gradient descent, but the more complex heuristics (Nesterov Momentum and RMSProp) are giving me trouble. To implement them, I need to change the `learn_sgd` function from the code below.

Gradient descent is implemented as follows:

``````
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
mask = y != 2  # apply the filter: keep only 2 classes
X = X[mask]
y = y[mask]
coefs = np.random.randn(5)  # generate random coefficients (4 weights + 1 bias)


def predict_proba(coefs, X):
    # Logistic regression formula:
    return 1. / (1. + np.exp(-(X.dot(coefs[:4]) + coefs[-1])))


# Now predict the class based on the model (the model is not trained yet;
# gradient descent will do that a little later)
def predict_class(coefs, X):
    probas = predict_proba(coefs, X)
    return (probas > 0.5).astype(float)


# Write out the loss function explicitly from its formula
def bce_loss(coefs, X, y):
    probas = predict_proba(coefs, X)
    filter_ones = y == 1
    loss = -1. * (np.sum(np.log(probas[filter_ones]))
                  + np.sum(np.log(1. - probas[~filter_ones]))) / len(y)
    return loss


# Gradient calculation:
# it depends on two things: the model and the loss function
def grad(coefs, X, y):
    probas = predict_proba(coefs, X)
    delta = probas - y
    modified_x = X.T * delta
    deltas = np.mean(modified_x, axis=1)
    return deltas, np.mean(delta)


# Train the model using gradient descent
def learn_sgd(coefs, X, y, num_epochs=20, learning_rate=0.0001):
    losses = []
    for e in range(num_epochs):
        grad_coefs, grad_bias = grad(coefs, X, y)
        coefs[:-1] = coefs[:-1] - learning_rate * grad_coefs
        coefs[-1] = coefs[-1] - learning_rate * grad_bias
        loss = bce_loss(coefs, X, y)
        losses.append(loss)
    return losses, coefs
``````
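
Before swapping in a different optimizer, it can help to confirm that `grad` agrees with `bce_loss`. The following is a minimal sketch of a finite-difference check, not part of the original code. `numeric_grad`, the step `h`, and the randomly generated `X_demo`, `y_demo` and `coefs_demo` are all hypothetical stand-ins (synthetic data is used instead of Iris so that the snippet runs on its own).

```python
import numpy as np


def predict_proba(coefs, X):
    # coefs[:-1] are the feature weights, coefs[-1] is the bias
    return 1.0 / (1.0 + np.exp(-(X @ coefs[:-1] + coefs[-1])))


def bce_loss(coefs, X, y):
    p = predict_proba(coefs, X)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def grad(coefs, X, y):
    # Same structure as in the question: (weight gradients, bias gradient)
    delta = predict_proba(coefs, X) - y
    return np.mean(X.T * delta, axis=1), np.mean(delta)


def numeric_grad(coefs, X, y, h=1e-6):
    # Hypothetical helper: central differences, one coefficient at a time
    out = np.zeros_like(coefs)
    for i in range(len(coefs)):
        step = np.zeros_like(coefs)
        step[i] = h
        out[i] = (bce_loss(coefs + step, X, y) - bce_loss(coefs - step, X, y)) / (2 * h)
    return out


# Synthetic data (assumption): 100 samples, 4 features, random 0/1 labels
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = (rng.random(100) < 0.5).astype(float)
coefs_demo = rng.normal(size=5)

g_w, g_b = grad(coefs_demo, X_demo, y_demo)
analytic = np.append(g_w, g_b)  # pack both parts into one array of length 5
numeric = numeric_grad(coefs_demo, X_demo, y_demo)
max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

If `max_diff` is close to zero, the analytic gradient matches the loss, so a later failure is more likely to sit in the optimizer loop than in `grad`.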
• Answer # 1

I have worked out the general approach for Nesterov Momentum. As a reminder, the method combines momentum with evaluating the gradient at the look-ahead point, that is, the next point along the trajectory. The general formula is

$$v_t = \text{momentum} \cdot v_{t-1} + \text{learning\_rate} \cdot \operatorname{grad}\left(\text{coefs}_{t-1} - \text{momentum} \cdot v_{t-1}\right), \qquad \text{coefs}_t = \text{coefs}_{t-1} - v_t.$$

The velocity at step $t$ therefore depends on the velocity at step $t-1$.

There are difficulties. The interpreter complains about the line `coefs[i] -= v_t[i]` and reports: `setting an array element with a sequence.`

``````
def learn_nesterov(coefs, X, y, num_epochs=20, momentum=0.9, learning_rate=0.0001):
    v_t = [0 for _ in range(len(coefs))]
    losses = []
    for it in range(num_epochs):
        pr_coefs = [coefs[i] - momentum * v_t[i] for i in range(len(coefs))]
        # Note: grad returns a tuple (weight gradients, bias gradient), not 5 scalars
        gr_coefs = grad(pr_coefs, X, y)
        for i in range(len(coefs)):
            v_t[i] = momentum * v_t[i] + learning_rate * gr_coefs[i]
            coefs[i] = coefs[i] - v_t[i]  # the error is raised on this line
        losses.append(bce_loss(coefs, X, y))
    return losses, coefs


learn_nesterov(coefs, X, y)
``````
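
In the loop above, `grad` returns a tuple of two items (an array of four weight gradients and a scalar bias gradient), so `gr_coefs[0]` is an array rather than a number, and assigning the result into a single element of `coefs` is what NumPy rejects. One possible way around this, sketched here under assumptions, is to pack both parts into one array of length 5 and update all coefficients at once. The helper name `grad_flat`, the synthetic `X_demo`/`y_demo` data (a stand-in for the two-class Iris subset), and the demo learning rate and epoch count are illustrative choices, not values from the original post.

```python
import numpy as np


def predict_proba(coefs, X):
    # coefs[:-1] are the feature weights, coefs[-1] is the bias
    return 1.0 / (1.0 + np.exp(-(X @ coefs[:-1] + coefs[-1])))


def bce_loss(coefs, X, y):
    # Clip probabilities so that log never receives exactly 0
    p = np.clip(predict_proba(coefs, X), 1e-12, 1.0 - 1e-12)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def grad_flat(coefs, X, y):
    # Hypothetical helper: weight and bias gradients in a single array
    delta = predict_proba(coefs, X) - y
    return np.append(X.T @ delta / len(y), delta.mean())


def learn_nesterov(coefs, X, y, num_epochs=20, momentum=0.9, learning_rate=0.0001):
    coefs = np.asarray(coefs, dtype=float).copy()  # do not modify the caller's array
    v = np.zeros_like(coefs)
    losses = []
    for _ in range(num_epochs):
        lookahead = coefs - momentum * v  # point where the gradient is evaluated
        v = momentum * v + learning_rate * grad_flat(lookahead, X, y)
        coefs = coefs - v
        losses.append(bce_loss(coefs, X, y))
    return losses, coefs


# Synthetic two-class data (assumption): 50 samples per class, 4 features
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(2.0, 1.0, (50, 4))])
y_demo = np.repeat([0.0, 1.0], 50)

losses, trained = learn_nesterov(np.zeros(5), X_demo, y_demo,
                                 num_epochs=200, learning_rate=0.05)
print(losses[0], losses[-1])
```

Working on whole arrays instead of indexing element by element also removes the inner `for i in range(...)` loop, which is where the tuple was being mistaken for a list of scalars.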

I implemented the second method, RMSProp, as shown below. It fails with a dimension error: `shapes (100,) and (4,4) not aligned: 100 (dim 0) != 4 (dim 0)`

``````
import math


def learn_rmsprop(coefs, X, y, num_epochs=20, momentum=0.9, learning_rate=0.0001):
    e = 10 ** (-8)
    s = [0 for _ in range(len(coefs))]
    losses = []
    for it in range(num_epochs):
        # Note: grad is defined as grad(coefs, X, y); this argument order
        # is what raises the shape error
        gr_coefs = grad(X, y, coefs)
        gr_coefs_2 = [g ** 2 for g in gr_coefs]
        for i in range(len(coefs)):
            s[i] = momentum * s[i] + (1 - momentum) * gr_coefs_2[i]
            # RMSProp step: gradient scaled by the root of the running mean of squares
            coefs[i] = coefs[i] - learning_rate * gr_coefs[i] / (math.sqrt(s[i]) + e)
        losses.append(bce_loss(coefs, X, y))
    return losses, coefs


learn_rmsprop(coefs, X, y)
``````
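
The shape error above comes from the call `grad(x, y, coefs)`: `grad` is defined as `grad(coefs, X, y)`, so the label vector of shape (100,) ends up being multiplied by a (4, 4) slice of the feature matrix. A minimal sketch of an RMSProp loop with the arguments in the defined order and a vectorized update might look like this. As assumptions not in the original: `grad_flat` is a hypothetical helper that returns one gradient array of length 5, the data is synthetic rather than Iris, and the parameter is called `decay` here rather than `momentum` because in RMSProp it controls the running average of squared gradients.

```python
import numpy as np


def predict_proba(coefs, X):
    # coefs[:-1] are the feature weights, coefs[-1] is the bias
    return 1.0 / (1.0 + np.exp(-(X @ coefs[:-1] + coefs[-1])))


def bce_loss(coefs, X, y):
    # Clip probabilities so that log never receives exactly 0
    p = np.clip(predict_proba(coefs, X), 1e-12, 1.0 - 1e-12)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def grad_flat(coefs, X, y):
    # Hypothetical helper: weight and bias gradients in a single array
    delta = predict_proba(coefs, X) - y
    return np.append(X.T @ delta / len(y), delta.mean())


def learn_rmsprop(coefs, X, y, num_epochs=20, decay=0.9, learning_rate=0.0001, eps=1e-8):
    coefs = np.asarray(coefs, dtype=float).copy()  # do not modify the caller's array
    s = np.zeros_like(coefs)  # running average of squared gradients
    losses = []
    for _ in range(num_epochs):
        g = grad_flat(coefs, X, y)
        s = decay * s + (1.0 - decay) * g ** 2
        coefs = coefs - learning_rate * g / (np.sqrt(s) + eps)
        losses.append(bce_loss(coefs, X, y))
    return losses, coefs


# Synthetic two-class data (assumption): 50 samples per class, 4 features
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(2.0, 1.0, (50, 4))])
y_demo = np.repeat([0.0, 1.0], 50)

losses, trained = learn_rmsprop(np.zeros(5), X_demo, y_demo,
                                num_epochs=200, learning_rate=0.01)
print(losses[0], losses[-1])
```

Because each coordinate is divided by the root of its own running average, the step size per coefficient stays close to `learning_rate` regardless of the gradient's scale, which is why the demo uses a larger rate than the default of 0.0001.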