There are ready-made TensorFlow methods that implement this and other functionality. However, I am trying to see how it can be done by hand for a logistic regression model (classification). I use the IRIS dataset, keeping only 2 of the classes.
I implemented training with plain gradient descent, but the more complex heuristics (Nesterov momentum and RMSProp) are not working out. To implement them, I need to change the learn_sgd function in the code below.
Gradient descent is implemented as follows:
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
mask = y != 2  # keep only the first two classes
X = X[mask]
y = y[mask]

coefs = np.random.randn(5)  # generate random coefficients (4 weights + a bias)

def predict_proba(coefs, X):
    # logistic regression formula (sigmoid of the linear combination):
    return 1. / (1. + np.exp(-(X.dot(coefs[:4]) + coefs[-1])))

# Now predict the class from the model (the model is not trained yet:
# gradient descent will take care of that a little later)
def predict_class(coefs, X):
    probas = predict_proba(coefs, X)
    return (probas > 0.5).astype(float)

# Write the loss function out explicitly from its formula
def bce_loss(coefs, X, y):
    probas = predict_proba(coefs, X)
    filter_ones = y == 1
    loss = -1. * (np.sum(np.log(probas[filter_ones]))
                  + np.sum(np.log(1. - probas[~filter_ones]))) / len(y)
    return loss

# Gradient computation:
# it depends on two things: the model and the loss function
def grad(coefs, X, y):
    probas = predict_proba(coefs, X)
    delta = probas - y
    modified_x = X.T * delta
    deltas = np.mean(modified_x, axis=1)
    return deltas, np.mean(delta)

# Train the model with gradient descent
def learn_sgd(coefs, X, y, num_epochs=20, learning_rate=0.0001):
    losses = []
    for e in range(num_epochs):
        grad_coefs, grad_bias = grad(coefs, X, y)
        coefs[:-1] = coefs[:-1] - learning_rate * grad_coefs
        coefs[-1] = coefs[-1] - learning_rate * grad_bias
        loss = bce_loss(coefs, X, y)
        losses.append(loss)
    return losses, coefs
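For completeness, here is a self-contained sketch of this training loop in action. It uses synthetic two-blob data instead of IRIS (so the snippet runs without sklearn); the seed, learning rate, and epoch count are my own illustrative choices, not values from the question:

```python
import numpy as np

def predict_proba(coefs, X):
    # sigmoid of the linear combination; weights are coefs[:-1], bias is coefs[-1]
    return 1. / (1. + np.exp(-(X.dot(coefs[:-1]) + coefs[-1])))

def bce_loss(coefs, X, y):
    p = predict_proba(coefs, X)
    return -(np.sum(np.log(p[y == 1])) + np.sum(np.log(1. - p[y == 0]))) / len(y)

def grad(coefs, X, y):
    delta = predict_proba(coefs, X) - y          # shape (n_samples,)
    return np.mean(X.T * delta, axis=1), np.mean(delta)

def learn_sgd(coefs, X, y, num_epochs=200, learning_rate=0.1):
    losses = []
    for _ in range(num_epochs):
        grad_coefs, grad_bias = grad(coefs, X, y)
        coefs[:-1] -= learning_rate * grad_coefs
        coefs[-1] -= learning_rate * grad_bias
        losses.append(bce_loss(coefs, X, y))
    return losses, coefs

rng = np.random.default_rng(0)
# two separable blobs with 4 features each, like the reduced IRIS set
X = np.vstack([rng.normal(-1., 0.5, (50, 4)), rng.normal(1., 0.5, (50, 4))])
y = np.concatenate([np.zeros(50), np.ones(50)])
losses, coefs = learn_sgd(rng.standard_normal(5), X, y)
print(losses[0], "->", losses[-1])  # the loss should decrease over training
```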

Answer # 1

Answer # 2
I have figured out the general direction for Nesterov momentum. Recall that the method combines momentum with computing the gradient at a look-ahead point. The general formula is: v_t = momentum * v_{t-1} + learning_rate * grad(coefs - momentum * v_{t-1}), and then coefs_t = coefs_{t-1} - v_t. We see that the velocity at step t depends on the velocity at step t-1.
There are difficulties. The interpreter complains about the line coefs[i] = coefs[i] - v_t[i] with the message: setting an array element with a sequence.
def learn_nesterov(coefs, X, y, num_epochs=20, momentum=0.9, learning_rate=0.0001):
    v_t = [0 for _ in range(len(coefs))]
    losses = []
    for it in range(num_epochs):
        pr_coefs = [coefs[i] - momentum * v_t[i] for i in range(len(coefs))]
        gr_coefs = grad(pr_coefs, X, y)
        for i in range(len(coefs)):
            v_t[i] = momentum * v_t[i] + learning_rate * gr_coefs[i]
            coefs[i] = coefs[i] - v_t[i]
        losses.append(bce_loss(X, y, coefs))
    return losses, coefs

learn_nesterov(coefs, X, y)
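A likely cause of the "setting an array element with a sequence" error is that grad returns a pair (weight-gradient vector, bias gradient), so gr_coefs[0] is a whole array rather than a scalar, and assigning it into an element of coefs fails. Below is a sketch of one way around this, keeping all five coefficients and the velocity in single NumPy vectors. It reuses compact versions of the question's model/loss functions and my own synthetic data and hyperparameters, so treat it as an illustration rather than the definitive fix:

```python
import numpy as np

def predict_proba(coefs, X):
    return 1. / (1. + np.exp(-(X.dot(coefs[:-1]) + coefs[-1])))

def bce_loss(coefs, X, y):
    p = predict_proba(coefs, X)
    return -(np.sum(np.log(p[y == 1])) + np.sum(np.log(1. - p[y == 0]))) / len(y)

def full_grad(coefs, X, y):
    # stack the weight gradient and the bias gradient into one length-5 vector,
    # so updates can be written without per-element loops
    delta = predict_proba(coefs, X) - y
    return np.append(np.mean(X.T * delta, axis=1), np.mean(delta))

def learn_nesterov(coefs, X, y, num_epochs=200, momentum=0.9, learning_rate=0.01):
    v = np.zeros_like(coefs)                       # velocity vector v_t
    losses = []
    for _ in range(num_epochs):
        g = full_grad(coefs - momentum * v, X, y)  # gradient at the look-ahead point
        v = momentum * v + learning_rate * g
        coefs = coefs - v
        losses.append(bce_loss(coefs, X, y))
    return losses, coefs

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1., 0.5, (50, 4)), rng.normal(1., 0.5, (50, 4))])
y = np.concatenate([np.zeros(50), np.ones(50)])
coefs0 = rng.standard_normal(5)
loss_before = bce_loss(coefs0, X, y)
losses, coefs = learn_nesterov(coefs0, X, y)
print(loss_before, "->", losses[-1])
```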
I implemented the second method, RMSProp, as shown below. I get a dimension error; the traceback ends with: shapes (100,) and (4,4) not aligned: 100 (dim 0) != 4 (dim 0).
def learn_rmsprop(coefs, X, y, num_epochs=20, momentum=0.9, learning_rate=0.0001):
    e = 10 ** (-8)
    s = [0 for _ in range(len(coefs))]
    losses = []
    for it in range(num_epochs):
        gr_coefs = grad(X, y, coefs)
        gr_coefs_2 = [x ** 2 for x in gr_coefs]
        for i in range(len(coefs)):
            s[i] = momentum * s[i] + (1 - momentum) * gr_coefs_2[i]
            coefs[i] = coefs[i] - learning_rate * gr_coefs[i] / (math.sqrt(s[i]) + e)
        losses.append(bce_loss(coefs, X, y))
    return losses, coefs

learn_rmsprop(coefs, X, y)
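The shape error here most likely comes from the call grad(X, y, coefs): with the arguments in that order, predict_proba ends up computing y.dot(X[:4]), which is exactly a (100,)-vs-(4,4) mismatch. A sketch of a corrected RMSProp loop, under the same assumptions as above (vectorized length-5 coefficients, the gradient divided by sqrt(s) + eps, and my own synthetic data and hyperparameters):

```python
import numpy as np

def predict_proba(coefs, X):
    return 1. / (1. + np.exp(-(X.dot(coefs[:-1]) + coefs[-1])))

def bce_loss(coefs, X, y):
    p = predict_proba(coefs, X)
    return -(np.sum(np.log(p[y == 1])) + np.sum(np.log(1. - p[y == 0]))) / len(y)

def full_grad(coefs, X, y):
    delta = predict_proba(coefs, X) - y
    return np.append(np.mean(X.T * delta, axis=1), np.mean(delta))

def learn_rmsprop(coefs, X, y, num_epochs=200, momentum=0.9, learning_rate=0.01):
    eps = 1e-8                    # note: 10 ** (-8), not 10 ** 8
    s = np.zeros_like(coefs)      # running average of squared gradients
    losses = []
    for _ in range(num_epochs):
        g = full_grad(coefs, X, y)  # arguments in (coefs, X, y) order
        s = momentum * s + (1. - momentum) * g ** 2
        coefs = coefs - learning_rate * g / (np.sqrt(s) + eps)
        losses.append(bce_loss(coefs, X, y))
    return losses, coefs

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1., 0.5, (50, 4)), rng.normal(1., 0.5, (50, 4))])
y = np.concatenate([np.zeros(50), np.ones(50)])
coefs0 = rng.standard_normal(5)
loss_before = bce_loss(coefs0, X, y)
losses, coefs = learn_rmsprop(coefs0, X, y)
print(loss_before, "->", losses[-1])
```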