I built a linear regression model. Plotting the predicted results, I noticed that the predicted values lie almost always below the true ones; that is, by adding a constant to the model's output I actually "improve" the result. What could be causing this?

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler, PolynomialFeatures
from sklearn import linear_model, metrics

scaler = MinMaxScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)
poly = PolynomialFeatures(degree=2, interaction_only=True).fit(x_train_scaled)
x_test_poly = poly.transform(x_test_scaled)
rdg = linear_model.Ridge(alpha=0.01)
rdg.fit(x_train_scaled, y_train)
y_test_predict = rdg.predict(x_test_scaled)
y_train_predict = rdg.predict(x_train_scaled)
print('R^2 training: %.3f, R^2 test: %.3f' % (
      metrics.r2_score(y_train, y_train_predict),
      metrics.r2_score(y_test, y_test_predict)))
rdg_metrics = get_regression_metrics('ridge regression', y_test, y_test_predict)
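One common cause of a constant downward offset is a model fitted without an intercept (or with an uncentered target). A minimal sketch of how to check for it, using synthetic data since the original x_train/y_train are not shown — the variable names here are illustrative, not the asker's:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Target with a large constant offset of +10
y = X @ np.array([1.5, -2.0, 0.7]) + 10.0 + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With the default fit_intercept=True, Ridge absorbs the offset and the
# mean residual on the test set is close to zero.
ridge = Ridge(alpha=0.01).fit(X_train, y_train)
print('mean residual (with intercept): %.3f'
      % np.mean(y_test - ridge.predict(X_test)))

# With fit_intercept=False the predictions sit systematically below the
# true values -- the symptom described in the question.
ridge_no_b = Ridge(alpha=0.01, fit_intercept=False).fit(X_train, y_train)
print('mean residual (no intercept): %.3f'
      % np.mean(y_test - ridge_no_b.predict(X_test)))
```

If the mean residual on your own data is far from zero, the model is missing a constant term somewhere in the pipeline.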

We need to see at least a sample of the data. It is not clear what exactly you are plotting on the graph. The metrics should be printed, but you did not show them either. The train/test split of the data is also not shown. What is shown is how you build polynomial features... which are then never used. Please show only the relevant, complete code.
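For reference, a minimal sketch (on synthetic data, since the asker's data is not shown) of actually feeding the polynomial features into the model instead of discarding them — both train and test must be transformed, and fit/predict must use the transformed arrays:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

rng = np.random.default_rng(1)
x_train = rng.normal(size=(100, 3))
x_test = rng.normal(size=(50, 3))
# Target driven by an interaction term, which plain linear features miss
y_train = x_train[:, 0] * x_train[:, 1] + rng.normal(scale=0.1, size=100)

scaler = MinMaxScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

# Transform BOTH train and test with the same fitted transformer
poly = PolynomialFeatures(degree=2, interaction_only=True).fit(x_train_scaled)
x_train_poly = poly.transform(x_train_scaled)
x_test_poly = poly.transform(x_test_scaled)

# Fit and predict on the polynomial features, not the raw scaled ones
rdg = Ridge(alpha=0.01).fit(x_train_poly, y_train)
y_test_predict = rdg.predict(x_test_poly)
print(y_test_predict.shape)  # one prediction per test row
```

With 3 input features, `degree=2, interaction_only=True` yields 7 columns: the bias, 3 linear terms, and 3 pairwise interactions.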

CrazyElf, 2021-07-21 17:47:23

And in general, what kind of data is it? If you are predicting a time series, that is a different story entirely; this can easily happen if, say, you predict the current price from prices several days back. If there is a trend, whether up or down, the prediction will average the data. And if the trend on the test set is upward, as on your chart, the prediction will lag behind the trend and underestimate the forecast.
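The lag effect described above can be seen with a toy series (purely illustrative numbers): forecasting a steadily rising price as "the price several days ago" underestimates every actual value by a constant amount proportional to the trend slope times the lag.

```python
import numpy as np

t = np.arange(100, dtype=float)
price = 100 + 0.5 * t          # a "price" rising 0.5 per day

lag = 5
# Naive forecast: today's price is predicted as the price 5 days ago
naive_forecast = price[:-lag]
actual = price[lag:]

residual = actual - naive_forecast
print('mean underestimate: %.2f' % residual.mean())  # 0.5 * 5 = 2.50
```

Every residual is positive, which is exactly the "predictions sit below the true values" pattern from the question; adding a constant (here 2.5) would indeed "fix" it on this sample, but the real fix is to model the trend.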

CrazyElf, 2021-07-21 17:49:45