
This is the code that runs after the machine learning models have been trained.

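import numpy as np
from tqdm import tqdm

i = 0
res = []
step_size = 50000
for j in tqdm(range(int(np.ceil(test_df.shape[0] / 50000)))):
    # Average the models' predictions for rows i .. i + step_size - 1, then apply expm1
    res.append(np.expm1(sum([model.predict(test_df.iloc[i:i + step_size]) for model in models]) / folds))
    i += step_size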

I understand that this code calculates the target variable for the submission file.
However, I am not sure what calculation the for loop performs.
The evaluation metric is RMSLE.

Why is the number of rows in the test table divided by 50000?
range(int(np.ceil(test_df.shape[0] / 50000)))
Also, what calculation does the following code perform after that?
np.expm1(sum([model.predict(test_df.iloc[i:i + step_size]) for model in models]) / folds)

I understand the meaning of the functions in the code.
test_df.iloc[i:i + step_size]
↑ I am stuck because I don't know what data is being selected here or what it means.

test_df: Test objective variable data frame
models: Array of models
folds: number of folds = 5

If the extracted part of the code is hard to understand on its own, I will add more.

  • Answer # 1

      

    Why is the number of rows in the test table divided by 50000?

    I can't say for certain from this excerpt alone, but I think the 50000 is step_size. Replacing the literal 50000 with step_size would make the code easier to read.
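
To illustrate why the row count is divided by the step size, here is a minimal sketch. The helper name chunk_bounds and the row count of 120000 are made up for illustration and do not come from the question. Rounding the division up with np.ceil gives the number of loop iterations needed to cover every row, including a final, shorter chunk.

```python
import numpy as np

def chunk_bounds(n_rows, step_size):
    """Return the (start, stop) row ranges that the loop would visit."""
    # ceil so that a final, partially filled chunk still gets its own iteration
    n_chunks = int(np.ceil(n_rows / step_size))
    bounds = []
    i = 0
    for _ in range(n_chunks):
        # iloc[i:i + step_size] simply stops at the last row, hence the min()
        bounds.append((i, min(i + step_size, n_rows)))
        i += step_size
    return bounds

# Hypothetical test set with 120000 rows, processed 50000 rows at a time
bounds = chunk_bounds(120000, 50000)
```

With these made-up numbers the loop runs three times: two full chunks of 50000 rows and one last chunk of 20000 rows.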

      

    I understand the meaning of the functions in the code.
      test_df.iloc[i:i + step_size]
      ↑ I am stuck because I don't know what data is being selected here or what it means.

    This is a slice. Rows i through i + step_size - 1 of test_df (row i + step_size is not included) are passed to the model.
    In short, predicting the test data one row at a time would take a long time, so it is predicted in chunks of a fixed size. (Probably.)
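
The whole calculation can be sketched roughly as follows. This is an illustration under assumptions, not the asker's actual setup: ConstantModel is a hypothetical stand-in for a trained model, the data is a toy frame, and the models are assumed to have been trained on log1p(target), which would be consistent with the RMSLE metric and the np.expm1 call but is not confirmed by the question. For each chunk, the predictions of all fold models are summed and divided by the number of models (their average), and np.expm1 then converts the averaged value back from the log1p scale.

```python
import numpy as np
import pandas as pd

class ConstantModel:
    """Hypothetical stand-in for a trained model; always predicts one value."""
    def __init__(self, value):
        self.value = value

    def predict(self, df):
        return np.full(len(df), self.value)

def predict_in_chunks(models, test_df, step_size):
    res = []
    for i in range(0, len(test_df), step_size):
        chunk = test_df.iloc[i:i + step_size]  # rows i .. i + step_size - 1
        # Average of the fold models' predictions (assumed to be on the log1p scale)
        log_pred = sum(m.predict(chunk) for m in models) / len(models)
        # expm1 is the inverse of log1p, so this returns to the original scale
        res.append(np.expm1(log_pred))
    return np.concatenate(res)

test_df = pd.DataFrame({"x": range(12)})                 # toy data, 12 rows
models = [ConstantModel(np.log1p(9.0)) for _ in range(5)]  # 5 folds, toy values
preds = predict_in_chunks(models, test_df, step_size=5)
```

Here every toy model predicts log1p(9), so the averaged and back-transformed prediction is 9 for all 12 rows, and the last chunk contains only 2 rows.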
