I have a question about the following code.
import numpy as np
from sklearn.model_selection import KFold
x = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [3, 4], [3, 4], [3, 4], [3, 4], [3, 4], [3, 4]])
y = np.array([1, 1, 1, 0, 0, 0, 0, 1, 1, 1])
kf = KFold(n_splits=4)
for train_idx, test_idx in kf.split(x, y):  #
    print("train_idx:", train_idx, "test_idx:", test_idx)
On the line marked with # in this code, both arrays x and y are passed as arguments to kf.split. Is y necessary here? The same question applies to StratifiedKFold. The objective variable y is paired with x, so once indices are chosen for x, the corresponding entries of y are given by the same indices. In fact, the official documentation lists None as the default for the argument y, and the code I have at hand is
for train_idx, test_idx in kf.split(x):  #
    print("train_idx:", train_idx, "test_idx:", test_idx)
This appears to work correctly.
However, many code samples pass both the explanatory variables and the objective variable as arguments.
Even reference books are inconsistent, which is confusing: some pass only x and others pass both.
Why not just split using the explanatory variables alone?
Answer # 1
For KFold.split, y is optional. The code works without it, but passing it is better for readability. Also, because y is required by StratifiedKFold.split (explained below), passing it lets you swap KFold and StratifiedKFold without other changes (compatible code).
Reference: sklearn.model_selection.KFold (scikit-learn 0.21.3 documentation)
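As a quick check of the point that KFold does not use y, here is a minimal sketch that reuses the arrays from the question and compares the folds produced with and without y. The variable names folds_without_y, folds_with_y and same_folds are only for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

# Same data as in the question.
x = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [3, 4],
              [3, 4], [3, 4], [3, 4], [3, 4], [3, 4]])
y = np.array([1, 1, 1, 0, 0, 0, 0, 1, 1, 1])

kf = KFold(n_splits=4)

# Collect (train, test) index lists for both call styles.
folds_without_y = [(tr.tolist(), te.tolist()) for tr, te in kf.split(x)]
folds_with_y = [(tr.tolist(), te.tolist()) for tr, te in kf.split(x, y)]

# KFold splits by position only, so the two results should match.
same_folds = folds_without_y == folds_with_y
print("same folds:", same_folds)
```

If the comparison holds on your installed version, writing kf.split(x, y) costs nothing and keeps the loop unchanged when the splitter is later replaced by StratifiedKFold.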
For StratifiedKFold.split, y is a required positional argument. As the documentation puts it, "Stratification is done based on the y labels.", so stratification is not possible without y.
Reference: sklearn.model_selection.StratifiedKFold (scikit-learn 0.21.3 documentation)
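To illustrate why StratifiedKFold needs y, the following sketch (again using the question's data) counts the labels in each test fold and then tries calling split without y. The names zeros_per_test_fold and missing_y_rejected are illustrative, and the exact error message may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Same data as in the question: six samples labeled 1, four labeled 0.
x = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [3, 4],
              [3, 4], [3, 4], [3, 4], [3, 4], [3, 4]])
y = np.array([1, 1, 1, 0, 0, 0, 0, 1, 1, 1])

skf = StratifiedKFold(n_splits=4)

# With y, each test fold gets a share of every class.
zeros_per_test_fold = []
for train_idx, test_idx in skf.split(x, y):
    labels, counts = np.unique(y[test_idx], return_counts=True)
    zeros_per_test_fold.append(int(np.sum(y[test_idx] == 0)))
    print("test_idx:", test_idx,
          "label counts:", dict(zip(labels.tolist(), counts.tolist())))

# Without y there are no labels to stratify on, so the call is expected to fail.
try:
    list(skf.split(x))
    missing_y_rejected = False
except (TypeError, ValueError) as err:
    missing_y_rejected = True
    print("split(x) without y failed:", err)
```

Compare this with plain KFold on the same data, where the second test fold (indices 3, 4, 5) contains only label 0. That difference is what the y argument buys you.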