Gentlemen, here is the problem: I get an error when splitting the dataset into training and test samples (I am going to apply a regression method to it). The error points to empty data, although the dataset is there and the code displays it:
ValueError: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
The data itself is taken from here: (train.csv).
I took the easy route: I did not look into how to replace NaN in the columns (although at first I thought of filling them with median values and even implemented that). Instead I did the primitive thing and deleted the rows with missing values, which gave me the variable data_filtred. Next come the variables X and y:
    # Define the variables X and y
    X = data_filtred[factor_feat.columns]
    y = data_filtred['SalePrice']
What is factor_feat? It is a concatenation of the numerical and categorical feature columns.
At the next stage I started splitting the sample into train and test, and that is where the error appeared. If I look at the shapes of the variables, I see (0, 79) for X and (0,) for y. When I inspect the variable y, it turns out to be empty:
This one is empty: data_filtred['SalePrice'],
This one has values: data['SalePrice'].
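To illustrate how this symptom can arise, here is a minimal sketch on a toy DataFrame. The column names feat_a and feat_b and all values are hypothetical, not taken from the real file. It assumes the cause is that every row contains at least one NaN somewhere, in which case dropna() with default arguments removes all rows even though no column is completely empty:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data (hypothetical columns and values):
# every row has a NaN in at least one column, but SalePrice is fully populated.
toy = pd.DataFrame({
    "Id": [1, 2, 3],
    "feat_a": [np.nan, "x", np.nan],
    "feat_b": ["y", np.nan, np.nan],
    "SalePrice": [208500, 181500, 223500],
})

# Number of missing values in each column
nan_per_column = toy.isna().sum()
print(nan_per_column)

# Number of rows without a single NaN; these are the only rows dropna() keeps
complete_rows = int(toy.notna().all(axis=1).sum())
print(complete_rows)

# Default dropna() drops a row if ANY of its columns is NaN
toy_filtered = toy.dropna()
print(toy_filtered.shape)
```

Running the same two checks (isna().sum() and the count of complete rows) on data would show whether this is what happens with the real file.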
Here is the code:
    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Drop the rows that contain NaN values
    data_filtred = data.dropna()

    # Find the categorical features
    cat_feat = list(data_filtred.dtypes[data_filtred.dtypes == object].index)

    # Select the continuous features
    num_feat = [f for f in data_filtred if f not in (cat_feat + ['Id', 'SalePrice'])]

    # Look at how many unique values each categorical feature has
    cat_nunique = data_filtred[cat_feat].nunique()  # nunique() returns the number of unique values
    print(cat_nunique)
    # Strange, but this print shows zero unique values; perhaps it makes sense
    # to stop using the categorical features for training

    # Concatenate the DataFrame columns: continuous and categorical features
    factor_feat = pd.concat((data_filtred[num_feat], data_filtred[cat_feat]), axis=1)

    # Define the variables X and y
    X = data_filtred[factor_feat.columns]
    y = data_filtred['SalePrice']

    # Split into train / test
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
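For comparison, the median-fill idea mentioned earlier could look roughly like the sketch below. It runs on a toy DataFrame with hypothetical columns (area, quality) and values. Filling the categorical gaps with the placeholder label "Missing" is my own assumption, not something taken from the original code:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data (hypothetical columns and values)
toy = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "area": [1710.0, np.nan, 1786.0, 1717.0],
    "quality": ["Gd", np.nan, "TA", np.nan],
    "SalePrice": [208500, 181500, 223500, 140000],
})

filled = toy.copy()

# Numeric columns: replace NaN with the median of each column
num_cols = filled.select_dtypes(include="number").columns
filled[num_cols] = filled[num_cols].fillna(filled[num_cols].median())

# Non-numeric columns: replace NaN with a placeholder label (an assumption)
cat_cols = filled.select_dtypes(exclude="number").columns
filled[cat_cols] = filled[cat_cols].fillna("Missing")

print(filled.shape)                    # no rows are lost
print(int(filled.isna().sum().sum()))  # total number of remaining NaN values
```

Unlike dropna(), this keeps every row, so X and y built from such a frame would not be empty when passed to train_test_split.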