
Gentlemen, here is the problem: I get an error when splitting the dataset into training and test samples (I am going to apply a regression method afterwards). The error points to empty data (although the dataset is there, the code displays it):

With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.

The data itself is taken from here: (train.csv).

I took the easy way: I did not look into how to replace NaN in the columns (although at first I thought of filling them with median values, and even implemented this). I took the primitive route and deleted the rows with missing values. The result is the variable data_filtred. Next come the variables X and y:

X = data_filtred[factor_feat.columns]  # define the variables X and y
y = data_filtred['SalePrice']

What is factor_feat? It is the concatenation of the numerical and categorical feature columns.

At the next stage I started splitting the sample into train and test, and this is where the error appeared. When I look at the shapes of the variables, X is (0, 79) and y is (0,). When I check the variable y, it turns out to be empty:

This is empty: data_filtred['SalePrice'],

This has values: data['SalePrice'].

Here is the code:

import pandas as pd
from sklearn.model_selection import train_test_split

# Drop the rows that contain NaN
data_filtred = data.dropna()
# Find the categorical features
cat_feat = list(data_filtred.dtypes[data_filtred.dtypes == object].index)
# Select the continuous features
num_feat = [f for f in data_filtred if f not in (cat_feat + ['Id', 'SalePrice'])]
# Look at how many values we have for each categorical feature
cat_nunique = data_filtred[cat_feat].nunique()  # nunique() returns the number of unique values
print(cat_nunique)  # strange, but this print shows zero unique values; perhaps it makes sense to drop the categorical features from training
# Concatenate the DataFrame columns: continuous and categorical features
factor_feat = pd.concat((data_filtred[num_feat], data_filtred[cat_feat]), axis=1)
X = data_filtred[factor_feat.columns]  # define the variables X and y
y = data_filtred['SalePrice']
# Split into train / test
d_train, d_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

"I took the primitive route and deleted the rows with missing values." Well, that means every row has a gap in at least one column, so you end up with 0 rows in data_filtred. Or the data gets lost somewhere further on. Print .shape of the resulting data at every step and you will see at which stage the data became empty.

CrazyElf 2021-06-05 08:36:35
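
The check suggested above could look roughly like this. It is only a sketch on a small made-up DataFrame (the column names and values are hypothetical and are not the real train.csv): every row has a gap in at least one column, so dropna() with its default settings removes all rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each row has a NaN in at least one column
data = pd.DataFrame({
    "LotArea": [8450, 9600, np.nan],
    "Alley": [np.nan, np.nan, "Grvl"],
    "SalePrice": [208500, 181500, 223500],
})

print(data.shape)             # shape before dropping rows
data_filtred = data.dropna()  # drops every row that has at least one NaN
print(data_filtred.shape)     # shape after dropping rows

# Number of missing values per column, largest first
na_per_column = data.isna().sum().sort_values(ascending=False)
print(na_per_column)

# Number of rows without a single NaN (the rows dropna() would keep)
complete_rows = int(data.notna().all(axis=1).sum())
print(complete_rows)
```

If the count of complete rows is zero, the per-column counts show which columns are responsible for the gaps.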

@CrazyElf, in a number of numeric feature columns I replaced the missing values with the column median; however, when training the model I get an error: Input contains NaN, infinity or a value too large for dtype('float64'). What can this mean? Thank you

Алексей Казанцев 2021-06-05 09:20:32
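
One possible cause, stated here only as an assumption: the median was filled into some of the numeric columns while NaN stayed in other columns (for example, categorical ones). A minimal sketch on made-up data (the column names and the "Missing" placeholder are hypothetical) that fills every numeric column with its median and every non-numeric column with a placeholder:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in a numeric and a categorical column
data = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "GarageType": ["Attchd", None, "Detchd", "Attchd"],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# Split the columns by kind: numeric vs everything else
num_cols = data.select_dtypes(include="number").columns
cat_cols = data.select_dtypes(exclude="number").columns

filled = data.copy()
# Fill each numeric column with its own median
filled[num_cols] = filled[num_cols].fillna(filled[num_cols].median())
# Fill non-numeric columns with a placeholder category
filled[cat_cols] = filled[cat_cols].fillna("Missing")

# Total number of NaN left in the whole frame
remaining_na = int(filled.isna().sum().sum())
print(remaining_na)
```

Whether a placeholder category suits the real data is a separate modelling decision; the point here is only that every column gets handled, not just some of the numeric ones.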

To begin with, check whether you really have NaN in there: print(X.isna().sum().sum()). Also look at the X.describe() statistics, and X.info() for good measure.

CrazyElf 2021-06-05 10:21:28
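
The checks from the comment above, put together as a sketch. The frame X here is made up purely for illustration (hypothetical columns a, b, c); the infinity count is an extra step added because the error message also mentions infinity, and isna() does not count inf as missing by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one NaN, one infinity, one missing string
X = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [1.0, np.inf, 2.0],
    "c": ["x", None, "y"],
})

# Total number of missing values in the whole frame
total_na = int(X.isna().sum().sum())

# Which columns contain the missing values
na_by_column = X.isna().sum()
cols_with_na = na_by_column[na_by_column > 0].index.tolist()

# Infinite values can only occur in numeric columns
numeric = X.select_dtypes(include="number")
total_inf = int(np.isinf(numeric.to_numpy()).sum())

print(total_na, cols_with_na, total_inf)
X.info()             # dtypes and non-null counts per column
print(X.describe())  # summary statistics of the numeric columns
```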