Gentlemen, here is the problem: I get an error when splitting the dataset into training and test samples (I intend to apply regression afterwards). The error points to empty data, although the dataset is there (the code displays it):

With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.

The data itself is taken from here: (train.csv).

I took the easy route: I did not work out how to replace NaN in the columns (although at first I thought of filling them with median values, and even implemented that). I did the primitive thing and deleted the rows with missing values, which gave me the variable data_filtred. Next come the variables X and y:

```
X = data_filtred[factor_feat.columns]  # define the variables X and y
y = data_filtred['SalePrice']
```

What is factor_feat? It is the concatenation of the numerical and categorical feature columns.

At the next stage I started splitting the sample into train and test, and that is where the error appeared. If I look at the shapes of the variables, I see (0, 79) for X and (0,) for y. When I inspect y, it is empty:

This is empty: data_filtred['SalePrice'],

This has values: data['SalePrice'].

Here is the code:

```
import pandas as pd
from sklearn.model_selection import train_test_split

# Drop the rows that contain NaN
data_filtred = data.dropna()
# Find the categorical features
cat_feat = list(data_filtred.dtypes[data_filtred.dtypes == object].index)
# Select the continuous features
num_feat = [f for f in data_filtred if f not in (cat_feat + ['Id', 'SalePrice'])]
# See how many unique values each categorical feature has
cat_nunique = data_filtred[cat_feat].nunique()  # nunique() returns the number of unique values
print(cat_nunique)  # strange, but this print shows zero unique values; perhaps the categorical features should be dropped from training
# Combine the DataFrame columns: continuous and categorical features
factor_feat = pd.concat((data_filtred[num_feat], data_filtred[cat_feat]), axis=1)
X = data_filtred[factor_feat.columns]  # define the variables X and y
y = data_filtred['SalePrice']
# Split into train / test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
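For reference, the median fill mentioned above (as an alternative to dropping rows) could look roughly like the sketch below. The toy frame, its values, the column names, the `fill_missing` helper and the `"Missing"` placeholder are all assumptions made for illustration; this is not the original implementation.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (hypothetical columns and values)
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, 70.0],
    "Alley": [np.nan, "Grvl", np.nan, "Pave"],
    "SalePrice": [208500, 181500, 223500, 140000],
})

def fill_missing(df):
    """Fill numeric NaN with the column median and non-numeric NaN with a placeholder."""
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    cat_cols = out.select_dtypes(exclude="number").columns
    # fillna with a Series aligns on column names, so each column gets its own median
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    out[cat_cols] = out[cat_cols].fillna("Missing")
    return out

filled = fill_missing(toy)
print(filled.shape)               # all rows are kept
print(filled.isna().sum().sum())  # total number of remaining NaN values
```

Unlike `dropna()`, this keeps every row, so the later train/test split still has samples to work with.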

@CrazyElf, I replaced the missing values in a number of numeric feature columns with the column median; however, when training the model I get the error: Input contains NaN, infinity or a value too large for dtype('float64'). What can this mean? Thank you

Алексей Казанцев 2021-06-05 09:20:32

To begin with, check whether X really contains NaN: print(X.isna().sum().sum()). Also look at the X.describe() statistics and, for good measure, X.info().
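A rough sketch of such a check, assuming a small made-up feature matrix (`X_demo` and its columns are hypothetical). It counts NaN per column and, separately, infinite values in the numeric columns, since `isna()` does not flag infinity:

```python
import numpy as np
import pandas as pd

# Hypothetical feature matrix: column "a" is clean, the others are not
X_demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [1.0, np.nan, 3.0],
    "c": [1.0, np.inf, 3.0],
    "d": ["x", None, "z"],
})

# NaN count per column; keep only the columns that actually have some
nan_per_column = X_demo.isna().sum()
print(nan_per_column[nan_per_column > 0])

# Infinite values can only occur in numeric columns
numeric = X_demo.select_dtypes(include="number")
inf_per_column = np.isinf(numeric).sum()
print(inf_per_column[inf_per_column > 0])
```

If only some numeric columns were filled with the median, the columns left out (including the categorical ones) would still show up in the first printout.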

CrazyElf 2021-06-05 10:21:28

"I did the primitive thing and deleted the rows with missing values." Well, that means every row has a missing value in at least one column, so data_filtred ends up with 0 rows. Or the data is lost somewhere further on. Print .shape of the data after every step and you will see at which stage it becomes empty.

CrazyElf 2021-06-05 08:36:35
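The advice in the answer above (print `.shape` at each stage) can be illustrated with a minimal sketch. The toy frame `data_demo`, its columns and its values are assumptions chosen so that every row has a gap somewhere; the `subset` variant is just one possible alternative, not necessarily the right choice for the real data.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values): every row has a NaN in at least one column
data_demo = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "Alley": [np.nan, "Grvl", np.nan],
    "PoolQC": ["Ex", np.nan, np.nan],
    "SalePrice": [208500, 181500, 223500],
})
print(data_demo.shape)  # shape before filtering

# dropna() with default arguments removes any row that has at least one NaN
dropped_all = data_demo.dropna()
print(dropped_all.shape)  # no row is complete, so nothing survives

# Missing values per column show which columns cause the loss
print(data_demo.isna().sum())

# One alternative: drop rows only when selected columns are missing
dropped_subset = data_demo.dropna(subset=["LotArea", "SalePrice"])
print(dropped_subset.shape)
```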