After running the following code:
```python
import locale
import sys
from locale import atof
locale.setlocale(locale.LC_NUMERIC, '')  # returned 'en_GB.UTF-8' in the author's session
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

col_names = ['Project', 'OrderDate', 'orderid', 'ClientID', 'IsRepeat', 'IsBlocked', 'IsManual',
             'AutoDecision', 'ManualApprove', 'IsLoan', 'ShortTermAmount', 'ShortTermPeriod',
             'LongTermAmount', 'LongTermPeriod', 'RequestedAmount', 'RequestedPeriod',
             'LoanSum', 'Period', 'ShortTermScore', 'LongTermScore']
dtypes = {"Project": bool, "OrderDate": 'str', "orderid": "Int64", "ClientID": "Int64",
          "IsRepeat": bool, "IsBlocked": bool, "IsManual": bool, "AutoDecision": bool,
          "ManualApprove": bool, "IsLoan": bool, "ShortTermAmount": "Int64",
          "ShortTermPeriod": "Int64", "LongTermAmount": "Int64", "LongTermPeriod": "Int64",
          "RequestedAmount": "Int64", "RequestedPeriod": "Int64", "LoanSum": "Int64",
          "Period": "Int64", "ShortTermScore": "float64", "LongTermScore": "float64"}
parse_dates = ['OrderDate']

# The same converter for every flag column: '-' becomes NaN, anything else goes through bool(str(x))
test = pd.read_csv("/home/man/Test_task.csv", sep=',', thousands=',', header=None,
                   names=col_names, dtype=dtypes, parse_dates=parse_dates,
                   converters={c: (lambda x: bool(str(x)) if x != '-' else np.nan)
                               for c in ['Project', 'IsRepeat', 'IsBlocked', 'IsManual',
                                         'AutoDecision', 'ManualApprove', 'IsLoan']})
df = pd.DataFrame(data=test)
test.head()
```
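A side note on the converters in the code above: `bool(str(x))` is `True` for every non-empty string, so a field containing `0` or `False` would also become `True`. In addition, when a column has both a converter and a `dtype`, pandas uses only the converter and emits a `ParserWarning`. Below is a minimal sketch of a stricter converter. It assumes (the question does not say) that the flag columns hold tokens such as `1`/`0` or `true`/`false`; `parse_flag`, `TRUE_TOKENS` and `FALSE_TOKENS` are hypothetical names.

```python
import numpy as np

# Hypothetical token sets: the real encoding of the flag columns is an assumption.
TRUE_TOKENS = {"1", "true", "yes"}
FALSE_TOKENS = {"0", "false", "no"}


def parse_flag(x):
    """Converter for a flag column: '-' (or an empty field) -> NaN, known tokens -> bool."""
    token = str(x).strip().lower()  # strip() also tolerates spaces padding the field
    if token in ("", "-"):
        return np.nan
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    # Fail loudly instead of silently mapping an unknown value to True
    raise ValueError(f"unexpected flag value: {x!r}")


print(bool(str("0")))     # the original converter's logic: prints True
print(parse_flag(" 0 "))  # prints False
print(parse_flag("-"))    # prints nan
```

It would be passed as `converters={c: parse_flag for c in [...]}` over the flag columns, with those columns removed from `dtype` so that pandas does not warn about the conflict.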
on the data from this CSV table: https://drive.google.com/file/d/1Oseh4KnE98tC3-jRyqWI2Ogr6usoOlvd/view

the table header is printed, but when I then try to treat the table's values as numbers, I get an error saying that this operation cannot be applied to strings. pandas reports the dtype of every column as "object", but judging by the behavior, the values are actually str.

Does the table contain some hidden characters, or is the file corrupted?
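One way to check whether the file contains hidden characters is to look at the `repr()` of the first raw lines. `repr()` makes padding spaces, tabs, carriage returns or a byte-order mark visible, where a plain `print` would hide them. Here is a minimal sketch. `show_raw_lines` is a hypothetical helper, and the in-memory sample only imitates the layout described in the comments below (a comma with spaces around it). It is not taken from the real file.

```python
import io
from itertools import islice


def show_raw_lines(fileobj, n=3):
    """Return repr() of the first n raw lines, exposing padding spaces,
    tabs, carriage returns and a BOM that a normal print would hide."""
    return [repr(line) for line in islice(fileobj, n)]


# Hypothetical sample imitating a ' , '-separated file (not the real data)
sample = io.StringIO("1 , 2021-01-05 , 100\n0 , 2021-01-06 , 250\n")
for r in show_raw_lines(sample):
    print(r)

# For the real file (the encoding is an assumption):
# with open("/home/man/Test_task.csv", encoding="utf-8") as f:
#     print(show_raw_lines(f, 5))
```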
UPDATE: Following MaxU's advice, I added padding spaces to sep. That helped partially: the columns LoanSum, Period, ShortTermScore and LongTermScore are now recognized as float64, but they contain NaN instead of numbers, and the other numeric columns are still of type object (although they hold the correct numbers).
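For the numeric columns that are still `object`, a quick way to see which raw values block the conversion is to coerce the column with `pd.to_numeric(..., errors="coerce")` and list the non-null values that turned into NaN. Here is a minimal sketch. `unparsable_values` is a hypothetical helper, and the sample column is invented for illustration.

```python
import pandas as pd


def unparsable_values(series: pd.Series) -> list:
    """Distinct non-null values of `series` that pd.to_numeric cannot parse."""
    converted = pd.to_numeric(series, errors="coerce")  # failures become NaN
    bad = series[converted.isna() & series.notna()]
    return sorted(set(bad.astype(str)))


# Hypothetical object column: two clean numbers, one with a thousands separator, one dash
col = pd.Series(["100", "250", "1,000", "-"], dtype=object)
print(unparsable_values(col))  # ['-', '1,000']
```

Suppose the failing values turn out to contain, for example, thousands separators. They could then be removed with `series.str.replace(",", "", regex=False)` before calling `pd.to_numeric`. Which cleanup is actually needed depends on what the real file contains.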
Please clarify what you mean. Is the CSV file formatted incorrectly?
Timur, 2021-02-23 18:30:40
I mean that your code uses a tab (sep='\t') as the field separator, while the CSV file uses a different separator...
MaxU, 2021-02-23 18:30:40
I have updated the question to match my current code.
Timur, 2021-02-23 18:30:40
I re-read your comment and added padding spaces to sep. It got better: the columns LoanSum, Period, ShortTermScore and LongTermScore are now recognized as float64, but they contain NaN instead of numbers, and the other numeric columns are still of type object.
Timur, 2021-02-23 18:30:40
Your data delimiter is a comma surrounded by spaces...
MaxU, 2021-02-23 18:30:40
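Following this comment, a separator made of a comma with optional whitespace on both sides can be passed to `read_csv` as a regular expression. Regex separators are handled only by the Python parsing engine, so `engine="python"` is set explicitly. The sketch below runs on a hypothetical four-column sample that uses a subset of the real column names. Treating `-` as missing via `na_values` is an assumption based on the converters in the question.

```python
import io

import pandas as pd

# Hypothetical sample: fields separated by a comma with spaces around it
raw = "1 , 2021-01-05 , 100 , 0.5\n0 , 2021-01-06 , 250 , -\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s*,\s*",      # regex: a comma with any amount of surrounding whitespace
    engine="python",     # regex separators are supported only by the python engine
    header=None,
    names=["IsRepeat", "OrderDate", "LoanSum", "ShortTermScore"],
    na_values=["-"],     # assumption: '-' marks a missing value
    parse_dates=["OrderDate"],
)
print(df.dtypes)
```

A design note: `skipinitialspace=True` would only drop spaces after the delimiter, not the ones before it, so the regex covers the "spaces on both sides" case more directly.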