
After this code:

    import locale
    import sys
    from locale import atof
    locale.setlocale(locale.LC_NUMERIC, '')
    # returns 'en_GB.UTF-8'
    import pandas as pd
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split
    from sklearn import metrics

    col_names = ['Project', 'OrderDate', 'orderid', 'ClientID', 'IsRepeat', 'IsBlocked', 'IsManual',
                 'AutoDecision', 'ManualApprove', 'IsLoan', 'ShortTermAmount', 'ShortTermPeriod',
                 'LongTermAmount', 'LongTermPeriod', 'RequestedAmount', 'RequestedPeriod',
                 'LoanSum', 'Period', 'ShortTermScore', 'LongTermScore']
    dtypes = {"Project": bool, "OrderDate": 'str', "orderid": "Int64", "ClientID": "Int64",
              "IsRepeat": bool, "IsBlocked": bool, "IsManual": bool, "AutoDecision": bool,
              "ManualApprove": bool, "IsLoan": bool, "ShortTermAmount": "Int64",
              "ShortTermPeriod": "Int64", "LongTermAmount": "Int64", "LongTermPeriod": "Int64",
              "RequestedAmount": "Int64", "RequestedPeriod": "Int64", "LoanSum": "Int64",
              "Period": "Int64", "ShortTermScore": "float64", "LongTermScore": "float64"}
    parse_dates = ['OrderDate']
    test = pd.read_csv("/home/man/Test_task.csv", sep=',', thousands=',', header=None,
                       names=col_names, dtype=dtypes, parse_dates=parse_dates,
                       converters={'Project': lambda x: bool(str(x)) if x != '-' else np.nan,
                                   'IsRepeat': lambda x: bool(str(x)) if x != '-' else np.nan,
                                   'IsBlocked': lambda x: bool(str(x)) if x != '-' else np.nan,
                                   'IsManual': lambda x: bool(str(x)) if x != '-' else np.nan,
                                   'AutoDecision': lambda x: bool(str(x)) if x != '-' else np.nan,
                                   'ManualApprove': lambda x: bool(str(x)) if x != '-' else np.nan,
                                   'IsLoan': lambda x: bool(str(x)) if x != '-' else np.nan})
    df = pd.DataFrame(data=test)
    test.head()

with data from this CSV table: https://drive.google.com/file/d/1Oseh4KnE98tC3-jRyqWI2Ogr6usoOlvd/view

the head of the table is printed, but when I try to operate on the table values as numbers, I get an error saying that the operation is not possible on strings. pandas reports the dtype of every column as "object", and by all appearances the values are actually stored as "str".

Does the table contain hidden characters, or is it corrupted?
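One way to check for hidden characters is to read every field as text and look at the `repr()` of the raw values. The sketch below uses a small made-up sample instead of the real file; the sample contents and the variable names (`sample`, `raw`, `padded`) are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file:
# the delimiter is a comma surrounded by spaces.
sample = "1 , 250 , 0.75\n2 , - , 0.10\n"

# Read everything as text and keep '-' and empty cells as they are,
# so nothing is converted or hidden before inspection.
raw = pd.read_csv(io.StringIO(sample), header=None, dtype=str, keep_default_na=False)

# repr() makes leading/trailing spaces and invisible characters visible.
first_row_reprs = [repr(value) for value in raw.iloc[0]]
print(first_row_reprs)

# Per column, count the cells that change when whitespace is stripped.
padded = raw.apply(lambda col: (col != col.str.strip()).sum())
print(padded)
```

If the counts are non-zero, the values carry padding that prevents pandas from parsing them as numbers, which would explain the "object" dtype without the file being corrupted.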

UPDATE: Following MaxU's advice, I added the surrounding spaces to sep. That helped: the columns LoanSum, Period, ShortTermScore and LongTermScore are finally recognized as float64. However, these columns now contain NaN instead of numbers, and the other numeric columns are still of type object (although the numbers in them are correct).
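For columns that stay of type object, one option is to convert them explicitly after loading. This is only a sketch: the helper name `to_numeric_columns` and the sample frame are made up, and it assumes the leftover problem is stray whitespace or a "-" placeholder, which is not confirmed for the actual file.

```python
import pandas as pd


def to_numeric_columns(df, columns):
    """Return a copy of df with the given columns converted to numbers.

    Values that still cannot be parsed after stripping whitespace
    (for example a lone '-') become NaN.
    """
    out = df.copy()
    for col in columns:
        cleaned = out[col].astype(str).str.strip()
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    return out


# Hypothetical data imitating padded values and '-' placeholders.
frame = pd.DataFrame({
    "LoanSum": [" 1000 ", "-", " 2500"],
    "Period": ["30 ", " 14", " - "],
})
converted = to_numeric_columns(frame, ["LoanSum", "Period"])
print(converted.dtypes)
print(converted)
```

Because `errors="coerce"` silently turns anything unparseable into NaN, it is worth comparing `converted.isna().sum()` with the number of "-" cells expected in each column before trusting the result.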

Your data delimiter is a comma surrounded by spaces ...

MaxU 2021-02-23 18:30:40
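A minimal sketch of reading such a file, assuming the delimiter really is a comma with optional whitespace on both sides and that a lone "-" marks a missing value. The sample data and the column names `a`, `b`, `c` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample: comma surrounded by spaces, '-' for missing values.
sample = "1 , 2.5 , -\n3 , 4.0 , 1\n"

df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s*,\s*",         # a comma plus any whitespace around it
    engine="python",        # regex separators need the Python parser engine
    header=None,
    names=["a", "b", "c"],  # hypothetical column names
    na_values=["-"],        # treat a lone dash as missing
)
print(df.dtypes)
print(df)
```

Note that a regex separator tends to ignore quoting, so this approach is a poor fit for files with quoted fields that contain commas. If the spaces only follow the comma, `skipinitialspace=True` with the default `sep=','` is a simpler alternative.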

Please clarify what you mean. Is the CSV file formatted incorrectly?

Timur 2021-02-23 18:30:40

I mean that your code uses a tab (sep='\t') as the field separator, while the CSV file uses a different separator ...

MaxU 2021-02-23 18:30:40

I have edited the question slightly so that it reflects the current state of the code.

Timur 2021-02-23 18:30:40

I read your comment again and added the surrounding spaces to sep. That helped: the columns LoanSum, Period, ShortTermScore and LongTermScore are finally recognized as float64. However, these columns now contain NaN instead of numbers, and the other numeric columns are still of type object.

Timur 2021-02-23 18:30:40