I would appreciate some ideas on how to find unexpected data in a data frame.

For example, if a column that should contain only numeric data includes a small number of strings,
how can I extract those values?

When I checked the data type of each column with df.dtypes, one column was object rather than int,
so I inspected the original data directly and found the odd values.

I would like to display the affected rows and values from code.
Any advice would be appreciated. Thank you.

  • Answer # 1

    This assumes that the first row of the original CSV is the header row.
    It also assumes that every column is supposed to contain numbers.

    import pandas as pd
    df_org = pd.read_csv('data1.csv', header=0)
    # Deliberately replace missing values in the loaded CSV with a marker string so that empty cells are reported too. If you don't want missing values counted as "strange values", change this line to df_src = df_org.
    df_src = df_org.fillna('__NA__')
    # Convert the data to numbers. Values that cannot be converted become NaN.
    cols = df_src.columns
    df_dest = df_src[cols].apply(pd.to_numeric, errors='coerce')
    # Build a boolean data frame: True where the cell held a "strange value", False where it was read as a number.
    df_chk = df_dest.isna()
    print(df_chk)
    # Apply the mask to the original data frame. Cells where the mask is False are shown as NaN.
    print(df_src[df_chk])

    With the above, the cells that could not be converted to numbers (i.e., the "strange values")
    appear in df_src[df_chk] as the non-NaN entries.
    Note that __NA__ marks cells that had no data in the original CSV.
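
    If you want the affected rows and the exact cell positions rather than a masked data frame, the same idea can be pushed one step further. The following is only a sketch: the in-memory CSV text is a made-up stand-in for data1.csv, the names mask, bad_rows, and bad_cells are illustrative, and empty cells are deliberately not counted as strange values here.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for data1.csv (an assumption for this sketch).
csv_text = "a,b\n1,10\nx,20\n3,\n4,4o\n"
df_org = pd.read_csv(io.StringIO(csv_text), header=0)

# Coerce every column to numbers; anything unparseable becomes NaN.
df_num = df_org.apply(pd.to_numeric, errors="coerce")

# A cell is "strange" if it held a value originally but failed the conversion.
# Cells that were empty in the CSV are excluded by the notna() check.
mask = df_num.isna() & df_org.notna()

# Rows that contain at least one strange value.
bad_rows = df_org[mask.any(axis=1)]
print(bad_rows)

# (row label, column name, original value) for every strange cell.
flags = mask.stack()
bad_cells = [(row, col, df_org.at[row, col]) for row, col in flags[flags].index]
print(bad_cells)
```

    To treat empty cells as strange values as well, drop the df_org.notna() condition from mask.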

  • Answer # 2

    How about converting each column to int with astype() and treating an error as a sign that the column contains non-numeric data?
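
    A minimal sketch of that idea, assuming a small made-up data frame whose columns hold strings. Note that astype() stops at the first value it cannot convert, so this only tells you which columns have a problem, not where every offending value is.

```python
import pandas as pd

# Hypothetical sample data (an assumption for this sketch); column "b" has one bad value.
df = pd.DataFrame({"a": ["1", "2", "3"], "b": ["10", "abc", "30"]}, dtype=object)

# Try to convert each column to int and remember the columns that fail.
failed = {}
for col in df.columns:
    try:
        df[col].astype(int)
    except ValueError as exc:
        # The error message usually mentions the first value that could not be converted.
        failed[col] = str(exc)

print(failed)
```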