I would appreciate some ideas on how to search for unexpected data in a data frame.
For example, if a column that should consist of numeric data contains a small number of string values,
how can I extract those values?
When I checked the data type of each column with df.dtypes, it was object type rather than int type,
so I looked at the original data directly and found the unnatural values.
I would like to display the corresponding rows and data in code.
Sorry for the trouble, but I would appreciate your guidance. Thank you.
Answer # 1
This assumes that the first row of the original CSV is the header row.
It also assumes that every column is supposed to contain numbers.
import pandas as pd

df_org = pd.read_csv('data1.csv', header=0)

# Deliberately replace missing values in the loaded CSV with a string.
# If you do not want missing values to count as "strange values",
# change this line to df_src = df_org.
df_src = df_org.fillna('__NA__')

# Convert the data to numbers.
cols = df_src.columns
df_dest = df_src[cols].apply(pd.to_numeric, errors='coerce')

# Build a data frame that is True where the cell held a "strange value"
# and False where the cell was read as a number.
df_chk = df_dest.isna()
print(df_chk)

# Display the result applied to the original data frame.
print(df_src[df_chk])
In the output above,
cells that could not be converted to numbers (i.e., the "strange values")
appear as non-NaN data.
__NA__ marks cells that had no data in the original CSV.
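Since the question asks for the corresponding rows and values, the same mask can also be turned into a list of row label, column name, and value. The following is only a minimal sketch: the small inline data frame stands in for data1.csv, and the names bad_rows and bad_cells are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for data1.csv (an assumption).
df_src = pd.DataFrame({
    "a": [1, 2, "x3", 4],
    "b": ["10", "abc", "30", "40"],
})

# True where a cell could not be converted to a number.
df_chk = df_src.apply(pd.to_numeric, errors="coerce").isna()

# Rows that contain at least one strange value.
bad_rows = df_src[df_chk.any(axis=1)]
print(bad_rows)

# One (row label, column name, value) tuple per strange cell.
bad_cells = [
    (row, col, df_src.at[row, col])
    for col in df_src.columns
    for row in df_src.index[df_chk[col]]
]
for row, col, value in bad_cells:
    print(f"row {row}, column {col!r}: {value!r}")
```

With a real CSV, df_src would come from pd.read_csv as in the answer above, and the row labels would be the data frame's index rather than CSV line numbers.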
Answer # 2
How about converting to int type with astype() and treating any resulting error as a sign that non-numeric data is included?
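A rough sketch of that idea, checking one column at a time, might look like this. The sample data and the name bad_columns are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data (an assumption, not from the question).
df = pd.DataFrame({
    "a": ["1", "2", "3"],
    "b": ["10", "abc", "30"],
})

bad_columns = []
for col in df.columns:
    try:
        # Raises if any value in the column cannot be converted to int.
        df[col].astype(int)
    except (ValueError, TypeError) as exc:
        bad_columns.append(col)
        print(f"column {col!r} contains non-integer data: {exc}")

print(bad_columns)
```

Note that this only tells you which columns fail to convert, not which rows. Missing values would likely cause the conversion to fail as well, so to locate individual cells the pd.to_numeric approach with errors='coerce' is probably more direct.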