Home>

I want to read Excel data with Python and output duplicate rows to CSV by dividing them into columns.

Read Excel data (7 columns of data has about 50,000 rows) and read
If the data in a particular column is duplicated, I want to extract that row and write it to CSV or Excel.
Excel data has 7 columns and extracts the data in the second row.
The one extracted using duplicated was forcibly output to CSV.
* 20201121 19:10 correction
Before modification: When extracted using duplicated, it was omitted as "...", so the number of displays was increased appropriately and the visualized version was forcibly output to CSV.

The problem i am having

Each row got into one column and had to be processed again.
This time, there were about 30 duplicates, so it was quite a lot by hand, but if the number of duplicates increases, it is difficult to do by hand.
So, please tell me how to divide the extracted rows into columns and save them in a CSV or Excel file in a form that is easy to reuse.

What I tried

Using the following, I increased the number of displays and forcibly output what was visualized to CSV.
pd.set_option ('display.max_rows', 500000)
pd.set_option ('display.max_columns', 500000)

Corresponding source code
print (pd.get_option ("display.max_rows"))
print (pd.get_option ("display.max_columns"))
 #Maximum display number to be newly set
pd.set_option ('display.max_rows', 500000)
pd.set_option ('display.max_columns', 500000)
###########################
# Read EXCEL
df = pd.read_excel ('filename')
##################
#Check Dup
df = df [df.duplicated (subset ='target column name')]
print (df)
##########
Output to #csv
with open ('file.csv','w') as f:
  print (df, file = f)
  • Answer # 1

    Do not write the screen output of df to a file.

    As the questioner was worried, some parts may be omitted with "...", and since they are not separated by commas, they cannot be read as CSV format. All of them are stored in one column because they are separated by multiple spaces when the screen is output.

    For output to CSV file, pandasto_csvPlease use.

    -with open ('file.csv','w') as f:
    --print (df, file = f)
    + df.to_csv ('file.csv', index = None)