
In Python 3, I have a folder of daily CSV files, and I load each one into a DataFrame.
When several rows share the same date and time, I want to keep only the first (top) row, and extract the other duplicate rows while also deleting them from the original data.
After removing them, I want to sum the data over every 5 seconds to create new data.

Example of the current data:
DATE_TIME Number of people
2016/8/8 0:00:00 100
2016/8/8 0:00:01 232
2016/8/8 0:00:02 336
2016/8/8 0:00:03 335
2016/8/8 0:00:03 132 ← I want to extract/delete this line.
2016/8/8 0:00:04 453
2016/8/8 0:00:05 223
...

Desired result:
DATE_TIME Number of people
2016/8/8 0:00:00 100
2016/8/8 0:00:01 232
2016/8/8 0:00:02 336
2016/8/8 0:00:03 335
2016/8/8 0:00:04 453
2016/8/8 0:00:05 223
...
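One possible approach in pandas, shown as a minimal sketch on the sample rows above only (they are typed in by hand, so the timestamps are built directly instead of being read from a CSV). `Index.duplicated(keep="first")` marks every repeat of a timestamp except the first one, so the same boolean mask can both extract the extra rows and remove them from the data. `resample("5s").sum()` then totals each 5-second interval. The variable names (`dup_mask`, `removed`, `df_5s`) are illustrative only.

```python
from datetime import datetime

import pandas as pd

# Sample data from the question: 0:00:03 appears twice
times = [datetime(2016, 8, 8, 0, 0, s) for s in (0, 1, 2, 3, 3, 4, 5)]
df = pd.DataFrame(
    {"Number of people": [100, 232, 336, 335, 132, 453, 223]},
    index=pd.DatetimeIndex(times, name="DATE_TIME"),
)

# True for every row whose timestamp already appeared in an earlier row
dup_mask = df.index.duplicated(keep="first")

removed = df[dup_mask]  # the extracted duplicate rows (here: 0:00:03 -> 132)
df = df[~dup_mask]      # the original data with only the first occurrence kept

# Sum "Number of people" over consecutive 5-second intervals
df_5s = df.resample("5s").sum()

print(removed)
print(df_5s)
```

Using a single mask for both steps guarantees that the extracted rows and the deleted rows are exactly the same set. Note that `dropna` would not help here, because it removes missing values, not repeated timestamps.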


Code I tried:

```python
import glob
import os

import pandas as pd

path = 'C:/filehokan/'
# Get all CSV files in the folder (instead of hard-coding the count, e.g. 356)
csv_files = glob.glob(os.path.join(path, '*.csv'))
for csv_file in csv_files:
    df = pd.read_csv(csv_file, encoding='cp932', engine='python')
    # The timestamps look like "2016/8/8 0:00:03", so the format uses slashes
    df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'], format='%Y/%m/%d %H:%M:%S')
    df.set_index('DATE_TIME', inplace=True)
    # I don't know how to extract/delete the duplicate rows here. Should I use dropna?
    # After removing duplicates, sum every 5 seconds and save the result as a new file
    df1 = df.resample('5s').sum()
    # Note: this path is the same for every input file, so each iteration overwrites it
    df1.to_csv('C:/filehokan/newfile_df.csv', index=True, encoding='shift_jis')
```
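Applied to the folder loop, the same steps could look like the sketch below. It assumes every CSV has a `DATE_TIME` column in the `2016/8/8 0:00:00` layout shown above, and it reads with `cp932` and writes with `shift_jis` as in the original code. The helper names `process_file` and `process_folder`, the separate output folder, and the output names `<name>_5s.csv` (5-second totals) and `<name>_dup.csv` (extracted duplicates) are hypothetical choices, made so that each input file gets its own output instead of all files overwriting one.

```python
import glob
import os

import pandas as pd


def process_file(csv_path, out_dir):
    """Deduplicate one daily CSV, save the removed rows, and save 5-second totals."""
    df = pd.read_csv(csv_path, encoding="cp932")
    # Assumed timestamp layout: "2016/8/8 0:00:03"
    df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], format="%Y/%m/%d %H:%M:%S")
    df = df.set_index("DATE_TIME")

    # Keep the first row for each timestamp; mark the later repeats
    dup_mask = df.index.duplicated(keep="first")
    removed = df[dup_mask]
    df = df[~dup_mask]

    # Sum each 5-second interval
    df_5s = df.resample("5s").sum()

    # Hypothetical naming scheme: <name>_5s.csv and <name>_dup.csv
    stem = os.path.splitext(os.path.basename(csv_path))[0]
    df_5s.to_csv(os.path.join(out_dir, stem + "_5s.csv"), encoding="shift_jis")
    removed.to_csv(os.path.join(out_dir, stem + "_dup.csv"), encoding="shift_jis")
    return df_5s, removed


def process_folder(in_dir, out_dir):
    """Run process_file on every CSV in in_dir, writing results to out_dir."""
    # A separate output folder keeps the outputs out of the next glob
    os.makedirs(out_dir, exist_ok=True)
    for csv_path in sorted(glob.glob(os.path.join(in_dir, "*.csv"))):
        process_file(csv_path, out_dir)
```

For the folder in the question, a call might be `process_folder("C:/filehokan/", "C:/filehokan/output/")`, where the output folder name is only an example.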