
"Data type_serial number.csv"
I have a list of files whose names follow the format above.

Basically, I want to copy these names into another list.
However, the same data type can appear more than once,
and in that case I want to keep only the file with the largest serial number.

# Original data
files = [
    "A0_01.csv",   # <--- A0 appears more than once and this serial number is the lower one, so it is not needed
    "A1_01.csv",   # <--- A1 appears only once
    "A0_02.csv",   # <--- A0 appears more than once and this serial number is the higher one
    "A2_03.csv",   # <--- A2 appears only once
    "A15_04.csv",  # <--- A15 appears only once
    "A7_05.csv"    # <--- A7 appears only once
]
# Data I want to extract from the original data
files = [
    "A1_01.csv",
    "A0_02.csv",
    "A2_03.csv",
    "A15_04.csv",
    "A7_05.csv"
]

So my plan was to put each data type and serial number into a two-dimensional list so that I could find the latest file for each type.
After that comparison, I intended to remove the unwanted entries by checking the result against the list of file names.
(Please tell me if there is a more efficient method.)

I wrote the source code below along those lines, but the part marked
"----- List element extraction process -----" does not work.
(My whole approach may be off.)

def func(files):
    li = []
    for file in files:
        tmp = file.split("_")
        datakind = tmp[0]
        commaindex = tmp[1].index(".")
        num = tmp[1][:commaindex]
        li.append([datakind, num])
    # print(li)
    # [['A0', '01'], ['A1', '01'], ['A0', '02'], ['A2', '03'], ['A15', '04']]
    # ----- List element extraction process -----
    # Idea: compare li[i][0] with li[j][0]; if they match, put whichever of li[i][1] and li[j][1] is newer into the list to return
    # This fails
    # for i in range(len(li) - 1):
    #     for j in range(i + 1, len(li) - 1):
    #         (omitted)

    # ----- Extract only the necessary ones from the files -----
    # return

# Files
# The data types follow no regular pattern
# In the example below A0 appears more than once, so I want A0_02.csv, which has the larger serial number.
files = [
    "A0_01.csv",
    "A1_01.csv",
    "A0_02.csv",
    "A2_03.csv",
    "A15_04.csv",
]

ret = func(files)
  • Answer #1

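    If you don't mind the order of the original list changing, you can sort the names and then keep only the last row for each data type, which is the one with the largest serial number. Note that sorted() compares strings, so this relies on the serial numbers being zero-padded to the same width, as they are here.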

    import pandas as pd

    files = [
        "A0_01.csv",   # <--- A0 appears more than once and this serial number is the lower one, so it is not needed
        "A1_01.csv",   # <--- A1 appears only once
        "A0_02.csv",   # <--- A0 appears more than once and this serial number is the higher one
        "A2_03.csv",   # <--- A2 appears only once
        "A15_04.csv",  # <--- A15 appears only once
        "A7_05.csv"    # <--- A7 appears only once
    ]
    # Sort so that, within each data type, the largest serial number comes last
    df = pd.DataFrame({'file': sorted(files)})
    # Drop rows with a duplicate data type, keeping the last one
    df['type'] = df['file'].apply(lambda v: v[:v.find('_')])
    df = df.drop_duplicates(subset='type', keep='last')
    ret = df['file'].tolist()
    print(ret)  # ['A0_02.csv', 'A15_04.csv', 'A1_01.csv', 'A2_03.csv', 'A7_05.csv']
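
If the original order matters, or pandas is not available, the same result can probably be reached with a plain dictionary. The following is only a sketch: the function name `latest_per_type` is made up for illustration, and it assumes every name has the form "<data type>_<serial number>.csv" with a purely numeric serial number. It compares serial numbers as integers, so it does not depend on zero-padding.

```python
def latest_per_type(files):
    """Keep, for each data type, the file with the largest serial number.

    Hypothetical helper; assumes names look like "<type>_<serial>.csv"
    and that the serial part is numeric.
    """
    best = {}  # data type -> (serial number, file name)
    for name in files:
        datakind, rest = name.split("_", 1)
        serial = int(rest.rsplit(".", 1)[0])  # "02.csv" -> 2
        if datakind not in best or serial > best[datakind][0]:
            best[datakind] = (serial, name)
    keep = {name for _, name in best.values()}
    # Filter the original list so the input order is preserved
    return [name for name in files if name in keep]


files = [
    "A0_01.csv",
    "A1_01.csv",
    "A0_02.csv",
    "A2_03.csv",
    "A15_04.csv",
    "A7_05.csv",
]
print(latest_per_type(files))
# ['A1_01.csv', 'A0_02.csv', 'A2_03.csv', 'A15_04.csv', 'A7_05.csv']
```

This makes one pass to find the largest serial number per type and a second pass to filter, so no pairwise comparison of list elements is needed.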