
Sorry for the beginner question, but I would appreciate your help.
I'm using pandas to aggregate downloaded CSV files like the one in the attached screenshot.
In the cells highlighted in yellow, some numbers are followed by a note in parentheses of the form "(number + X)", so I can't use groupby to compute the mean or total for each name.
When I used replace to convert those cells to NaN, the number before the parentheses, which is the value I actually want to aggregate, was also converted to NaN. I couldn't come up with a good way (for example, a conditional) to ignore only the part in parentheses, so I'm asking here.

I would appreciate it if you could show me how to do this.

  • Answer #1

    Instead of replace, use apply with a function that removes the "(~)" part. If that function also converts the result to int, both steps are done in a single pass.

    Personally, I would use re.sub() to delete it.
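    A minimal sketch of the approach in Answer #1. The "name" and "value" columns and the sample values are hypothetical stand-ins (the real CSV's columns are not shown), and the sketch assumes the column has no missing values. A named function strips the parenthesized part with re.sub() and converts the rest to int; once the column is numeric, groupby can compute the total and mean per name.

```python
import re

import pandas as pd

# Hypothetical sample data: "name" and "value" stand in for the real CSV columns.
df = pd.DataFrame({
    "name": ["A", "A", "B", "B"],
    "value": ["12", "30(3X)", "7(1X)", "5"],
})

def strip_parenthesized(cell):
    """Remove a "(...)" note and return the remaining number as an int."""
    # str() also handles cells that pandas already parsed as numbers.
    return int(re.sub(r"\(.*?\)", "", str(cell)).strip())

df["value"] = df["value"].apply(strip_parenthesized)

# The column is now numeric, so per-name aggregation works.
summary = df.groupby("name")["value"].agg(["sum", "mean"])
print(summary)
```

    If some cells are empty, str(cell) yields "nan" and int() raises, so missing values would need to be handled (for example, dropped or filled) first.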

  • Answer #2

    How about modifying the strings in the DataFrame with apply?

    import re

    df["Year H1cm total"] = df["Year H1cm total"].apply(lambda x: re.sub(r"\(.+\)", "", x))

    This should remove the "(~)" part.
    However, the result is still a string, so if you also want to convert it to a number at the same time, wrap the expression in int():

    df["Year H1cm total"] = df["Year H1cm total"].apply(lambda x: int(re.sub(r"\(.+\)", "", x)))
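    As an alternative to apply, pandas can do the same cleanup with its vectorized string methods: Series.str.replace with regex=True removes the parenthesized part, and astype(int) converts the result. This is a sketch on hypothetical sample values, assuming every cell in the column is a string (the .str accessor returns NaN for non-string cells).

```python
import pandas as pd

# Hypothetical column values mimicking the "number(NX)" notation.
s = pd.Series(["12", "30(3X)", "7(1X)", "5"])

# Remove "(...)" with a regular expression, trim whitespace, then cast to int.
cleaned = s.str.replace(r"\(.*?\)", "", regex=True).str.strip().astype(int)
print(cleaned.tolist())
```

    The non-greedy .*? stops at the first closing parenthesis, which matters only if a cell contains more than one parenthesized note.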

    Also, judging from the image, the parentheses appear to be full-width (（ ）). In that case, the regular expression must match the full-width parentheses instead of, or in addition to, the half-width ones.
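    One way to cover both cases is a character class that accepts either form of each parenthesis. The sketch below assumes the note never contains a nested closing parenthesis; PAREN_PATTERN and to_number are hypothetical names chosen for illustration.

```python
import re

# Matches a parenthesized note written with half-width "( )" or
# full-width "（ ）" parentheses, e.g. "30(3X)" or "30（3X）".
PAREN_PATTERN = re.compile(r"[(（][^)）]*[)）]")

def to_number(cell):
    """Strip the parenthesized note and convert the rest to int."""
    return int(PAREN_PATTERN.sub("", str(cell)).strip())

for text in ["30(3X)", "30（3X）", "12"]:
    print(text, "->", to_number(text))
```

    The function can be passed to apply in place of the lambda, e.g. df["Year H1cm total"].apply(to_number).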