
For the DataFrame shown below, I want to write a loop that fills the z column with the average of each consecutive group of three values in the y column.
I would appreciate your guidance.

Current DataFrame (the first column is the row number)

      x     y
1   0.1   0.1
2  -0.1   0.1
3   0.2   0.2
4  -0.2   0.2
5   0.4   0.4
6  -0.4   0.4

DataFrame I want to get

      x     y     z
1   0.1   0.1   0.2  (average of y1 to y3)
2  -0.1   0.1   0.2  (average of y1 to y3)
3   0.4   0.4   0.2  (average of y1 to y3)
4  -0.2   0.2   0.3  (average of y4 to y6)
5   0.4   0.4   0.3  (average of y4 to y6)
6  -0.3   0.3   0.3  (average of y4 to y6)

The x and y columns are filled in as follows:

import pandas as pd
import numpy as np

raw = pd.read_csv('Users/~~~~~/raw.csv', header=0)
ds = pd.DataFrame(index=[])
ds['x'] = raw['x']
ds['y'] = np.fabs(ds['x'])  # y is the absolute value of x
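The CSV path above is elided, so as a runnable stand-in this sketch reads the x values of the current table from an in-memory string. The CSV content and the use of `io.StringIO` in place of the real file are assumptions for illustration only:

```python
import io

import numpy as np
import pandas as pd

# Stand-in for raw.csv (the real path is elided); x values from the table above
csv_text = "x\n0.1\n-0.1\n0.2\n-0.2\n0.4\n-0.4\n"
raw = pd.read_csv(io.StringIO(csv_text), header=0)

ds = pd.DataFrame(index=[])
ds["x"] = raw["x"]          # the empty frame takes its index from raw["x"]
ds["y"] = np.fabs(ds["x"])  # y is the absolute value of x
print(ds)
```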

For z, I tried the following code to get three values at a time, but it only gives me the y2 and y3 data.

for i in range(3):
    df = ds['y'][i:i * 2]
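Writing out the slice bounds shows what goes wrong: `i:i * 2` yields 0:0 (empty), 1:2 and 2:4, and `df` is overwritten on every pass, so only the last slice (rows 2 and 3) survives. A minimal sketch of stepping through the rows three at a time instead, using the y values from the "desired" table as assumed sample data (`bounds` and `chunks` are illustrative names):

```python
import pandas as pd

# y values from the "desired" table above, with the default 0-based index
y = pd.Series([0.1, 0.1, 0.4, 0.2, 0.4, 0.3])

# Slice bounds produced by the original attempt (i to i * 2)
bounds = [(i, i * 2) for i in range(3)]
print(bounds)  # [(0, 0), (1, 2), (2, 4)]

# Stepping the start by 3 gives non-overlapping blocks of three rows
chunks = [y.iloc[start:start + 3] for start in range(0, len(y), 3)]
for chunk in chunks:
    print(chunk.index.tolist(), chunk.mean())
```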

Thanks for your guidance.

  • Answer #1

    Use groupby().transform() instead of a for statement. With the default 0-based index, df.index // 3 assigns rows 0-2 to group 0, rows 3-5 to group 1, and so on:

    df['z'] = df.groupby(df.index // 3)['y'].transform('mean')
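A self-contained run of this one-liner on the values from the "desired" table (typed in by hand here, with the default 0-based index the approach relies on) gives z = 0.2 for the first three rows and 0.3 for the last three, up to floating-point rounding:

```python
import pandas as pd

# Sample data typed in from the "desired" table; default 0-based RangeIndex
df = pd.DataFrame({
    "x": [0.1, -0.1, 0.4, -0.2, 0.4, -0.3],
    "y": [0.1, 0.1, 0.4, 0.2, 0.4, 0.3],
})

# df.index // 3 -> 0, 0, 0, 1, 1, 1; transform("mean") writes each group's
# mean back onto every row of that group, so the result lines up with df
df["z"] = df.groupby(df.index // 3)["y"].transform("mean")
print(df)
```

Unlike agg("mean"), which returns one value per group, transform keeps the original row count, which is why the result can be assigned straight to a new column.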

    Addendum

    Follow-up question: for 20 rows, split the data into the first 10 and the last 10.

    The straightforward approach is to process the first half and the second half separately in a loop and then combine the results into the 'z' column.
    However, grouping by index // 3 as-is would break the groups in the second half (rows 9, 10 and 11 would all get label 3), so within each half the positions 0 to 9 are integer-divided by 3 instead.

    d = []
    for i in [0, 10]:
        # Group each block of 10 rows by position: 0,0,0,1,1,1,2,2,2,3
        d.append(df.iloc[i:i + 10].groupby(np.arange(10) // 3)['y'].transform('mean'))
    df['z'] = pd.concat(d)
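To see the difference, this sketch builds a hypothetical 20-row frame (y is simply 0 to 19, not data from the question) and compares the labels from index // 3 with the per-half labels:

```python
import numpy as np
import pandas as pd

# Hypothetical 20-row frame; y = 0..19 so every group mean is easy to check
df = pd.DataFrame({"y": np.arange(20, dtype=float)})

# Naive labels: row 9 (first half) shares label 3 with rows 10 and 11
naive = df.index // 3
print(naive[8:12].tolist())  # [2, 3, 3, 3]

# Per-half labels restart for each block of 10: 0,0,0,1,1,1,2,2,2,3
d = []
for i in [0, 10]:
    d.append(df.iloc[i:i + 10].groupby(np.arange(10) // 3)["y"].transform("mean"))
df["z"] = pd.concat(d)
print(df["z"].tolist())
```

Each half ends with a one-row group (rows 9 and 19), because 10 is not a multiple of 3.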


    With a nested groupby you can write it as a one-liner, though it is harder to read:

    df['z'] = df.groupby(df.index // 10, group_keys=False).apply(lambda d: d.groupby(np.arange(10) // 3)['y'].transform('mean'))
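Another way to avoid the nested apply, offered as an alternative sketch rather than the answerer's code, is to pass groupby two keys at once: the block of 10 (index // 10) and the group of 3 within that block ((index % 10) // 3). It assumes a hypothetical 20-row frame with a 0-based RangeIndex:

```python
import numpy as np
import pandas as pd

# Hypothetical 20-row frame: y = 0..19, 0-based RangeIndex
df = pd.DataFrame({"y": np.arange(20, dtype=float)})

# Key 1: which block of 10 rows; key 2: which group of 3 inside that block
block = df.index // 10
group_in_block = (df.index % 10) // 3
df["z"] = df.groupby([block, group_in_block])["y"].transform("mean")
print(df["z"].tolist())
```

Because both keys are computed from the index, this also generalizes to any number of 10-row blocks without changing the code.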

  • Answer #2

    You can compute the average of each group of three y values (stored in df) with:

    for i in range(0, len(ds), 3):
        df = ds['y'].iloc[i:i + 3].mean()

    Then, to assign it to z:

    ds['z'] = np.nan
    for i in range(0, len(ds), 3):
        df = ds['y'].iloc[i:i + 3].mean()
        ds.iloc[i:i + 3, ds.columns.get_loc('z')] = df
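When the number of rows is an exact multiple of 3, the loop can be replaced by a NumPy reshape. This is a swapped-in technique, not part of the answer; the sketch assumes the six y values from the "desired" table:

```python
import numpy as np
import pandas as pd

# Sample y values from the "desired" table
ds = pd.DataFrame({"y": [0.1, 0.1, 0.4, 0.2, 0.4, 0.3]})

y = ds["y"].to_numpy()
# Assumes len(y) is a multiple of 3: each row of the 2-D view is one group
group_means = y.reshape(-1, 3).mean(axis=1)
# Repeat each mean three times so it lines up with the rows of its group
ds["z"] = np.repeat(group_means, 3)
print(ds)
```

If the row count is not a multiple of 3, reshape raises a ValueError, so the groupby approach is the safer general choice.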
