I ran into the following problem. In the picture, there is a new_column in which a 1 appears whenever the sequence in the period_id column breaks. I need to create another column that sequentially numbers the periods between one 1 and the next, i.e. 0, 1, 2, 3 ... 15, then the numbering hits a 1 and starts over (0, 1), and so on. Is it possible to do this with the cumsum function? I tried summing only the zeros, but I don't understand how to make the numbering start over when a new zero appears after a 1.

df['final'] = df['new_column'].eq(0).cumsum()
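Why this attempt never restarts: `eq(0).cumsum()` is one running total over the whole column, so each 1 only pauses the count instead of resetting it. A cumsum-only variant is possible if you subtract from that running total its value at the most recent 1. The sketch below is not from the original thread; it assumes `new_column` holds only 0s and 1s, and the `reset`, `is_zero`, `zeros_so_far` and `count_at_last_one` names are hypothetical.

```python
import pandas as pd

# Same sample data as in the groupby answer below (a 1 marks a break in period_id)
df = pd.DataFrame({"new_column": [1, 0, 0, 1, 0, 1, 1]})

# The attempt from the question: a single running count of zeros that never resets
df["final"] = df["new_column"].eq(0).cumsum()

# Cumsum-only reset: subtract the zero count as it stood at the most recent 1
is_zero = df["new_column"].eq(0)
zeros_so_far = is_zero.cumsum()
# Keep the count only on rows holding a 1 and carry it forward over the zeros;
# leading zeros (before any 1) get 1, so they are numbered from 0 like the groupby answer
count_at_last_one = zeros_so_far.where(~is_zero).ffill().fillna(1)
df["reset"] = (zeros_so_far - count_at_last_one).astype(int)

print(df)
```

Here `final` comes out as 0, 1, 2, 2, 3, 3, 3 (no restart), while `reset` gives 0, 1, 2, 0, 1, 0, 0.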

Please add to the question a small but reproducible example of the input data (as text, CSV, Python code, or a link to a file) and the output you expect. I also recommend reading: How to most effectively ask a question related to data processing and/or analysis (for example: by Pandas/Numpy/SciPy/SciKit Learn/SQL)

MaxU, 2021-02-23 18:29:32
  • Answer #1

    This task can be solved with vectorized Pandas tools (i.e. without resorting to loops):


    In [229]: df
    Out[229]:
       new_column
    0           1
    1           0
    2           0
    3           1
    4           0
    5           1
    6           1

    In [230]: df["res"] = df.groupby(df["new_column"].eq(1).cumsum())["new_column"].cumcount()

    In [231]: df
    Out[231]:
       new_column  res
    0           1    0
    1           0    1
    2           0    2
    3           1    0
    4           0    1
    5           1    0
    6           1    0
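To see what the one-liner does, it helps to print the grouping key on its own: `eq(1).cumsum()` increments at every 1, so each 1 opens a new group, and `cumcount()` then numbers the rows inside each group starting from 0. The sketch below rebuilds the example step by step; the `group_id` column and the plain-loop cross-check are additions for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"new_column": [1, 0, 0, 1, 0, 1, 1]})

# Each 1 bumps the running total, so every 1 starts a new group id
df["group_id"] = df["new_column"].eq(1).cumsum()

# cumcount() numbers the rows inside each group: 0, 1, 2, ...
df["res"] = df.groupby("group_id").cumcount()

# Plain-loop equivalent, for comparison only (slower on large frames)
counter, expected = -1, []
for value in df["new_column"]:
    counter = 0 if value == 1 else counter + 1
    expected.append(counter)

print(df)
print(expected)
```

The group ids come out as 1, 1, 1, 2, 2, 3, 4, and both `res` and the loop give 0, 1, 2, 0, 1, 0, 0. The vectorized version avoids the Python-level loop, which matters on large frames.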

    Thank you very much, what a neat solution!

    Marina Bazyleva, 2021-02-23 18:29:32

    Well, groupby was the obvious choice here, of course, but I couldn't work out the rest.

    CrazyElf, 2021-02-23 18:29:32

    @CrazyElf, if memory serves, it only exists for GroupBy objects.

    MaxU, 2021-02-23 18:29:32
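A quick check of that remark: `cumcount` is available on GroupBy objects, not on a plain Series, so numbering rows across a whole column needs some grouping key. The sketch below uses a single constant key (an assumption for illustration) to put every row in one group.

```python
import pandas as pd

s = pd.Series([1, 0, 0, 1])

# A plain Series has cumsum, cumprod, cummax and cummin, but no cumcount
has_series_cumcount = hasattr(s, "cumcount")

# On a GroupBy object it exists; a constant key puts every row in the same group
whole_column = s.groupby([0] * len(s)).cumcount()

print(has_series_cumcount)
print(whole_column.tolist())
```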

    Elegant! I racked my brains over this and didn't know what cumcount was.

    CrazyElf, 2021-02-23 18:29:32