
In a data frame like the one below, the index is the date. I want to add a column of row numbers (with 2019-11-01 as 0) so I can tell which row a given date is in.

For example, when annotating a plot, I think the row number has to be passed as an argument, so I need to know which row a specific date corresponds to. (Or is it possible to pass the date directly to the annotation in the first place?)

I would appreciate it if anyone could reply.

from pandas_datareader import data
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
%matplotlib inline
start = '2019-11-01'
end = '2020-11-01'
df = data.DataReader('^N225', 'yahoo', start, end)
df
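A minimal sketch of both parts of the question. It assumes a synthetic business-day price series in place of the `DataReader` download (the `Close` values are made up). `df.index.get_loc` returns the row number of a date, and Matplotlib's `annotate` accepts the date itself as the x coordinate once the x axis holds dates.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside Jupyter; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the '^N225' download: business days from 2019-11-01
# with a made-up 'Close' column (assumption, not real market data).
dates = pd.bdate_range("2019-11-01", "2020-11-01", name="Date")
rng = np.random.default_rng(0)
df = pd.DataFrame({"Close": 23000 + rng.standard_normal(len(dates)).cumsum() * 100},
                  index=dates)

# Row number of a given date (2019-11-01 is row 0), without adding a column.
target = pd.Timestamp("2020-03-19")
row = df.index.get_loc(target)

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df.index, df["Close"])
# The date is passed directly as the x coordinate of the annotated point.
ann = ax.annotate(f"row {row}", xy=(target, df.loc[target, "Close"]),
                  xytext=(20, 20), textcoords="offset points",
                  arrowprops=dict(arrowstyle="->"))
fig.canvas.draw()  # forces the date-to-number conversion, so errors surface here
```

In the question's notebook, the `df` returned by `DataReader` would take the place of the synthetic frame.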
  • Answer #1

    # The first reset_index() moves 'Date' to a column; the second adds 0..n-1 as 'index'
    df = df.reset_index().reset_index().set_index('Date')

    Another solution

    df['index'] = df.reset_index().index
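This is a quick check of both Answer #1 variants on a small synthetic frame (the five `Close` values are invented for illustration). Each variant adds a 0-based `index` column, which can then be read back for a given date.

```python
import pandas as pd

# Small synthetic frame indexed by 'Date' (an assumption standing in for the
# DataReader result); 2019-11-01 is the first row.
dates = pd.bdate_range("2019-11-01", periods=5, name="Date")
df = pd.DataFrame({"Close": [100.0, 101.5, 99.8, 102.2, 103.0]}, index=dates)

# Variant 1: the first reset_index() moves 'Date' into a column, the second
# adds the 0..n-1 positions as a column named 'index', then 'Date' goes back.
df1 = df.reset_index().reset_index().set_index("Date")

# Variant 2: reuse the RangeIndex produced by reset_index() as a new column.
df2 = df.copy()
df2["index"] = df2.reset_index().index

# Look up the row number of a specific date via the new column.
print(df1.loc[pd.Timestamp("2019-11-05"), "index"])  # 2
```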

  • Answer #2

    There are several ways.

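    import numpy as np  # needed for np.arange

    # assign() returns a new DataFrame, so keep the result: df = df.assign(...)
    df.assign(index=range(len(df)))
    # Or
    df.assign(idx=np.arange(len(df)))
    # Or
    df.assign(idx=pd.RangeIndex(len(df)))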
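The following short sketch uses an invented four-row frame to show how these `assign` variants behave. Each returns a new DataFrame with the same 0..n-1 column and leaves `df` unmodified, so the result has to be assigned back.

```python
import numpy as np
import pandas as pd

# Synthetic Date-indexed frame (assumption, standing in for the question's df).
dates = pd.bdate_range("2019-11-01", periods=4, name="Date")
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0]}, index=dates)

# assign() returns a new DataFrame and leaves df unchanged, so keep the result.
a = df.assign(idx=range(len(df)))
b = df.assign(idx=np.arange(len(df)))
c = df.assign(idx=pd.RangeIndex(len(df)))

# All three produce the same 0..n-1 column; the original df is untouched.
print(a["idx"].tolist())  # [0, 1, 2, 3]
print("idx" in df.columns)  # False
```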

    Speed comparison

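    import benchit  # third-party package: pip install benchit
    import numpy as np
    import pandas as pd
    def by_reset_reset(df):
        # The test frame's index is unnamed, so the 0..n-1 column ends up as 'level_0'
        return df.reset_index().reset_index().set_index('index')
    def by_reset(df):
        return df.assign(idx=df.reset_index().index)
    def by_range(df):
        return df.assign(idx=range(len(df)))
    def by_np_arange(df):
        return df.assign(idx=np.arange(len(df)))
    def by_rangeindex(df):
        return df.assign(idx=pd.RangeIndex(len(df)))
    # pd._testing is private and makeDataFrame may be unavailable in newer pandas,
    # so build a similar 30 x 4 float frame with a string index explicitly.
    df = pd.DataFrame(np.random.default_rng(0).standard_normal((30, 4)),
                      columns=list('ABCD'), index=[f'r{i}' for i in range(30)])
    # Time each function on resampled frames of 1 to 100,000 rows
    t = benchit.timings([by_reset_reset, by_reset, by_range, by_np_arange, by_rangeindex],
                        [df.sample(n, replace=True, random_state=0)
                         for n in 10 ** np.arange(6)])
    t.plot(logx=True, logy=True, figsize=(10, 6))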
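If `benchit` is not available, you can make a rough comparison of the same five approaches with the standard-library `timeit` module. This sketch uses one synthetic 10,000-row frame with an unnamed string index (an arbitrary choice). It does not reproduce the plot above, and timings vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# The same five approaches as above, written as small lambdas.
# In by_reset_reset the position column is named 'level_0', because the
# first reset_index() already used the name 'index' for the old string index.
approaches = {
    "by_reset_reset": lambda d: d.reset_index().reset_index().set_index("index"),
    "by_reset": lambda d: d.assign(idx=d.reset_index().index),
    "by_range": lambda d: d.assign(idx=range(len(d))),
    "by_np_arange": lambda d: d.assign(idx=np.arange(len(d))),
    "by_rangeindex": lambda d: d.assign(idx=pd.RangeIndex(len(d))),
}

# Synthetic input with an unnamed string index (size chosen arbitrarily).
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame(rng.standard_normal((n, 4)), columns=list("ABCD"),
                  index=[f"r{i}" for i in range(n)])

# Best of 5 repeats, 20 calls each; the default argument binds each function.
results = {name: min(timeit.repeat(lambda f=f: f(df), number=20, repeat=5))
           for name, f in approaches.items()}
for name, secs in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} {secs / 20 * 1e6:8.1f} µs per call")
```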
