make a slice in a multi-index using Timestamp Pandas (reindex the multi-index to use the slice by Timestamp)
I have a dataframe. Built multi-bank: (str, str, Timestamp).
MultiIndex([...], names=['IDКлиента, Бизнес-Линия', 'Расчет', 'Дата инвестиции'], length=73012)
I want to select the dataframe values from the date range, for example: 2020-01-01 : 2020-12-31 (for the year, including the left and right borders):
df.loc[slice(None),slice(None), slice('2020-01-01','2020-12-31')]
Catching the error: "IndexingError: Too many indexers"
I try it in a different way:
start = pd.Timestamp('2020-01-01')
finish = pd.Timestamp('2020-12-31')
df.loc[slice(None),slice(None), slice(start, finish)]
I still catch the error: "IndexingError: Too many indexers"
How to do it correctly slice?
UPD:
I saw that I was treated incorrectly, I corrected it:
df.loc[(slice(None),slice(None), slice('2020-01-01','2020-12-31')), :]
Catching another error:
UnsortedIndexError: 'MultiIndex slicing requires the index to be lexsorted: slicing on levels [2], lexsort depth 0'
2 answers
You have already answered your own question - in order to be able to use slices, the index must be sorted.
I would also recommend using pd.IndexSlice[...] for multi-index slicing:
idx = pd.IndexSlice
df.loc[idx[:, :, '2020-01-01':'2020-12-31'], :]
Instead of:
df.loc[(slice(None),slice(None), slice('2020-01-01','2020-12-31')), :]
It was necessary to sort the index after creation:
df = df.set_index(['IDКлиента, Бизнес-Линия', 'Расчет'])
df = df.set_index('Дата инвестиции', append=True, drop=False)
df.sort_index(ascending=True, inplace=True)