PERF: MultiIndex slicing #46040

Merged 10 commits on Feb 27, 2022

lukemanley
Member

@lukemanley lukemanley commented Feb 17, 2022

Similar to #45931, but for MultiIndex slicing.

import numpy as np
import pandas as pd

lev0 = pd.date_range("2000-01-01", "2020-12-31", freq="D")
lev1 = np.arange(10000)
mi = pd.MultiIndex.from_product([lev0, lev1])

df = pd.DataFrame({"A": 1.0}, index=mi)

%timeit df.loc["2010-12-31": "2015-12-31"]

132 ms ± 3.43 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)     <- main
571 µs ± 4.25 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)  <- PR
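
For readers without IPython, the comparison above can be approximated with the standard-library `timeit` module. This is only a sketch: the index is deliberately much smaller than the one in the description, the date range and the names `start`, `stop`, `sliced`, and `expected` are chosen here for illustration, and the timing it prints will not match the figures quoted above. It also checks that the label slice returns the same rows as an explicit boolean mask on the first level.

```python
import timeit

import numpy as np
import pandas as pd

# Much smaller than the index in the PR description so the script runs quickly.
lev0 = pd.date_range("2000-01-01", "2000-12-31", freq="D")
lev1 = np.arange(100)
mi = pd.MultiIndex.from_product([lev0, lev1])
df = pd.DataFrame({"A": 1.0}, index=mi)

# Illustrative slice bounds (assumed here, not taken from the PR).
start, stop = "2000-03-01", "2000-03-31"

# Label-based slice on the first level; both endpoints are inclusive.
sliced = df.loc[start:stop]

# Reference result built from an explicit boolean mask on level 0.
dates = mi.get_level_values(0)
mask = (dates >= pd.Timestamp(start)) & (dates <= pd.Timestamp(stop))
expected = df[mask]
assert sliced.equals(expected)

# Time the slice outside IPython; absolute numbers depend on machine and pandas version.
n_runs = 20
elapsed = timeit.timeit(lambda: df.loc[start:stop], number=n_runs)
print(f"{elapsed / n_runs * 1e6:.1f} µs per slice (mean of {n_runs} runs)")
```

Running the same script against two pandas builds (for example, one before and one after this change) gives a rough before/after comparison.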

@lukemanley lukemanley added the MultiIndex, Performance (memory or execution speed performance), and Indexing (related to indexing on series/frames, not to indexes themselves) labels Feb 17, 2022
@lukemanley lukemanley mentioned this pull request Feb 19, 2022
@lukemanley lukemanley added this to the 1.5 milestone Feb 20, 2022
@lukemanley
Member Author

I just expanded the asv benchmarks for MultiIndex indexing as part of this. MultiIndex indexing has a lot of code paths, and the existing benchmarks covered only a few of them. A few of the added benchmarks currently run slightly above the targeted 80 ms, but I have a few follow-on PRs that will hopefully put them all well below 80 ms.
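
For context, an asv (airspeed velocity) benchmark is a plain Python class: asv calls `setup` before timing and measures every method whose name starts with `time_`. The sketch below shows that general shape for a first-level date slice. The class name, method name, and index sizes are hypothetical and are not the benchmarks added in this PR.

```python
import numpy as np
import pandas as pd


class MultiIndexDateSlice:
    """Hypothetical asv-style benchmark; not one of the benchmarks added in this PR."""

    def setup(self):
        # Build the frame once per benchmark so construction is not timed.
        lev0 = pd.date_range("2000-01-01", "2002-12-31", freq="D")
        lev1 = np.arange(100)
        mi = pd.MultiIndex.from_product([lev0, lev1])
        self.df = pd.DataFrame({"A": 1.0}, index=mi)

    def time_loc_slice_first_level(self):
        # asv times the body of every method prefixed with "time_".
        self.df.loc["2001-01-01":"2001-12-31"]
```

Keeping one code path per `time_` method makes it easier to see which path regresses or improves when the timings change.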

@jreback jreback merged commit 6466fc6 into pandas-dev:main Feb 27, 2022
@jreback
Contributor

jreback commented Feb 27, 2022

Very nice @lukemanley, keep 'em coming!

@lukemanley lukemanley deleted the multi-slice-perf branch March 2, 2022 01:13
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022