DataFrame.groupby fails with MultiIndex containing pd.NaT #9236
Comments
I can reproduce this on master. Thanks for the report!
IIRC this is a dupe issue, if someone would like to find the reference.
Actually, will reopen in case it is slightly different.
Thanks for looking into this! I've been banging into this all day as I've been working on some analysis. I took a look at the pandas source, but it's not clear to me where the bug is or how to go about fixing it. Nonetheless, I've found a pretty quick workaround that produces the behavior I would expect. Maybe this will help others with a similar problem or give some direction in fixing the issue. Essentially, the workaround groups on the values of the index level (via get_level_values) rather than on the level itself. Here's an example of the workaround that works for me:

    midx = pd.MultiIndex(levels=[[pd.NaT, pd.datetime(2012,1,2),
                                  pd.datetime(2012,1,3)], ['a', 'b']],
                         labels=[[0, 1, 1, 2], [0, 0, 1, 0]], names=['date', None])
    df = pd.Series(pd.np.random.rand(4), index=midx)
    df.groupby(df.index.get_level_values(0)).count()
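For readers on a recent pandas, where `pd.datetime`, `pd.np`, and the `labels=` argument no longer exist, the same workaround can be sketched roughly as follows. This is a minimal sketch, not the commenter's code: it assumes the index is built with `MultiIndex.from_arrays` instead of explicit levels, and the names `dates`, `s`, and `counts` are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumption: build the first level from a DatetimeIndex that contains NaT,
# rather than passing explicit levels/labels as in the snippet above.
dates = pd.to_datetime([None, "2012-01-02", "2012-01-02", "2012-01-03"])
midx = pd.MultiIndex.from_arrays([dates, ["a", "a", "b", "a"]], names=["date", None])

s = pd.Series(np.random.rand(4), index=midx)

# Workaround: group on the level's values instead of on the level itself.
counts = s.groupby(s.index.get_level_values(0)).count()
print(counts)
```

With default settings, groupby drops rows whose key is missing, so the NaT row should not show up as a group in `counts`.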
Looks to be fixed on master. I imagine this edge case could use a test.
@mroeschke
It seems that the groupby operation fails when the row index is a MultiIndex containing NaT values. For example, the following code fails (v0.15.2) with TypeError: 'numpy.ndarray' object is not callable:
However, it seems as though np.nan values are handled properly:
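The snippets from the original report are not preserved in this copy of the thread. As a rough illustration of the comparison being described, the sketch below groups a DataFrame by its first index level, once with NaT in that level and once with np.nan. The helper `count_by_first_level`, the column name `value`, and the sample data are all hypothetical, and the index is built with `MultiIndex.from_arrays`, which may differ from how the reporter constructed it.

```python
import numpy as np
import pandas as pd

def count_by_first_level(first_level):
    """Hypothetical helper: count rows per key of the first index level."""
    midx = pd.MultiIndex.from_arrays([first_level, ["a", "a", "b", "a"]])
    df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=midx)
    return df.groupby(level=0).count()

# First level contains NaT (the case reported as failing on v0.15.2).
nat_counts = count_by_first_level(
    pd.to_datetime([None, "2012-01-02", "2012-01-02", "2012-01-03"])
)

# First level contains np.nan (the case reported as working).
nan_counts = count_by_first_level([np.nan, 1.0, 1.0, 2.0])

print(nat_counts)
print(nan_counts)
```

On a pandas version where the issue is fixed, both calls are expected to behave the same way, returning counts only for the non-missing keys.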