REGR: 1.3 behavior change with groupby and asindex=False #41998

Closed
aberres opened this issue Jun 14, 2021 · 7 comments · Fixed by #42254
Labels
Groupby, Regression (functionality that used to work in a prior pandas version)

Comments

@aberres
Contributor

aberres commented Jun 14, 2021

Code Sample, a copy-pastable example

import pandas as pd
(
    pd.DataFrame(columns=["date", "crew_member_id"])
    .groupby(["date", "crew_member_id"], as_index=False)
    .sum()
)

Problem description

With 1.3rc1 this fails with ValueError: Length of values (0) does not match length of index (1), while 1.2 returns an empty frame.

Expected Output

If this is the expected behavior, fine. I just wanted to make sure that it is not an accidental regression.
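To see which of the two behaviors a given install shows, the snippet from the report can be wrapped so that it does not abort on the error. This is only a sketch: the helper name empty_groupby_outcome is made up for illustration, and the outcome depends on the pandas version in use.

```python
import pandas as pd


def empty_groupby_outcome():
    """Run the snippet from the report and describe what happened.

    Hypothetical helper for illustration; not part of pandas.
    """
    frame = pd.DataFrame(columns=["date", "crew_member_id"])
    try:
        result = frame.groupby(["date", "crew_member_id"], as_index=False).sum()
    except ValueError as exc:
        # The behavior reported for 1.3rc1.
        return "error", str(exc)
    # The behavior reported for 1.2: an empty frame comes back.
    return "ok", list(result.columns)


print(pd.__version__, empty_groupby_outcome())
```

Printing the version next to the outcome makes it easy to compare results across environments.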

@aberres added the Bug and Needs Triage labels Jun 14, 2021
@aberres aberres changed the title BUG?: 1.3 Behavioral change with groupby and asindex=False BUG?: 1.3 Behavior change with groupby and asindex=False Jun 14, 2021
@aberres aberres changed the title BUG?: 1.3 Behavior change with groupby and asindex=False BUG?: 1.3 behavior change with groupby and asindex=False Jun 14, 2021
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Jun 14, 2021
@simonjayhawkins
Member

Thanks @aberres for the report

With 1.3rc1 this fails with ValueError: Length of values (0) does not match length of index (1), while 1.2 returns an empty frame.

first bad commit: [48a5853] REF: preserve Index dtype in BlockManager._combine (#41354)

before this commit the result was

2021-06-14T12:25:54.6429723Z 1.3.0.dev0+1555.g3cce96f515
2021-06-14T12:25:54.6430581Z Empty DataFrame
2021-06-14T12:25:54.6430967Z Columns: [date, crew_member_id]
2021-06-14T12:25:54.6431332Z Index: []
2021-06-14T12:25:54.7091204Z Bisecting: 1 revision left to test after this (roughly 1 step)

which is consistent with the result from a non-empty DataFrame, unlike the result on 1.2.4, which was an empty DataFrame with no columns.

cc @jbrockmendel

@phofl
Member

phofl commented Jun 15, 2021

Yep, my CI picked that up as well. Should we label this as a regression, @simonjayhawkins?

@simonjayhawkins
Member

simonjayhawkins commented Jun 15, 2021

Sure. I left it in the Needs Triage state for another set of eyes, as I only bisected the change in behavior. If we use a populated DataFrame in the code sample instead of an empty one, the result does not make much sense to me.

@phofl
Member

phofl commented Jun 15, 2021

This one fails too:

df = pd.DataFrame(columns=["a", "b"]).groupby(["a"], as_index=False).sum()

The initial code snippet seems a bit artificial to me, mostly because I don't know what to actually expect. But this one returned

Empty DataFrame
Columns: [a, b]
Index: []

on 1.2.4, which looks sensible to me.
Labelling as a regression.

@phofl added the Groupby and Regression labels and removed the Bug and Needs Triage labels Jun 15, 2021
@phofl phofl added this to the 1.3 milestone Jun 15, 2021
@jbrockmendel
Member

I've got it on my to-do list (i.e. an open browser tab) to look at this, but nothing obvious comes to mind off the top of my head.

@simonjayhawkins simonjayhawkins changed the title BUG?: 1.3 behavior change with groupby and asindex=False REGR: 1.3 behavior change with groupby and asindex=False Jun 25, 2021
@rhshadrach
Member

The method _wrap_agged_manager uses mgr.shape[1] for the length of the index; it seems to me it should be using

mgr.shape[1] if not mgr.shape[0] == 0 else 0

This fixes the issue in the OP and gives the output

Empty DataFrame
Columns: [date, crew_member_id]
Index: []

which seems to me in line with the result of

(
    pd.DataFrame([[1, 2], [3, 4]], columns=["date", "crew_member_id"])
    .groupby(["date", "crew_member_id"], as_index=False)
    .sum()
)

which gives

   date  crew_member_id
0     1               2
1     3               4

However, for the issue in #41998 (comment), it gives

Empty DataFrame
Columns: [a]
Index: []

which to me looks wrong: as @phofl said, we should also be getting the column 'b'. I don't think the column information is in the mgr passed to _wrap_agged_manager, so maybe the fix needs to happen further upstream.
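The length rule suggested above can be tried in isolation on a plain tuple. This is a sketch only: agged_index_length is a hypothetical helper, not pandas code, and it assumes the shape is laid out as (number of items, number of rows), following the mgr.shape expression quoted earlier.

```python
def agged_index_length(shape):
    """Return the index length to use for an aggregated result.

    `shape` is assumed to be (n_items, n_rows), mirroring the mgr.shape
    expression quoted above. Hypothetical helper, not pandas code.
    """
    n_items, n_rows = shape
    # With no items left, report an empty index instead of n_rows.
    return n_rows if n_items != 0 else 0


print(agged_index_length((0, 1)))  # 0
print(agged_index_length((2, 2)))  # 2
```

It only covers the index length; it says nothing about which columns survive, which is the separate problem with column 'b'.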

@rhshadrach
Member

Ah - this path is dropping column 'b' because it is object dtype. Doing

print(pd.DataFrame(columns=["a", "b"], dtype=int).groupby(["a"], as_index=False).sum())

gives

Empty DataFrame
Columns: [a, b]
Index: []

which is better. Running tests now, PR to follow.
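The dtype difference between the two constructions can be checked directly, without going through groupby at all. This is a small sketch; the variable names are mine, and it relies on my understanding that an empty frame built from column names alone gets object-dtype columns.

```python
import pandas as pd

# No dtype given: the empty columns come out as object dtype.
object_frame = pd.DataFrame(columns=["a", "b"])
# Explicit integer dtype, as in the snippet above.
int_frame = pd.DataFrame(columns=["a", "b"], dtype=int)

print(object_frame.dtypes)
print(int_frame.dtypes)
```

Whether the object column survives the aggregation depends on the pandas version, so only the dtypes are inspected here.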
