Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: do not crash on a callable grouper returning tuples #22279

Closed
wants to merge 1 commit into from

Conversation

toobaz
Copy link
Member

@toobaz toobaz commented Aug 11, 2018

Notice I didn't have time to test which commit caused the regression.

@pep8speaks
Copy link

pep8speaks commented Aug 11, 2018

Hello @toobaz! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 12, 2018 at 08:48 Hours UTC

@toobaz toobaz requested a review from WillAyd August 11, 2018 09:45
@toobaz toobaz force-pushed the gb_func_tuples_22257 branch from ba3fc4c to 69e696f Compare August 11, 2018 09:46
@codecov
Copy link

codecov bot commented Aug 11, 2018

Codecov Report

Merging #22279 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22279      +/-   ##
==========================================
- Coverage   92.08%   92.07%   -0.01%     
==========================================
  Files         169      169              
  Lines       50694    50699       +5     
==========================================
+ Hits        46682    46683       +1     
- Misses       4012     4016       +4
Flag Coverage Δ
#multiple 90.48% <100%> (-0.01%) ⬇️
#single 42.34% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/groupby/grouper.py 97.48% <100%> (-0.69%) ⬇️
pandas/core/indexes/multi.py 95.08% <0%> (-0.26%) ⬇️
pandas/core/internals/managers.py 96.58% <0%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d09db1f...9123f7d. Read the comment docs.

gb = df.groupby(by=func, sort=False)

name, expected = list(gb)[0]
result = gb.get_group(name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more explicit about what should be returned here? I think retrieving the element from the GroupBy object and subsequently passing that to get_group could potentially hide bugs in the future

@@ -508,6 +508,15 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True,
warnings.warn(msg, FutureWarning, stacklevel=5)
key = list(key)

if callable(key) or isinstance(key, dict):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the isinstance(key, dict) new here? If so probably want a test case to cover

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done - expanded the test)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you integrate this with the below (e.g. i/elif maybe enough)?
can add a top-levevel comment on what is going on here

@@ -346,7 +360,7 @@ def test_groupby_grouper_f_sanity_checked(self):
# when the elements are Timestamp.
# the result is Index[0:6], very confusing.

pytest.raises(AssertionError, ts.groupby, lambda key: key[0:6])
pytest.raises(ValueError, ts.groupby, lambda key: key[0:6])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The single callable/mapping case used to follow the same code path as lists of callables/mappings, while it now follows the same code path as e.g. lists of Series. The two happen to check the length of the grouper in different ways (the former should probably be changed to ValueError too, but we might decide to remove it altogether in #22278).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice this test will also change when we remove the absurd DatetimeIndex.map implementation (see #22312 ) but that is presumably not happening in a bugfix release.

@toobaz toobaz force-pushed the gb_func_tuples_22257 branch from 69e696f to 9123f7d Compare August 12, 2018 08:48
@gfyoung gfyoung added Groupby Regression Functionality that used to work in a prior pandas version MultiIndex labels Aug 13, 2018
@@ -20,10 +20,11 @@ and bug fixes. We recommend that all users upgrade to this version.
Fixed Regressions
~~~~~~~~~~~~~~~~~

- Constructing a DataFrame with an index argument that wasn't already an
- Constructing a :class:`DataFrame` with an index argument that wasn't already an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let' move to 0.24

@@ -508,6 +508,15 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True,
warnings.warn(msg, FutureWarning, stacklevel=5)
key = list(key)

if callable(key) or isinstance(key, dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you integrate this with the below (e.g. i/elif maybe enough)?
can add a top-levevel comment on what is going on here

@jreback
Copy link
Contributor

jreback commented Nov 23, 2018

@toobaz can you merge master and move the doc-note to 0.24.0

@jreback
Copy link
Contributor

jreback commented Dec 14, 2018

closing as stale, but pls reopen if can continue

@jreback jreback closed this Dec 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby MultiIndex Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

get_group(...) fails for groupby(...) based on a function
5 participants