Commit fc39e64

make observed=False the default, remove deprecation warning
1 parent fa42a84 commit fc39e64

4 files changed, +62 -99 lines changed

doc/source/groupby.rst

+5 -5

@@ -996,19 +996,19 @@ Handling of (un)observed Categorical values
 
 When using a ``Categorical`` grouper (as a single or as part of multipler groupers), the ``observed`` keyword
 controls whether to return a cartesian product of all possible groupers values (``observed=False``) or only those
-that are observed groupers (``observed=True``). The ``observed`` keyword will default to ``True`` in the future.
+that are observed groupers (``observed=True``).
 
-Show only the observed values:
+Show all values:
 
 .. ipython:: python
 
-   pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=True).count()
+   pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=False).count()
 
-Show all values:
+Show only the observed values:
 
 .. ipython:: python
 
-   pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=False).count()
+   pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=True).count()
 
 The returned dtype of the grouped will *always* include *all* of the catergories that were grouped.
 
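As a rough illustration of what the two settings in the documentation example mean, here is a pure-Python model of the counting. It is only a sketch of the semantics, not pandas code: `count_by_category` is a hypothetical helper introduced for this note.

```python
def count_by_category(values, categories, observed=False):
    """Count values per declared category (hypothetical helper, not pandas)."""
    # Start from every declared category, so unused ones get a count of 0.
    counts = {cat: values.count(cat) for cat in categories}
    if observed:
        # Keep only the categories that actually occur in the data.
        counts = {cat: n for cat, n in counts.items() if n > 0}
    return counts


values = ["a", "a", "a"]
categories = ["a", "b"]

print(count_by_category(values, categories, observed=False))  # {'a': 3, 'b': 0}
print(count_by_category(values, categories, observed=True))   # {'a': 3}
```

With `observed=False`, the default after this commit, the unused category `b` appears with a count of 0. With `observed=True` it is dropped.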

doc/source/whatsnew/v0.23.0.txt

+52 -62

@@ -396,6 +396,58 @@ documentation. If you build an extension array, publicize it on our
 
 .. _cyberpandas: https://cyberpandas.readthedocs.io/en/latest/
 
+.. _whatsnew_0230.enhancements.categorical_grouping:
+
+Categorical Groupers has gained an observed keyword
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In previous versions, grouping by 1 or more categorical columns would result in an index that was the cartesian product of all of the categories for
+each grouper, not just the observed values.``.groupby()`` has gained the ``observed`` keyword to toggle this behavior. The default remains backward
+compatible (generate a cartesian product). (:issue:`14942`, :issue:`8138`, :issue:`15217`, :issue:`17594`, :issue:`8669`, :issue:`20583`)
+
+
+.. ipython:: python
+
+   cat1 = pd.Categorical(["a", "a", "b", "b"],
+                         categories=["a", "b", "z"], ordered=True)
+   cat2 = pd.Categorical(["c", "d", "c", "d"],
+                         categories=["c", "d", "y"], ordered=True)
+   df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
+   df['C'] = ['foo', 'bar'] * 2
+   df
+
+To show all values, the previous behavior:
+
+.. ipython:: python
+
+   df.groupby(['A', 'B', 'C'], observed=False).count()
+
+
+To show only observed values:
+
+.. ipython:: python
+
+   df.groupby(['A', 'B', 'C'], observed=True).count()
+
+For pivotting operations, this behavior is *already* controlled by the ``dropna`` keyword:
+
+.. ipython:: python
+
+   cat1 = pd.Categorical(["a", "a", "b", "b"],
+                         categories=["a", "b", "z"], ordered=True)
+   cat2 = pd.Categorical(["c", "d", "c", "d"],
+                         categories=["c", "d", "y"], ordered=True)
+   df = DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
+   df
+
+.. ipython:: python
+
+   pd.pivot_table(df, values='values', index=['A', 'B'],
+                  dropna=True)
+   pd.pivot_table(df, values='values', index=['A', 'B'],
+                  dropna=False)
+
+
 .. _whatsnew_0230.enhancements.other:
 
 Other Enhancements
@@ -527,68 +579,6 @@ If you wish to retain the old behavior while using Python >= 3.6, you can use
               'Taxes': -200,
               'Net result': 300}).sort_index()
 
-.. _whatsnew_0230.api_breaking.categorical_grouping:
-
-Categorical Groupers will now require passing the observed keyword
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-In previous versions, grouping by 1 or more categorical columns would result in an index that was the cartesian product of all of the categories for
-each grouper, not just the observed values.``.groupby()`` has gained the ``observed`` keyword to toggle this behavior. The default remains backward
-compatible (generate a cartesian product). Pandas will show a ``FutureWarning`` if the ``observed`` keyword is not passed; the default will
-change to ``observed=True`` in the future. (:issue:`14942`, :issue:`8138`, :issue:`15217`, :issue:`17594`, :issue:`8669`, :issue:`20583`)
-
-
-.. ipython:: python
-
-   cat1 = pd.Categorical(["a", "a", "b", "b"],
-                         categories=["a", "b", "z"], ordered=True)
-   cat2 = pd.Categorical(["c", "d", "c", "d"],
-                         categories=["c", "d", "y"], ordered=True)
-   df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
-   df['C'] = ['foo', 'bar'] * 2
-   df
-
-``observed`` must now be passed when grouping by categoricals, or a
-``FutureWarning`` will show:
-
-.. ipython:: python
-   :okwarning:
-
-   df.groupby(['A', 'B', 'C']).count()
-
-
-To suppress the warning, with previous Behavior (show all values):
-
-.. ipython:: python
-
-   df.groupby(['A', 'B', 'C'], observed=False).count()
-
-
-Future Behavior (show only observed values):
-
-.. ipython:: python
-
-   df.groupby(['A', 'B', 'C'], observed=True).count()
-
-For pivotting operations, this behavior is *already* controlled by the ``dropna`` keyword:
-
-.. ipython:: python
-
-   cat1 = pd.Categorical(["a", "a", "b", "b"],
-                         categories=["a", "b", "z"], ordered=True)
-   cat2 = pd.Categorical(["c", "d", "c", "d"],
-                         categories=["c", "d", "y"], ordered=True)
-   df = DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
-   df
-
-.. ipython:: python
-
-   pd.pivot_table(df, values='values', index=['A', 'B'],
-                  dropna=True)
-   pd.pivot_table(df, values='values', index=['A', 'B'],
-                  dropna=False)
-
-
 .. _whatsnew_0230.api_breaking.deprecate_panel:
 
 Deprecate Panel
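To make the "cartesian product of all of the categories" wording in the whatsnew entry concrete, the sketch below rebuilds the two categorical groupers `A` and `B` from that example in plain Python and compares the full key space with the observed one. It models the described semantics only and is not pandas code. The non-categorical column `C` is left out, and the names `rows`, `all_keys` and `observed_keys` are introduced here for illustration.

```python
from itertools import product

# Values and declared categories of the two categorical groupers A and B.
values_a = ["a", "a", "b", "b"]
values_b = ["c", "d", "c", "d"]
categories_a = ["a", "b", "z"]
categories_b = ["c", "d", "y"]

# One (A, B) key per row of the frame.
rows = list(zip(values_a, values_b))

# observed=False: every combination of declared categories.
all_keys = list(product(categories_a, categories_b))

# observed=True: only combinations that occur in the data, in category order.
observed_keys = [key for key in all_keys if key in rows]

print(len(all_keys))   # 9
print(observed_keys)   # [('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
```

The unused categories `z` and `y` account for the five extra keys in the full product.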

pandas/core/groupby/groupby.py

+5 -15

@@ -557,7 +557,7 @@ class _GroupBy(PandasObject, SelectionMixin):
     def __init__(self, obj, keys=None, axis=0, level=None,
                  grouper=None, exclusions=None, selection=None, as_index=True,
                  sort=True, group_keys=True, squeeze=False,
-                 observed=None, **kwargs):
+                 observed=False, **kwargs):
 
         self._selection = selection
 
@@ -2890,7 +2890,8 @@ class Grouping(object):
     obj :
     name :
     level :
-    observed : If we are a Categorical, use the observed values
+    observed : boolean, default False
+        If we are a Categorical, use the observed values
     in_axis : if the Grouping is a column in self.obj and hence among
         Groupby.exclusions list
 
@@ -2906,7 +2907,7 @@ class Grouping(object):
     """
 
     def __init__(self, index, grouper=None, obj=None, name=None, level=None,
-                 sort=True, observed=None, in_axis=False):
+                 sort=True, observed=False, in_axis=False):
 
         self.name = name
         self.level = level
@@ -2963,17 +2964,6 @@ def __init__(self, index, grouper=None, obj=None, name=None, level=None,
             # a passed Categorical
             elif is_categorical_dtype(self.grouper):
 
-                # Use the observed values of the grouper if inidcated
-                observed = self.observed
-                if observed is None:
-                    msg = ("pass observed=True to ensure that a "
-                           "categorical grouper only returns the "
-                           "observed categories, or\n"
-                           "observed=False to also include"
-                           "unobserved categories.\n")
-                    warnings.warn(msg, FutureWarning, stacklevel=5)
-                    observed = False
-
                 self.all_grouper = self.grouper
                 self.grouper = self.grouper._codes_for_groupby(
                     self.sort, observed)
@@ -3092,7 +3082,7 @@ def groups(self):
 
 
 def _get_grouper(obj, key=None, axis=0, level=None, sort=True,
-                 observed=None, mutated=False, validate=True):
+                 observed=False, mutated=False, validate=True):
     """
     create and return a BaseGrouper, which is an internal
     mapping of how to create the grouper indexers.
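The hunks above swap a `None` sentinel default, which triggered a `FutureWarning` when `observed` was not passed, for a plain `False` default. A minimal stdlib-only sketch of the two patterns follows. `resolve_observed_before` and `resolve_observed_after` are hypothetical stand-ins for this note, not pandas functions, and the warning text is shortened.

```python
import warnings


def resolve_observed_before(observed=None):
    """Pre-commit pattern: None means 'not passed', so warn and fall back."""
    if observed is None:
        warnings.warn(
            "pass observed=True or observed=False explicitly",
            FutureWarning,
            stacklevel=2,
        )
        observed = False
    return observed


def resolve_observed_after(observed=False):
    """Post-commit pattern: False is an ordinary default, so no warning."""
    return observed


# Record warnings so the difference between the two patterns is visible.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    before = resolve_observed_before()
    n_before = len(caught)
    after = resolve_observed_after()
    n_after = len(caught) - n_before

print(before, n_before)  # False 1
print(after, n_after)    # False 0
```

Both variants end up with `observed=False`. Only the sentinel version can tell "not passed" apart from an explicit `False`, which is the capability the removed warning relied on.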

pandas/tests/groupby/test_categorical.py

+0 -17

@@ -246,23 +246,6 @@ def test_apply(ordered):
     assert_series_equal(result, expected)
 
 
-def test_observed_warning():
-    # 20583 - future warning on observe
-
-    cat1 = Categorical(["a", "a", "b", "b"],
-                       categories=["a", "b", "z"], ordered=True)
-    cat2 = Categorical(["c", "d", "c", "d"],
-                       categories=["c", "d", "y"], ordered=True)
-    df = DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
-    df['C'] = ['foo', 'bar'] * 2
-
-    with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
-        df.groupby(['A', 'B', 'C'])
-
-    with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
-        df.groupby('A')
-
-
 def test_observed(observed):
     # multiple groupers, don't re-expand the output space
     # of the grouper
