@@ -396,6 +396,58 @@ documentation. If you build an extension array, publicize it on our

.. _cyberpandas: https://cyberpandas.readthedocs.io/en/latest/

+ .. _whatsnew_0230.enhancements.categorical_grouping:
+
+ Categorical Groupers have gained an observed keyword
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ In previous versions, grouping by one or more categorical columns produced an index that was the Cartesian product of all of the categories of
+ each grouper, not just the observed values. ``.groupby()`` has gained the ``observed`` keyword to toggle this behavior; pass ``observed=True`` to keep
+ only the observed values. The default (``observed=False``) remains backward compatible and generates the Cartesian product. (:issue:`14942`, :issue:`8138`, :issue:`15217`, :issue:`17594`, :issue:`8669`, :issue:`20583`)
+
+
+ .. ipython:: python
+
+    cat1 = pd.Categorical(["a", "a", "b", "b"],
+                          categories=["a", "b", "z"], ordered=True)
+    cat2 = pd.Categorical(["c", "d", "c", "d"],
+                          categories=["c", "d", "y"], ordered=True)
+    df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
+    df['C'] = ['foo', 'bar'] * 2
+    df
+
+ To show all values (the previous behavior):
+
+ .. ipython:: python
+
+    df.groupby(['A', 'B', 'C'], observed=False).count()
+
+
+ To show only observed values:
+
+ .. ipython:: python
+
+    df.groupby(['A', 'B', 'C'], observed=True).count()
+
+ For pivoting operations, this behavior is *already* controlled by the ``dropna`` keyword:
+
+ .. ipython:: python
+
+    cat1 = pd.Categorical(["a", "a", "b", "b"],
+                          categories=["a", "b", "z"], ordered=True)
+    cat2 = pd.Categorical(["c", "d", "c", "d"],
+                          categories=["c", "d", "y"], ordered=True)
+    df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
+    df
+
+ .. ipython:: python
+
+    pd.pivot_table(df, values='values', index=['A', 'B'],
+                   dropna=True)
+    pd.pivot_table(df, values='values', index=['A', 'B'],
+                   dropna=False)
+
+
.. _whatsnew_0230.enhancements.other:

Other Enhancements
@@ -527,68 +579,6 @@ If you wish to retain the old behavior while using Python >= 3.6, you can use
'Taxes': -200,
'Net result': 300}).sort_index()

- .. _whatsnew_0230.api_breaking.categorical_grouping:
-
- Categorical Groupers will now require passing the observed keyword
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- In previous versions, grouping by 1 or more categorical columns would result in an index that was the cartesian product of all of the categories for
- each grouper, not just the observed values.``.groupby()`` has gained the ``observed`` keyword to toggle this behavior. The default remains backward
- compatible (generate a cartesian product). Pandas will show a ``FutureWarning`` if the ``observed`` keyword is not passed; the default will
- change to ``observed=True`` in the future. (:issue:`14942`, :issue:`8138`, :issue:`15217`, :issue:`17594`, :issue:`8669`, :issue:`20583`)
-
-
- .. ipython:: python
-
-    cat1 = pd.Categorical(["a", "a", "b", "b"],
-                          categories=["a", "b", "z"], ordered=True)
-    cat2 = pd.Categorical(["c", "d", "c", "d"],
-                          categories=["c", "d", "y"], ordered=True)
-    df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
-    df['C'] = ['foo', 'bar'] * 2
-    df
-
- ``observed`` must now be passed when grouping by categoricals, or a
- ``FutureWarning`` will show:
-
- .. ipython:: python
-    :okwarning:
-
-    df.groupby(['A', 'B', 'C']).count()
-
-
- To suppress the warning, with previous Behavior (show all values):
-
- .. ipython:: python
-
-    df.groupby(['A', 'B', 'C'], observed=False).count()
-
-
- Future Behavior (show only observed values):
-
- .. ipython:: python
-
-    df.groupby(['A', 'B', 'C'], observed=True).count()
-
- For pivotting operations, this behavior is *already* controlled by the ``dropna`` keyword:
-
- .. ipython:: python
-
-    cat1 = pd.Categorical(["a", "a", "b", "b"],
-                          categories=["a", "b", "z"], ordered=True)
-    cat2 = pd.Categorical(["c", "d", "c", "d"],
-                          categories=["c", "d", "y"], ordered=True)
-    df = DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
-    df
-
- .. ipython:: python
-
-    pd.pivot_table(df, values='values', index=['A', 'B'],
-                   dropna=True)
-    pd.pivot_table(df, values='values', index=['A', 'B'],
-                   dropna=False)
-
-
.. _whatsnew_0230.api_breaking.deprecate_panel:

Deprecate Panel
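The ``observed`` switch introduced in this change can also be checked in a plain Python script, outside the ``.. ipython::`` docs build. The sketch below reuses the two ordered categoricals from the whatsnew example and assumes a pandas version that accepts ``observed`` in ``groupby`` (0.23 or later); the names ``all_combos`` and ``observed_only`` are illustrative only. It counts groups instead of displaying frames, so the difference between the two settings reduces to one number: 9 groups (3 by 3 declared categories) versus the 4 pairs that actually occur.

```python
import pandas as pd

# Same two ordered categoricals as the whatsnew example: "z" and "y" are
# declared categories that never occur in the data.
cat1 = pd.Categorical(["a", "a", "b", "b"],
                      categories=["a", "b", "z"], ordered=True)
cat2 = pd.Categorical(["c", "d", "c", "d"],
                      categories=["c", "d", "y"], ordered=True)
df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})

# observed=False: one group per element of the Cartesian product of the
# categories (3 x 3 = 9); combinations never seen get a count of 0.
all_combos = df.groupby(["A", "B"], observed=False)["values"].count()

# observed=True: only the (A, B) pairs that actually appear in the data.
observed_only = df.groupby(["A", "B"], observed=True)["values"].count()

print(len(all_combos), len(observed_only))  # 9 4
```

Passing ``observed`` explicitly, as here, also keeps the script independent of whatever default a given pandas release uses.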