@@ -91,10 +91,10 @@ The mapping can be specified many different ways:
  - A Python function, to be called on each of the axis labels.
  - A list or NumPy array of the same length as the selected axis.
  - A dict or ``Series``, providing a ``label -> group name`` mapping.
- - For ``DataFrame`` objects, a string indicating a column to be used to group.
+ - For ``DataFrame`` objects, a string indicating a column to be used to group.
    Of course ``df.groupby('A')`` is just syntactic sugar for
    ``df.groupby(df['A'])``, but it makes life simpler.
- - For ``DataFrame`` objects, a string indicating an index level to be used to
+ - For ``DataFrame`` objects, a string indicating an index level to be used to
    group.
  - A list of any of the above things.

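As a minimal sketch of the mapping forms listed in this hunk, the snippet below groups one small frame in several ways. The frame, its column names, and the ``left``/``right`` group names are made up for illustration and do not come from the patch.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; not taken from the documentation being patched.
df = pd.DataFrame(
    {"A": ["foo", "bar", "foo", "bar"], "C": [1, 2, 3, 4]},
    index=["w", "x", "y", "z"],
)

# A string naming a column, and the equivalent explicit Series.
by_name = df.groupby("A")["C"].sum()
by_series = df.groupby(df["A"])["C"].sum()

# A Python function, called on each index label.
by_func = df.groupby(lambda label: label in ("w", "x"))["C"].sum()

# A dict providing a label -> group name mapping.
label_to_group = {"w": "left", "x": "left", "y": "right", "z": "right"}
by_dict = df.groupby(label_to_group)["C"].sum()

# A NumPy array of the same length as the selected axis.
by_array = df.groupby(np.array([0, 0, 1, 1]))["C"].sum()

print(by_name, by_func, by_dict, by_array, sep="\n\n")
```

The string and ``Series`` forms produce identical results, which is the "syntactic sugar" point made in the list.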
@@ -120,7 +120,7 @@ consider the following ``DataFrame``:
                        'D': np.random.randn(8)})
     df

- On a DataFrame, we obtain a GroupBy object by calling :meth:`~DataFrame.groupby`.
+ On a DataFrame, we obtain a GroupBy object by calling :meth:`~DataFrame.groupby`.
  We could naturally group by either the ``A`` or ``B`` columns, or both:

  .. ipython:: python
@@ -360,8 +360,8 @@ Index level names may be specified as keys directly to ``groupby``.
  DataFrame column selection in GroupBy
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Once you have created the GroupBy object from a DataFrame, you might want to do
- something different for each of the columns. Thus, using ``[]`` similar to
+ Once you have created the GroupBy object from a DataFrame, you might want to do
+ something different for each of the columns. Thus, using ``[]`` similar to
  getting a column from a DataFrame, you can do:

  .. ipython:: python
@@ -421,7 +421,7 @@ statement if you wish: ``for (k1, k2), group in grouped:``.
  Selecting a group
  -----------------

- A single group can be selected using
+ A single group can be selected using
  :meth:`~pandas.core.groupby.DataFrameGroupBy.get_group`:

  .. ipython:: python
@@ -444,8 +444,8 @@ perform a computation on the grouped data. These operations are similar to the
  :ref:`aggregating API <basics.aggregate>`, :ref:`window functions API <stats.aggregate>`,
  and :ref:`resample API <timeseries.aggregate>`.

- An obvious one is aggregation via the
- :meth:`~pandas.core.groupby.DataFrameGroupBy.aggregate` or equivalently
+ An obvious one is aggregation via the
+ :meth:`~pandas.core.groupby.DataFrameGroupBy.aggregate` or equivalently
  :meth:`~pandas.core.groupby.DataFrameGroupBy.agg` method:

  .. ipython:: python
@@ -517,12 +517,12 @@ Some common aggregating functions are tabulated below:
      :meth:`~pd.core.groupby.DataFrameGroupBy.nth`;Take nth value, or a subset if n is a list
      :meth:`~pd.core.groupby.DataFrameGroupBy.min`;Compute min of group values
      :meth:`~pd.core.groupby.DataFrameGroupBy.max`;Compute max of group values
-

- The aggregating functions above will exclude NA values. Any function which
+
+ The aggregating functions above will exclude NA values. Any function which
  reduces a :class:`Series` to a scalar value is an aggregation function and will work,
  a trivial example is ``df.groupby('A').agg(lambda ser: 1)``. Note that
- :meth:`~pd.core.groupby.DataFrameGroupBy.nth` can act as a reducer *or* a
+ :meth:`~pd.core.groupby.DataFrameGroupBy.nth` can act as a reducer *or* a
  filter, see :ref:`here <groupby.nth>`.

  .. _groupby.aggregate.multifunc:
@@ -732,7 +732,7 @@ and that the transformed data contains no NAs.
  .. note::

     Some functions will automatically transform the input when applied to a
-    GroupBy object, but returning an object of the same shape as the original.
+    GroupBy object, but returning an object of the same shape as the original.
     Passing ``as_index=False`` will not affect these transformation methods.

     For example: ``fillna, ffill, bfill, shift``.
@@ -926,7 +926,7 @@ The dimension of the returned result can also change:

      In [11]: grouped.apply(f)

- ``apply`` on a Series can operate on a returned value from the applied function,
+ ``apply`` on a Series can operate on a returned value from the applied function,
  that is itself a series, and possibly upcast the result to a DataFrame:

  .. ipython:: python
@@ -984,20 +984,48 @@ will be (silently) dropped. Thus, this does not pose any problems:

     df.groupby('A').std()

- Note that ``df.groupby('A').colname.std()`` is more efficient than
+ Note that ``df.groupby('A').colname.std()`` is more efficient than
  ``df.groupby('A').std().colname``, so if the result of an aggregation function
- is only interesting over one column (here ``colname``), it may be filtered
+ is only interesting over one column (here ``colname``), it may be filtered
  *before* applying the aggregation function.

+ .. _groupby.observed:
+
+ Handling of (un)observed Categorical values
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ When using a ``Categorical`` grouper (as a single grouper or as part of multiple groupers), the ``observed`` keyword
+ controls whether to return a cartesian product of all possible grouper values (``observed=False``) or only those
+ that are observed (``observed=True``).
+
+ Show all values:
+
+ .. ipython:: python
+
+    pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=False).count()
+
+ Show only the observed values:
+
+ .. ipython:: python
+
+    pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=True).count()
+
+ The returned dtype of the grouped result will *always* include *all* of the categories that were grouped.
+
+ .. ipython:: python
+
+    s = pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=False).count()
+    s.index.dtype
+
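The effect of ``observed`` can be checked side by side. The following is a small sketch that reuses the data from the added examples; the variable names are illustrative only.

```python
import pandas as pd

# Same data as in the examples above: category "b" is declared but never occurs.
cat = pd.Categorical(["a", "a", "a"], categories=["a", "b"])
ser = pd.Series([1, 1, 1])

all_groups = ser.groupby(cat, observed=False).count()
seen_groups = ser.groupby(cat, observed=True).count()

# observed=False keeps the unobserved category "b" with a count of 0.
print(all_groups.to_dict())
# observed=True returns only the category that actually appears.
print(seen_groups.to_dict())
# The index dtype of the observed=False result still lists every category.
print(list(all_groups.index.categories))
```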
  .. _groupby.missing:

  NA and NaT group handling
  ~~~~~~~~~~~~~~~~~~~~~~~~~

- If there are any NaN or NaT values in the grouping key, these will be
- automatically excluded. In other words, there will never be an "NA group" or
- "NaT group". This was not the case in older versions of pandas, but users were
- generally discarding the NA group anyway (and supporting it was an
+ If there are any NaN or NaT values in the grouping key, these will be
+ automatically excluded. In other words, there will never be an "NA group" or
+ "NaT group". This was not the case in older versions of pandas, but users were
+ generally discarding the NA group anyway (and supporting it was an
  implementation headache).
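A short sketch of this exclusion, using a made-up frame whose ``key`` column contains one NaN: the row with the missing key does not appear in any group.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has a missing grouping key.
df = pd.DataFrame({"key": ["x", np.nan, "x", "y"], "val": [1, 2, 3, 4]})

sizes = df.groupby("key").size()

# Only "x" and "y" form groups; the NaN-keyed row is left out,
# so the group sizes add up to 3 rather than 4.
print(sizes.to_dict())
print(int(sizes.sum()), "of", len(df), "rows were grouped")
```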

  Grouping with ordered factors
@@ -1084,8 +1112,8 @@ This shows the first or last n rows from each group.
  Taking the nth row of each group
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- To select from a DataFrame or Series the nth item, use
- :meth:`~pd.core.groupby.DataFrameGroupBy.nth`. This is a reduction method, and
+ To select from a DataFrame or Series the nth item, use
+ :meth:`~pd.core.groupby.DataFrameGroupBy.nth`. This is a reduction method, and
  will return a single row (or no row) per group if you pass an int for n:

  .. ipython:: python
@@ -1153,7 +1181,7 @@ Enumerate groups
  .. versionadded:: 0.20.2

  To see the ordering of the groups (as opposed to the order of rows
- within a group given by ``cumcount``) you can use
+ within a group given by ``cumcount``) you can use
  :meth:`~pandas.core.groupby.DataFrameGroupBy.ngroup`.


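To make the contrast between ``ngroup`` and ``cumcount`` concrete, here is a minimal sketch on a hypothetical single-column frame (the data is an assumption, not part of the patch):

```python
import pandas as pd

# Hypothetical data with two groups, "a" and "b".
df = pd.DataFrame({"A": list("aaabba")})
grouped = df.groupby("A")

# ngroup numbers the groups themselves (in sorted key order by default).
group_numbers = grouped.ngroup().tolist()

# cumcount numbers the rows within each group.
row_numbers = grouped.cumcount().tolist()

print(group_numbers)
print(row_numbers)
```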
@@ -1273,7 +1301,7 @@ Regroup columns of a DataFrame according to their sum, and sum the aggregated on
  Multi-column factorization
  ~~~~~~~~~~~~~~~~~~~~~~~~~~

- By using :meth:`~pandas.core.groupby.DataFrameGroupBy.ngroup`, we can extract
+ By using :meth:`~pandas.core.groupby.DataFrameGroupBy.ngroup`, we can extract
  information about the groups in a way similar to :func:`factorize` (as described
  further in the :ref:`reshaping API <reshaping.factorize>`) but which applies
  naturally to multiple columns of mixed type and different