Commit 9fb6d67

DOC: Combine concat/ merge sections for categoricals
1 parent 1b9bb12 commit 9fb6d67

1 file changed, 32 additions and 59 deletions

doc/source/user_guide/categorical.rst

@@ -797,34 +797,47 @@ Assigning a ``Categorical`` to parts of a column of other types will use the val
     df.dtypes
 
 .. _categorical.merge:
+.. _categorical.concat:
 
-Merging
-~~~~~~~
+Merging / Concatenation
+~~~~~~~~~~~~~~~~~~~~~~~
 
-You can concat two ``DataFrames`` containing categorical data together,
-but the categories of these categoricals need to be the same:
+By default, combining ``Series`` or ``DataFrames`` which contain the same
+categories results in ``category`` dtype, otherwise results will depend on the
+dtype of the underlying categories. Merges that result in non-categorical
+dtypes will likely have higher memory usage. Use ``.astype`` or
+``union_categoricals`` to ensure ``category`` results.
 
 .. ipython:: python
 
-    cat = pd.Series(["a", "b"], dtype="category")
-    vals = [1, 2]
-    df = pd.DataFrame({"cats": cat, "vals": vals})
-    res = pd.concat([df, df])
-    res
-    res.dtypes
+    from pandas.api.types import union_categoricals
 
-If the categories are not exactly the same, merging will coerce the
-categoricals to their categories' dtypes:
+    # same categories
+    s1 = pd.Series(['a', 'b'], dtype='category')
+    s2 = pd.Series(['a', 'b', 'a'], dtype='category')
+    pd.concat([s1, s2])
+
+    # different categories
+    s3 = pd.Series(['b', 'c'], dtype='category')
+    pd.concat([s1, s3])
+
+    pd.concat([s1, s3]).astype('category')
+    union_categoricals([s1.array, s3.array])
 
-.. ipython:: python
 
-    df_different = df.copy()
-    df_different["cats"].cat.categories = ["c", "d"]
-    res = pd.concat([df, df_different])
-    res
-    res.dtypes
+Following table summarizes the results of ``Categoricals`` related combinations.
 
-The same applies to ``df.append(df_different)``.
++----------+--------------------------------------------------------+----------------------------+
+| arg1     | arg2                                                   | result                     |
++==========+========================================================+============================+
+| category | category (identical categories)                        | category                   |
++----------+--------------------------------------------------------+----------------------------+
+| category | category (different categories, both not ordered)      | object (dtype is inferred) |
++----------+--------------------------------------------------------+----------------------------+
+| category | category (different categories, either one is ordered) | object (dtype is inferred) |
++----------+--------------------------------------------------------+----------------------------+
+| category | not category                                           | object (dtype is inferred) |
++----------+--------------------------------------------------------+----------------------------+
 
 See also the section on :ref:`merge dtypes<merging.dtypes>` for notes about preserving merge dtypes and performance.
 
@@ -920,46 +933,6 @@ the resulting array will always be a plain ``Categorical``:
     # "b" is coded to 0 throughout, same as c1, different from c2
     c.codes
 
-.. _categorical.concat:
-
-Concatenation
-~~~~~~~~~~~~~
-
-This section describes concatenations specific to ``category`` dtype. See :ref:`Concatenating objects<merging.concat>` for general description.
-
-By default, ``Series`` or ``DataFrame`` concatenation which contains the same categories
-results in ``category`` dtype, otherwise results in ``object`` dtype.
-Use ``.astype`` or ``union_categoricals`` to get ``category`` result.
-
-.. ipython:: python
-
-    # same categories
-    s1 = pd.Series(['a', 'b'], dtype='category')
-    s2 = pd.Series(['a', 'b', 'a'], dtype='category')
-    pd.concat([s1, s2])
-
-    # different categories
-    s3 = pd.Series(['b', 'c'], dtype='category')
-    pd.concat([s1, s3])
-
-    pd.concat([s1, s3]).astype('category')
-    union_categoricals([s1.array, s3.array])
-
-
-Following table summarizes the results of ``Categoricals`` related concatenations.
-
-+----------+--------------------------------------------------------+----------------------------+
-| arg1     | arg2                                                   | result                     |
-+==========+========================================================+============================+
-| category | category (identical categories)                        | category                   |
-+----------+--------------------------------------------------------+----------------------------+
-| category | category (different categories, both not ordered)      | object (dtype is inferred) |
-+----------+--------------------------------------------------------+----------------------------+
-| category | category (different categories, either one is ordered) | object (dtype is inferred) |
-+----------+--------------------------------------------------------+----------------------------+
-| category | not category                                           | object (dtype is inferred) |
-+----------+--------------------------------------------------------+----------------------------+
-
 
 Getting data in/out
 -------------------
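
The combined section's claims about concatenation can be checked with a short standalone script. This is a minimal sketch, not part of the commit: the variable names ``same``, ``different``, ``recast`` and ``combined`` are illustrative, and ``ignore_index=True`` is added only to keep the result index tidy. Because the non-categorical result dtype depends on the dtype of the underlying categories (and may vary with the pandas version), the sketch only checks whether the result is categorical rather than asserting a specific fallback dtype.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Same setup as the documentation example in the diff
s1 = pd.Series(["a", "b"], dtype="category")
s2 = pd.Series(["a", "b", "a"], dtype="category")
s3 = pd.Series(["b", "c"], dtype="category")

# Identical categories on both sides: the result stays categorical
same = pd.concat([s1, s2], ignore_index=True)

# Different categories: the result falls back to the dtype of the categories
different = pd.concat([s1, s3], ignore_index=True)

# Two ways to get a categorical result back
recast = different.astype("category")
combined = union_categoricals([s1.array, s3.array])

print(same.dtype)
print(different.dtype)
print(recast.dtype)
print(combined.categories.tolist())
```

``union_categoricals`` returns a plain ``Categorical`` rather than a ``Series``, so wrap it in ``pd.Series`` if an index is needed.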
