@@ -797,34 +797,47 @@ Assigning a ``Categorical`` to parts of a column of other types will use the val
     df.dtypes

 .. _categorical.merge:
+.. _categorical.concat:

-Merging
-~~~~~~~
+Merging / Concatenation
+~~~~~~~~~~~~~~~~~~~~~~~

-You can concat two ``DataFrames`` containing categorical data together,
-but the categories of these categoricals need to be the same:
+By default, combining ``Series`` or ``DataFrames`` which contain the same
+categories results in ``category`` dtype, otherwise results will depend on the
+dtype of the underlying categories. Merges that result in non-categorical
+dtypes will likely have higher memory usage. Use ``.astype`` or
+``union_categoricals`` to ensure ``category`` results.

 .. ipython:: python

-    cat = pd.Series(["a", "b"], dtype="category")
-    vals = [1, 2]
-    df = pd.DataFrame({"cats": cat, "vals": vals})
-    res = pd.concat([df, df])
-    res
-    res.dtypes
+    from pandas.api.types import union_categoricals

-If the categories are not exactly the same, merging will coerce the
-categoricals to their categories' dtypes:
+    # same categories
+    s1 = pd.Series(['a', 'b'], dtype='category')
+    s2 = pd.Series(['a', 'b', 'a'], dtype='category')
+    pd.concat([s1, s2])
+
+    # different categories
+    s3 = pd.Series(['b', 'c'], dtype='category')
+    pd.concat([s1, s3])
+
+    pd.concat([s1, s3]).astype('category')
+    union_categoricals([s1.array, s3.array])

-.. ipython:: python

-    df_different = df.copy()
-    df_different["cats"].cat.categories = ["c", "d"]
-    res = pd.concat([df, df_different])
-    res
-    res.dtypes
+Following table summarizes the results of ``Categoricals`` related combinations.

-The same applies to ``df.append(df_different)``.
++----------+--------------------------------------------------------+----------------------------+
+| arg1     | arg2                                                   | result                     |
++==========+========================================================+============================+
+| category | category (identical categories)                        | category                   |
++----------+--------------------------------------------------------+----------------------------+
+| category | category (different categories, both not ordered)      | object (dtype is inferred) |
++----------+--------------------------------------------------------+----------------------------+
+| category | category (different categories, either one is ordered) | object (dtype is inferred) |
++----------+--------------------------------------------------------+----------------------------+
+| category | not category                                           | object (dtype is inferred) |
++----------+--------------------------------------------------------+----------------------------+

 See also the section on :ref:`merge dtypes<merging.dtypes>` for notes about preserving merge dtypes and performance.

@@ -920,46 +933,6 @@ the resulting array will always be a plain ``Categorical``:
     # "b" is coded to 0 throughout, same as c1, different from c2
     c.codes

-.. _categorical.concat:
-
-Concatenation
-~~~~~~~~~~~~~
-
-This section describes concatenations specific to ``category`` dtype. See :ref:`Concatenating objects<merging.concat>` for general description.
-
-By default, ``Series`` or ``DataFrame`` concatenation which contains the same categories
-results in ``category`` dtype, otherwise results in ``object`` dtype.
-Use ``.astype`` or ``union_categoricals`` to get ``category`` result.
-
-.. ipython:: python
-
-    # same categories
-    s1 = pd.Series(['a', 'b'], dtype='category')
-    s2 = pd.Series(['a', 'b', 'a'], dtype='category')
-    pd.concat([s1, s2])
-
-    # different categories
-    s3 = pd.Series(['b', 'c'], dtype='category')
-    pd.concat([s1, s3])
-
-    pd.concat([s1, s3]).astype('category')
-    union_categoricals([s1.array, s3.array])
-
-
-Following table summarizes the results of ``Categoricals`` related concatenations.
-
-+----------+--------------------------------------------------------+----------------------------+
-| arg1     | arg2                                                   | result                     |
-+==========+========================================================+============================+
-| category | category (identical categories)                        | category                   |
-+----------+--------------------------------------------------------+----------------------------+
-| category | category (different categories, both not ordered)      | object (dtype is inferred) |
-+----------+--------------------------------------------------------+----------------------------+
-| category | category (different categories, either one is ordered) | object (dtype is inferred) |
-+----------+--------------------------------------------------------+----------------------------+
-| category | not category                                           | object (dtype is inferred) |
-+----------+--------------------------------------------------------+----------------------------+
-

 Getting data in/out
 -------------------