DOC: Add details of dropna in DataFrame.pivot_table #61184

Merged: 20 commits merged on Apr 2, 2025
Changes from 4 commits
5 changes: 3 additions & 2 deletions pandas/core/reshape/pivot.py
@@ -102,8 +102,9 @@ def pivot_table(
         on the rows and columns.
     dropna : bool, default True
         Do not include columns whose entries are all NaN. If True,
-        rows with a NaN value in any column will be omitted before
-        computing margins.
+        - rows with a NaN value in any column will be omitted before computing margins,
+        - index/column keys containing NA values will be dropped (see ``dropna``
+          parameter in ``pandas.DataFrame.groupby``).
Member

You'll also need to update the entry in shared_docs within pandas.core.frame.

Member

Can you use "an NA value" instead of "a NaN value"? Also, link to groupby via :meth:`DataFrame.groupby` instead of ``pandas.DataFrame.groupby``.

Member

It's :meth:, not :method:, for Sphinx.
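As an illustration of the review feedback above, the sketch below shows how the ``dropna`` entry might read with "NA" wording and the ``:meth:`` cross-reference role. The function name `pivot_table_stub` is hypothetical and exists only to carry the docstring; this is not necessarily the wording that was merged.

```python
def pivot_table_stub(dropna: bool = True) -> None:
    """Hypothetical stand-in used only to show the docstring wording.

    Parameters
    ----------
    dropna : bool, default True
        Do not include columns whose entries are all NaN. If True,

        - rows with an NA value in any column will be omitted before
          computing margins,
        - index/column keys containing NA values will be dropped (see
          ``dropna`` parameter in :meth:`DataFrame.groupby`).
    """


# Print the docstring to inspect the role as Sphinx would receive it.
print(pivot_table_stub.__doc__)
```

When Sphinx renders the docstring, the ``:meth:`` role becomes a link to the referenced method, whereas a double-backtick literal is rendered as plain monospace text.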

margins_name : str, default 'All'
Name of the row / column that will contain the totals
when margins is True.
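The revised ``dropna`` description points to the ``dropna`` parameter of ``DataFrame.groupby``. A minimal sketch of that groupby behavior, using made-up data (the column names `key` and `val` are assumptions for illustration):

```python
import pandas as pd

# One row has a missing group key.
df = pd.DataFrame({"key": [None, "a", "a", "b"], "val": [1, 2, 3, 4]})

# dropna=True (the default): the row with the NA key is excluded.
dropped = df.groupby("key", dropna=True)["val"].sum()

# dropna=False: the NA key forms its own group.
kept = df.groupby("key", dropna=False)["val"].sum()

print(dropped)
print(kept)
```

With ``dropna=True`` only the groups `a` and `b` remain; with ``dropna=False`` a third group keyed by the missing value appears.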
57 changes: 57 additions & 0 deletions pandas/tests/reshape/test_pivot.py
@@ -2529,6 +2529,63 @@ def test_pivot_table_aggfunc_nunique_with_different_values(self):

        tm.assert_frame_equal(result, expected)

    def test_pivot_table_index_and_column_keys_with_nan(self, dropna: bool) -> None:
        """Index/column keys containing NA values depend on ``dropna``.

        Input data
        ----------
            row  col  val
        0   NaN  0.0    0
        1   0.0  1.0    1
        2   1.0  2.0    2
        3   2.0  3.0    3
        4   3.0  NaN    4

        ``dropna=True``: Index/column keys containing NA values will be dropped.
        Expected output
        ---------------
        col  1.0  2.0  3.0
        row
        0.0  1.0  NaN  NaN
        1.0  NaN  2.0  NaN
        2.0  NaN  NaN  3.0

        ``dropna=False``: Index/columns should exist if any non-null values.
        Expected output
        ---------------
        col  0.0  1.0  2.0  3.0  NaN
        row
        0.0  NaN  1.0  NaN  NaN  NaN
        1.0  NaN  NaN  2.0  NaN  NaN
        2.0  NaN  NaN  NaN  3.0  NaN
        3.0  NaN  NaN  NaN  NaN  4.0
        NaN  0.0  NaN  NaN  NaN  NaN
        """
        data = {"row": [None, *range(4)], "col": [*range(4), None], "val": range(5)}
        df = DataFrame(data)
        actual = df.pivot_table(values="val", index="row", columns="col", dropna=dropna)
        e_index = [None, *range(4)]
        e_columns = [*range(4), None]
        e_data = np.zeros(shape=(5, 5))
        e_data.fill(np.nan)
        np.fill_diagonal(a=e_data, val=range(5))
        expected = DataFrame(
            data=e_data,
            index=Index(data=e_index, name="row"),
            columns=Index(data=e_columns, name="col"),
        ).sort_index()
        if dropna:
            expected = (
                # Drop null index/column keys.
                expected.loc[expected.index.notna(), expected.columns.notna()]
                # Drop null rows.
                .dropna(axis="index", how="all")
                # Drop null columns.
                .dropna(axis="columns", how="all")
            )

        tm.assert_frame_equal(left=actual, right=expected)


class TestPivot:
    def test_pivot(self):