DEPR (PDEP-7/CoW): deprecate and remove copy keyword (except in constructors) #56022

Open
Tracked by #48998
jorisvandenbossche opened this issue Nov 17, 2023 · 9 comments
Labels
Copy / view semantics Deprecate Functionality to remove in pandas

@jorisvandenbossche
Member

jorisvandenbossche commented Nov 17, 2023

PDEP-7 did not spell it out explicitly, but a consequence of Copy-on-Write is that the copy keyword is no longer very useful.

Currently a bunch of methods have this keyword (astype, rename, reindex, ..., full list at #50535), for example:

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> df2 = df.rename(columns=str.upper)  # has a default of `copy=True`

With the current default behaviour, df2 is a full copy of df. When CoW is enabled, this copy=True default will change to no longer copy eagerly (it acts as a "delayed" copy instead). Users can currently pass copy=False to avoid the full copy, but this will no longer be possible with CoW (the previous concept of a "shallow copy" no longer exists, xref #36195 (comment)). So passing copy=False is something we will have to deprecate anyhow.
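A minimal sketch of the "delayed" copy described above, assuming pandas 2.0 or later. The `enable_cow` helper is hypothetical (it only wraps the `mode.copy_on_write` option), and `np.shares_memory` is used here purely as a way to observe whether the two frames still point at the same buffer.

```python
import warnings

import numpy as np
import pandas as pd


def enable_cow():
    # Hypothetical helper: opt in to Copy-on-Write where it is still optional.
    # Assumption: on versions where CoW is always on, setting the option may
    # warn or fail, so both cases are tolerated here.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        try:
            pd.set_option("mode.copy_on_write", True)
        except KeyError:
            pass


enable_cow()

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = df.rename(columns=str.upper)  # no copy keyword passed

# Before any write, the result still references the parent's data.
shares_before = np.shares_memory(df["a"].to_numpy(), df2["A"].to_numpy())

# The first write to df2 triggers the actual copy; df is left untouched.
df2.loc[0, "A"] = 100
shares_after = np.shares_memory(df["a"].to_numpy(), df2["A"].to_numpy())
```

The point of the sketch is that the copy is deferred rather than skipped: `df` never sees the write to `df2`, even though no eager copy was made by `rename`.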

In theory we could keep copy=True as a non-default option, which would result in an actual hard copy instead of the CoW-tracked view. However, in #50535 we essentially already decided not to do this, and in CoW mode copy=True is currently simply ignored.
The idea is that if a user really wants a hard copy, they can add a .copy() to the chain (e.g. df2 = df.rename(...).copy() instead of df2 = df.rename(..., copy=True)). In #50535 we felt that it was not worth keeping a whole keyword for such a minor use case, which has a clear and easy alternative.
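The `.copy()` alternative could look like the following sketch. The `independent` variable and the use of `np.shares_memory` are illustrative additions, not part of the proposal.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# An explicit .copy() at the end of the chain replaces rename(..., copy=True).
df2 = df.rename(columns=str.upper).copy()

# The deep copy owns its data, so it does not share a buffer with df.
independent = not np.shares_memory(df["a"].to_numpy(), df2["A"].to_numpy())
```

This spelling behaves the same with or without CoW, which is what makes it a safe replacement while the keyword is being phased out.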

So the consequence of the current behaviour with CoW enabled is that we can deprecate the copy keyword altogether. The idea is that we can already start doing this slowly with a DeprecationWarning in pandas 2.2, whose message can at the same time point people to enabling CoW as an alternative.
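To illustrate the general pattern only (this is not pandas' actual implementation), a keyword deprecation of this kind is often done with a sentinel default so that "not passed" can be told apart from an explicit value. The `deprecate_copy_keyword` decorator, the toy `rename` function, and the warning text below are all hypothetical.

```python
import functools
import warnings

_NO_DEFAULT = object()  # sentinel: distinguishes "not passed" from any real value


def deprecate_copy_keyword(func):
    """Hypothetical decorator: warn whenever ``copy`` is passed explicitly."""

    @functools.wraps(func)
    def wrapper(*args, copy=_NO_DEFAULT, **kwargs):
        if copy is not _NO_DEFAULT:
            warnings.warn(
                "The 'copy' keyword is deprecated and has no effect with "
                "Copy-on-Write enabled; enable Copy-on-Write and drop it.",
                DeprecationWarning,
                stacklevel=2,
            )
        # The keyword is swallowed here and never reaches the wrapped function.
        return func(*args, **kwargs)

    return wrapper


@deprecate_copy_keyword
def rename(labels, mapper):
    # Toy stand-in for a method that used to accept copy=...
    return [mapper(label) for label in labels]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = rename(["a", "b"], str.upper, copy=False)  # warns
    rename(["a", "b"], str.upper)  # silent: keyword not passed
```

Only the call that passes `copy` explicitly is recorded in `caught`, so code relying on the default is not affected by the warning.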

While it's a consequence of the CoW behaviour changes, it's still deprecating a keyword in 15+ methods, so opening this issue for visibility. cc @pandas-dev/pandas-core

@jorisvandenbossche jorisvandenbossche added Deprecate Functionality to remove in pandas Copy / view semantics labels Nov 17, 2023
@jorisvandenbossche jorisvandenbossche added this to the 2.2 milestone Nov 17, 2023
@jorisvandenbossche
Member Author

jorisvandenbossche commented Nov 17, 2023

To be explicit, this issue does not cover the constructors (e.g. Series(..., copy=True/False), DataFrame(..., copy=True/False)).
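For contrast, a small sketch of the constructor keyword that stays in scope. The variable names are illustrative, and whether `copy=False` actually avoids a copy depends on the input and dtype, so that result is computed rather than assumed.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

s_copy = pd.Series(arr, copy=True)   # always owns its data
s_view = pd.Series(arr, copy=False)  # may wrap arr without copying

copy_is_independent = not np.shares_memory(arr, s_copy.to_numpy())
view_shares_buffer = np.shares_memory(arr, s_view.to_numpy())
```

In the constructors the keyword controls how external data enters pandas, which is a different question from copying between pandas objects and is why it is kept out of this deprecation.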

Copying over the list of affected methods from #50535:

  • align
  • astype
  • infer_objects
  • merge
  • reindex
  • reindex_like
  • rename
  • rename_axis
  • set_axis
  • set_flags (default False)
  • swapaxes
  • swaplevel
  • to_numpy (default False) -> this can keep it?
  • to_timestamp
  • transpose (default False)
  • truncate
  • tz_convert
  • tz_localize
  • pd.concat
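To see which of these still accept `copy` in an installed pandas version, the signatures can be inspected at runtime. This is a rough sketch: `accepts_copy` and `status` are hypothetical names, only the DataFrame variants are checked, and methods missing from the installed version are skipped.

```python
import inspect

import pandas as pd

# Method names taken from the list above (pd.concat is handled separately).
methods = [
    "align", "astype", "infer_objects", "merge", "reindex", "reindex_like",
    "rename", "rename_axis", "set_axis", "set_flags", "swapaxes", "swaplevel",
    "to_numpy", "to_timestamp", "transpose", "truncate", "tz_convert",
    "tz_localize",
]


def accepts_copy(func):
    # True if the callable's signature has an explicit ``copy`` parameter,
    # None if the signature cannot be introspected.
    try:
        return "copy" in inspect.signature(func).parameters
    except (TypeError, ValueError):
        return None


status = {
    name: accepts_copy(getattr(pd.DataFrame, name))
    for name in methods
    if hasattr(pd.DataFrame, name)  # some methods may be gone in newer versions
}
status["pd.concat"] = accepts_copy(pd.concat)

for name, has_copy in sorted(status.items()):
    print(f"{name}: copy keyword present = {has_copy}")
```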

@phofl
Member

phofl commented Nov 17, 2023

I have a work in progress branch

@lithomas1
Member

We are also tracking the deprecation of unnecessary inplace keywords here, right?

@jorisvandenbossche
Member Author

No, inplace has a separate (not yet approved) PDEP

@phofl
Member

phofl commented Nov 17, 2023

Yeah let’s keep inplace in the pdep

@phofl
Member

phofl commented Nov 22, 2023

Changed my mind here, let's do this for 3.0. We would have to work around our internal usages, which is a PITA.

@phofl phofl modified the milestones: 2.2, 3.0 Nov 22, 2023
@rhshadrach
Member

> Changed my mind here, lets do for 3.0

Are you saying to remove the copy keyword without deprecation?

@phofl
Member

phofl commented Nov 24, 2023

Oh no, that would be insane.

No, we wanted to add a DeprecationWarning before the FutureWarning, and I want to push that back to 3.0. Removal is only in 4.0 anyway.

@jorisvandenbossche
Member Author

jorisvandenbossche commented Nov 27, 2023

Sounds good. We can still start with a DeprecationWarning in 3.0

(we can add those after CoW has been enabled on main after 2.2)
