-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Export (a subset of?) pandas._typing for type checking #55231
Comments
A subset of _typing exposed in #48577 |
Ok so any blocker to me exposing some of the literals in that same place
then? Like MergeHow and CorrelationMethod are the two we have run into the
most at Google when trying to upgrade to pandas 2 but I'm sure I can find
others looking through that file that make sense in the same way.
…On Fri, Sep 22, 2023, 8:10 AM Torsten Wörtwein ***@***.***> wrote:
A subset of _typing exposed in #48577
<#48577>
—
Reply to this email directly, view it on GitHub
<#55231 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABG4SH6X34LA4TPXXBBUE5TX3V53HANCNFSM6AAAAAA5B3KDFE>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
I think We also have pandas-stubs (but you can obviously not import from there). |
Not sure why you can't just use Alternatively, we could have a There is also the issue of creating the documentation for things like So, this isn't just an issue of whether we start exposing these literals, it's also a documentation issue. |
`pandas._typing` isn't a solution because it is protected and not
expected to be imported from other packages. It is never a good idea to
break that agreement, especially for functionality this mundane (type
checking of inputs to already public methods).
If we are exposing types through arguments of public methods, we should be
able to access those types in a way that isn't through whatever
`pandas.core` file happened to import them, or by `typing.get_type_hints`
or the like.
I'm fine adding documentation mentioned as well if that is the only
objection.
…On Fri, Sep 22, 2023, 9:53 AM Irv Lustig ***@***.***> wrote:
Not sure why you can't just use from pandas._typing import MergeHow to
address your current issue.
Alternatively, we could have a pandas/typing.py for the ones we are
willing to export (or maybe put all of those in pandas.api.typing) and
keep pandas/_typing.py for internal usage.
There is also the issue of creating the documentation for things like
MergeHow. If we put them in a public place (whether pandas.typing or
pandas.api.typing, then they would need to be documented, so that
changing the names of the types (if that should ever happen) would be known
to the public.
So, this isn't just an issue of whether we start exposing these literals,
it's also a documentation issue.
—
Reply to this email directly, view it on GitHub
<#55231 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABG4SHY3UMFXHWIKKILIMS3X3WJ4NANCNFSM6AAAAAA5B3KDFE>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
I think But the main concern is with evolution of type hints alongside the library. How do we go about changing a name or removing a type-hint without breaking user code? If type-hints were strings, then I don't think we'd be at risk. For example:
is valid Python. But they currently aren't by default and without the future import this code fails. I don't believe we have any way to signal to a user that a type-hint is deprecated or going to change. |
I agree with that sentiment but presumably it is equivalent even if we
don't export these. If you are going to widen the literals to more values,
then some code that had actually copied the literal definition of or
someone importing from _typing is the same. If you are reducing it to fewer
values, then they will break the same way and can be deprecated by
detecting usage of the value basically. Not exposing them saves a
deprecation only on the case of a rename, to the expense of no one else
being able to access it without violating the protectedness of _typing.
I agree with your point but it is a different one from my original question
…On Fri, Sep 22, 2023, 8:28 PM Richard Shadrach ***@***.***> wrote:
I think pandas.api.typing would make sense for this, and find having
pandas.typing alongside pandas.api.typing a bit confusing.
But the main concern is with evolution of type hints alongside the
library. How do we go about changing a name or removing a type-hint without
breaking user code? If type-hints were strings, then I don't think we'd be
at risk. For example:
from __future__ import annotations
def foo(x) -> bar:
return x + 1
foo(1)
is valid Python. But they currently aren't by default and without the
future import this code fails. I don't believe we have any way to signal to
a user that a type-hint is deprecated or going to change.
—
Reply to this email directly, view it on GitHub
<#55231 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABG4SH2IREQIVCYWAXFQQW3X3YULXANCNFSM6AAAAAA5B3KDFE>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Two thoughts:
|
1. That's acceptable to me.
2. At Google we haven't adopted stubs yet (but it so on my list). We depend
on the bare types ATM and given our size of codebase it will be a slow
transition.
Do pandas-stubs not just import from typing anyway? Isn't that how we keep
things in sync?
…On Sat, Sep 23, 2023, 10:56 AM Torsten Wörtwein ***@***.***> wrote:
Two thoughts:
- We could separate the stable panas.api.typing and create a new
pandas.api.typing.aliases that is documented to be public but NOT
stable (names might change without warnings between releases and the values
too) - typing pandas is already slow, I would like to avoid more friction
(adding deprecation/future warnings for typing changes)
- Most people probably do not use the panda's internal annotations
(only bare pyright people or people who specifically add a py.typed),
pylance and presumably most mypy people use pandas-stubs: if we expose type
aliases, we need to ensure that they are in sync between pandas and
pandas-stubs!
—
Reply to this email directly, view it on GitHub
<#55231 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABG4SHYNLHFP7ISSUUSK47DX332BJANCNFSM6AAAAAA5B3KDFE>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
I think you will find that using
|
Yes I agree. But with the scale of our codebase makes it daunting to say
the least. Any bit of new typing results in a lot of new typing violations
to be handled. Any suggestions for how to maybe introduce them piecewise?
Also how would the stubs help with my current issue though? Do the stubs
intentionally export MergeHow anywhere?
|
With
I think you could do something like this: from typing import TYPE_CHECKING
if TYPE_CHECKING:
from pandas._typing import MergeHow
def foo(df: pd.DataFrame, ..., how: "MergeHow"):
...
pd.merge(...., how=how) |
@WillAyd Back in 2019, you added this text in #27050 (slightly edited since then) that now appears at https://pandas.pydata.org/pandas-docs/stable/development/contributing_codebase.html#pandas-specific-types that says: "Commonly used types specific to pandas will appear in pandas._typing and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas."
Ideally, we would move more of the "public" types into a module and document them, and have a correspondence between both One quick option is we don't worry about the documentation, and just make In any case, I think we need to now handle the sentence you wrote back then: " This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas." Thoughts? Who else should we include in this conversation? (Adding @rhshadrach to get his thoughts) |
What are the objects from
Then any future changes to the objects in Would we be aligned to "usage of pandas type-hinting objects is subject to breakage in even minor versions of pandas". Would users? I'm also negative on having both |
Reading over the linked pyright issue, people importing from a private module are breaking Python conventions. They are free to do so at their own peril, but it is not onerous to add a That said, I'm not necessarily adverse to exposing some subset in a public location, but the real-world utility should be demonstrated first. |
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
There are public functions and methods whose arguments are types that are not actually exported. This makes it hard to propogate those types to other functions that then call the pandas ones.
For instance,
merge
has ahow
argument that has a type_typing.MergeHow = Literal['cross', 'inner', 'left', 'outer', 'right']
, but since_typing
is protected, there is no good way to take it as an argument and instead I have to sayFor my type checker to be OK with it. This is both annoyingly verbose and fragile to updates of Pandas
Feature Description
Add a
typing
module that exposes a (possible subset) of_typing
.I say possible subset because from looking at the
_typing
module there are clearly types that are internal usage only and I'm guessing we don't want to have them public so that they can be changed easier.I would propose the subset be all types that are used as arguments of public functions and methods.
This way my function above could have;
and have everything work.
Alternative Solutions
Technically these types are "available" when imported by other modules, so you can access
MergeHow
viapandas.core.reshape.merge.MergeHow
orpandas.core.frame.MergeHow
but those are just imports from_typing
imported to be used by those modules themselves, not something users should rely on.Other alternatives
A) Split the public ones out of
_typing
intotyping
, couldfrom typing import *
in_typing
if we don't want to rewrite everywhere the newly public types are used.B) Just make all of
typing
public. As someone who is not heavy into Pandas internals I have no strong opinion here but my guess is that there are internal types that we don't want public.Additional Context
I'm more than willing to take this PR myself I just want feedback about whether this would be accepted.
The text was updated successfully, but these errors were encountered: