ENH: Export (a subset of?) pandas._typing for type checking #55231

Open
1 of 3 tasks
caneff opened this issue Sep 21, 2023 · 15 comments
Labels
Enhancement Typing type annotations, mypy/pyright type checking

Comments

@caneff
Contributor

caneff commented Sep 21, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

There are public functions and methods whose arguments have types that are not actually exported. This makes it hard to propagate those types to other functions that then call the pandas ones.

For instance, merge has a how argument that has a type _typing.MergeHow = Literal['cross', 'inner', 'left', 'outer', 'right'], but since _typing is protected, there is no good way to take it as an argument and instead I have to say

def foo(df: pd.DataFrame, ...., how: str):
  ...
  assert how in ['cross', 'inner', 'left', 'outer', 'right']
  pd.merge(..., how=how)

for my type checker to be OK with it. This is both annoyingly verbose and fragile to updates of pandas.
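A minimal sketch of one way to keep the allowed values in a single place while no public alias exists. `LocalMergeHow`, `VALID_HOW`, and `validate_how` are hypothetical names, and the values are copied from the issue text above, so this local copy can still drift from pandas:

```python
from typing import Literal, get_args

# Hypothetical local copy of the alias; the values are those quoted in this
# issue and are an assumption about the installed pandas version.
LocalMergeHow = Literal["cross", "inner", "left", "outer", "right"]

# Derive the runtime check from the alias so the list is written only once.
VALID_HOW = get_args(LocalMergeHow)


def validate_how(how: str) -> str:
    """Return `how` unchanged if it is an allowed value, else raise."""
    if how not in VALID_HOW:
        raise ValueError(f"how must be one of {VALID_HOW}, got {how!r}")
    return how
```

This removes the duplicated string list but not the fragility: the alias still has to be updated by hand whenever pandas changes the accepted values.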

Feature Description

Add a typing module that exposes (possibly a subset of) _typing.

I say possibly a subset because, from looking at the _typing module, there are clearly types that are for internal use only, and I'm guessing we don't want to make them public so that they can be changed more easily.

I would propose the subset be all types that are used as arguments of public functions and methods.

This way my function above could be written as:

import pandas as pd
import pandas.typing as pd_typing
def foo(df: pd.DataFrame, ..., how: pd_typing.MergeHow):
   ...
   pd.merge(...., how=how)

and have everything work.

Alternative Solutions

Technically these types are "available" when imported by other modules, so you can access MergeHow via pandas.core.reshape.merge.MergeHow or pandas.core.frame.MergeHow but those are just imports from _typing imported to be used by those modules themselves, not something users should rely on.

Other alternatives

A) Split the public ones out of _typing into typing. _typing could then do from pandas.typing import * if we don't want to rewrite every place where the newly public types are used.

B) Just make all of _typing public. As someone who is not heavy into pandas internals I have no strong opinion here, but my guess is that there are internal types that we don't want public.

Additional Context

I'm more than willing to take this PR myself; I just want feedback about whether this would be accepted.

@caneff caneff added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 21, 2023
@twoertwein
Member

A subset of _typing was exposed in #48577.

@caneff
Contributor Author

caneff commented Sep 22, 2023 via email

@twoertwein
Member

I think pandas.api.typing currently contains mostly classes (they are definitely stable) but I'm honestly not sure how stable the type aliases in pandas._typing are (both their name and value).

We also have pandas-stubs (but you can obviously not import from there).

@Dr-Irv @rhshadrach

@twoertwein twoertwein added the Typing type annotations, mypy/pyright type checking label Sep 22, 2023
@Dr-Irv
Contributor

Dr-Irv commented Sep 22, 2023

Not sure why you can't just use from pandas._typing import MergeHow to address your current issue.

Alternatively, we could have a pandas/typing.py for the ones we are willing to export (or maybe put all of those in pandas.api.typing) and keep pandas/_typing.py for internal usage.

There is also the issue of creating the documentation for things like MergeHow. If we put them in a public place (whether pandas.typing or pandas.api.typing), then they would need to be documented, so that any change to the names of the types (if that should ever happen) would be known to the public.

So, this isn't just an issue of whether we start exposing these literals, it's also a documentation issue.

@caneff
Contributor Author

caneff commented Sep 22, 2023 via email

@rhshadrach
Member

I think pandas.api.typing would make sense for this, and I find having pandas.typing alongside pandas.api.typing a bit confusing.

But the main concern is with evolution of type hints alongside the library. How do we go about changing a name or removing a type-hint without breaking user code? If type-hints were strings, then I don't think we'd be at risk. For example:

from __future__ import annotations


def foo(x) -> bar:
    return x + 1

foo(1)

is valid Python. But they currently aren't by default and without the future import this code fails. I don't believe we have any way to signal to a user that a type-hint is deprecated or going to change.
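The difference can be checked directly, as a small sketch. The helper name `runs_cleanly` is hypothetical, and the snippet is compiled from a string with `dont_inherit=True` so that only the future import inside the source text decides how annotations are handled. Note that Python 3.14 defers annotation evaluation by default, so on that version both variants run:

```python
import sys

# The example from the comment above, with an undefined name `bar`
# used as the return annotation.
SNIPPET = "def foo(x) -> bar:\n    return x + 1\n\nfoo(1)\n"


def runs_cleanly(source: str) -> bool:
    """Compile and run `source` in a fresh namespace; False on NameError."""
    code = compile(source, "<demo>", "exec", dont_inherit=True)
    try:
        exec(code, {})
    except NameError:
        return False
    return True


# With the future import, the annotation is stored as a string and never evaluated.
deferred_ok = runs_cleanly("from __future__ import annotations\n" + SNIPPET)

# Without it, Python versions before 3.14 evaluate `bar` when `foo` is defined.
eager_ok = runs_cleanly(SNIPPET)

print(sys.version_info[:2], deferred_ok, eager_ok)
```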

@caneff
Contributor Author

caneff commented Sep 23, 2023 via email

@twoertwein
Member

twoertwein commented Sep 23, 2023

Two thoughts:

  • We could keep the stable pandas.api.typing separate and create a new pandas.api.typing.aliases that is documented to be public but NOT stable (names and values might change without warning between releases). Typing pandas is already slow, and I would like to avoid more friction (adding deprecation/future warnings for typing changes).
  • Most people probably do not use pandas' internal annotations (only bare pyright people or people who specifically add a py.typed); pylance and presumably most mypy people use pandas-stubs. If we expose type aliases, we need to ensure that they are in sync between pandas and pandas-stubs!

@caneff
Contributor Author

caneff commented Sep 23, 2023 via email

@Dr-Irv
Contributor

Dr-Irv commented Sep 23, 2023

> 1. At Google we haven't adopted stubs yet (but it is on my list). We depend on the bare types ATM, and given the size of our codebase it will be a slow transition.

I think you will find that using pandas-stubs will help improve your code, because the stubs check a lot more than the library's own annotations do; those are really meant for checking the internal code consistency.

> Do pandas-stubs not just import from typing anyway? Isn't that how we keep things in sync?

See https://github.com/pandas-dev/pandas-stubs#differences-between-type-declarations-in-pandas-and-pandas-stubs

@caneff
Contributor Author

caneff commented Sep 23, 2023 via email

@Dr-Irv
Contributor

Dr-Irv commented Sep 26, 2023

> Yes, I agree. But the scale of our codebase makes it daunting, to say the least. Any bit of new typing results in a lot of new typing violations to be handled. Any suggestions for how to maybe introduce them piecewise?

With mypy and pyright, I am pretty sure that you can specify the files to include/exclude for type checking.

> Also how would the stubs help with my current issue though? Do the stubs intentionally export MergeHow anywhere?

I think you could do something like this:

from typing import TYPE_CHECKING

import pandas as pd

if TYPE_CHECKING:
    # Only evaluated by the type checker, never at runtime.
    from pandas._typing import MergeHow

def foo(df: pd.DataFrame, ..., how: "MergeHow"):
    ...
    pd.merge(..., how=how)
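
A runnable variant of the same pattern, as a sketch only. `describe_merge` is a hypothetical stand-in for a function that would forward `how` to `pd.merge`; it is used here so the snippet runs even without pandas installed, which also shows that the guarded import is never executed at runtime:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by mypy/pyright only; skipped by the interpreter.
    from pandas._typing import MergeHow


def describe_merge(how: "MergeHow") -> str:
    """Hypothetical helper; a real one would pass `how` on to pd.merge."""
    return f"merging with how={how!r}"


print(describe_merge("left"))
# The quoted annotation is stored as a plain string and is not resolved here.
print(describe_merge.__annotations__["how"])
```

The import still targets a private module, so a checker that flags private imports will report it, as discussed later in this thread.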

@lithomas1 lithomas1 removed the Needs Triage Issue that has not been reviewed by a pandas team member label Sep 26, 2023
@Dr-Irv
Contributor

Dr-Irv commented Apr 3, 2025

@WillAyd Back in 2019, you added this text in #27050 (slightly edited since then) that now appears at https://pandas.pydata.org/pandas-docs/stable/development/contributing_codebase.html#pandas-specific-types that says:

"Commonly used types specific to pandas will appear in pandas._typing and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas."

pyright has made a change so that any module with a leading underscore is now considered private. So when someone installs pandas-stubs and imports any of the types in pandas._typing, pyright reports an import of a private module.

Ideally, we would move more of the "public" types into a module and document them, and have a correspondence between both pandas and pandas-stubs for the publicly supported types. Because of the change in pyright, this has now become more important. See this comment from the pyright developer here: microsoft/pyright#10248 (comment), where he asked how long we think it would take to make that change.

One quick option is we don't worry about the documentation, and just make pandas.typing a non-private module. But then we also have pandas.api.typing so that could be confusing.

In any case, I think we need to now handle the sentence you wrote back then: "This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas."

Thoughts? Who else should we include in this conversation? (Adding @rhshadrach to get his thoughts)

@rhshadrach
Member

What are the objects from pandas._typing that users would want to use in practice?

> One quick option is we don't worry about the documentation, and just make pandas.typing a non-private module. But then we also have pandas.api.typing so that could be confusing.

Then any future changes to the objects in pandas.typing could potentially break user type-hinting, and I believe there are very limited ways to support deprecations (is this correct?). Instead of making changes to the types there, we would have to just use new ones?

Would we be aligned on "usage of pandas type-hinting objects is subject to breakage in even minor versions of pandas"? Would users?

I'm also negative on having both pandas.typing and pandas.api.typing.
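
One runtime mechanism that does exist is a module-level `__getattr__` (PEP 562), sketched below under loud assumptions: `demo_typing` is a stand-in module built on the fly, not a real pandas module, and the old name `JoinHow` is invented for illustration. It also shows why the options are limited: the warning fires only when the name is looked up at runtime, so an import guarded by `TYPE_CHECKING` never triggers it, and the type checker itself learns nothing from it.

```python
import types
import warnings
from typing import Literal

# Hypothetical stand-in for a public typing module.
demo_typing = types.ModuleType("demo_typing")
demo_typing.MergeHow = Literal["cross", "inner", "left", "outer", "right"]

# Hypothetical rename: old alias name -> current alias name.
_RENAMED = {"JoinHow": "MergeHow"}


def _module_getattr(name: str):
    """Called only when normal attribute lookup on the module fails."""
    if name in _RENAMED:
        new_name = _RENAMED[name]
        warnings.warn(
            f"{name} is deprecated; use {new_name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(demo_typing, new_name)
    raise AttributeError(f"module 'demo_typing' has no attribute {name!r}")


# PEP 562: a function named __getattr__ in the module namespace handles misses.
demo_typing.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_alias = demo_typing.JoinHow  # resolves to MergeHow and warns

print(old_alias is demo_typing.MergeHow, len(caught))
```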

@rhshadrach
Member

rhshadrach commented Apr 6, 2025

Reading over the linked pyright issue, people importing from a private module are breaking Python conventions. They are free to do so at their own peril, but it is not onerous to add a type: ignore in such situations. However I do feel strongly that neither pandas._typing nor the equivalent in pandas-stubs should be considered public as-is, which I think is effectively what is being suggested at the bottom of microsoft/pyright#10248 (comment).

That said, I'm not necessarily averse to exposing some subset in a public location, but the real-world utility should be demonstrated first.
