Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contributing Guide for Type Hints #27050

Merged
merged 15 commits into from
Aug 25, 2019
Merged

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Jun 26, 2019

Want to get this documented as we've been picking up momentum with type hints

@toobaz @TomAugspurger @vaibhavhrt @gwrome @makbigc

@WillAyd WillAyd added Docs Typing type annotations, mypy/pyright type checking labels Jun 26, 2019
c: Union[int, float]
) -> float:

When dealing with parameters with a default argument of ``None``, you should not use ``Optional`` as this will be inferred by the static type checker. So instead of:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback drawing your attention here as you might disagree

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback we are going with Optional or not?

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up.


.. code-block:: python

def some_func(a: str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should make style recommendations that aren't enforced by a linter,

Black formats this as

def some_func(a: str, b: float, c: Union[int, float]) -> float:
    pass

And when a wrap is required

def some_func(
    a_long_name_really_long_name: str,
    b_long_name: float,
    c_long_name: Union[int, float],
) -> float:
    pass

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know. Haven't used that yet (have seen other issue, will look into it) but what you are saying makes sense.

@topper-123 I think you suggested something like this as well so FYI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it fits within 79 chars, I don't the point to wrap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +1 on requiring multiline defs to be wrapped in the style guide. It makes for much clearer code.

OTOH, if we go with Black, such choices will be made for us?

Pandas-specific Types
~~~~~~~~~~~~~~~~~~~~~

Commonly used types specific to *pandas* will appear in pandas._typing and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small list of the types of things found here may be helpful (ArrayLike, Dtype, etc.)

I think inlining an example here would be very helpful; early on, contributors may be adding to this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes adding stuff in _typing.py sometimes confuses me. Some example will be nice.


from typing import Union

maybe_primes = [] # type: Union[int, None]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it should be

Suggested change
maybe_primes = [] # type: Union[int, None]
maybe_primes = [] # type: List[Union[int, None]]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea great catch!

@codecov
Copy link

codecov bot commented Jun 26, 2019

Codecov Report

Merging #27050 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #27050      +/-   ##
==========================================
- Coverage   92.01%      92%   -0.01%     
==========================================
  Files         180      180              
  Lines       50754    50754              
==========================================
- Hits        46699    46695       -4     
- Misses       4055     4059       +4
Flag Coverage Δ
#multiple 90.64% <ø> (ø) ⬆️
#single 41.85% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 88.88% <0%> (-11.12%) ⬇️
pandas/core/frame.py 96.89% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a7f1d69...ea8e560. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 26, 2019

Codecov Report

Merging #27050 into master will increase coverage by 0.17%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #27050      +/-   ##
==========================================
+ Coverage   91.86%   92.03%   +0.17%     
==========================================
  Files         180      180              
  Lines       50792    50714      -78     
==========================================
+ Hits        46658    46675      +17     
+ Misses       4134     4039      -95
Flag Coverage Δ
#multiple 90.67% <ø> (+0.18%) ⬆️
#single 41.86% <ø> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 88.88% <0%> (-11.12%) ⬇️
pandas/core/arrays/integer.py 96.3% <0%> (-1.32%) ⬇️
pandas/core/internals/construction.py 95.95% <0%> (-0.8%) ⬇️
pandas/core/dtypes/cast.py 90.72% <0%> (-0.34%) ⬇️
pandas/core/arrays/sparse.py 94.19% <0%> (-0.31%) ⬇️
pandas/core/arrays/timedeltas.py 88.82% <0%> (-0.2%) ⬇️
pandas/core/groupby/groupby.py 97.17% <0%> (-0.16%) ⬇️
pandas/core/indexes/numeric.py 97.34% <0%> (-0.13%) ⬇️
pandas/core/frame.py 96.89% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.84% <0%> (-0.11%) ⬇️
... and 23 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 45ea267...0618be6. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, some comments

Syntax Requirements
~~~~~~~~~~~~~~~~~~~

Because *pandas* still supports Python 3.5, :pep:`526` does not apply and variables **must** be annotated with type comments. Specifically, this is a valid annotation within pandas:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this comment lower, e.g. show the common / easy case first


maybe_primes = [] # type: List[Optional[int]]

When dealing with parameters with a default argument of ``None``, you should not use ``Optional`` as this will be inferred by the static type checker. So instead of:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a code check for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to revert this section. Mypy supports this by default but it appears to be only for backwards compatibility and is considered a "mistake" in PEP 484

python/typeshed#1420

So separately can look into enabling --no-implicit-optional to see what breaks


Commonly used types specific to *pandas* will appear in pandas._typing and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas.

For example, quite a few functions in *pandas* accept a ``dtype`` argument. This can be expressed as a string like ``"object"``, a numpy.dtype like ``np.int64`` or even a pandas ``ExtensionDtype`` like ``pd.CategoricalDtype``. Rather than burden the user with having to constantly annotate all of those options, this can simply be imported and reused from the pandas._typing module
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use a ref to numpy.dtype

Pandas-specific Types
~~~~~~~~~~~~~~~~~~~~~

Commonly used types specific to *pandas* will appear in pandas._typing and you should use these where applicable. This module is private for now but ultimately this should be exposed to third party libraries who want to implement type checking against pandas.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put a reference to pandas._typing : https://github.com/pandas-dev/pandas/blob/master/pandas/_typing.py

@jreback jreback added this to the 0.25.0 milestone Jun 27, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. @jorisvandenbossche comments?

if you can add in a followup (or this one ok too), how to deal with circular imports; pretty much use the string form of the import; also indicate that we generally need to import fully qualified, e.g. from pandas.core.frame import DataFrame

@WillAyd
Copy link
Member Author

WillAyd commented Jun 28, 2019

Would prefer that piece as follow up. I have somewhat of a concern that that might not actually the be correct solution as it is not suggested by MyPy (refs like that only suggested within one module). It might just be replacing those with Any

I could be wrong but would like more time to investigate

@jreback
Copy link
Contributor

jreback commented Jun 28, 2019

lgtm. @jorisvandenbossche any comments?

@WillAyd
Copy link
Member Author

WillAyd commented Jun 30, 2019 via email


.. code-block:: shell

mypy pandas
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also type check single files or submodules? (for quicker development turnover, if you are trying out type checking whole pandas takes a while)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could but not generically useful as mypy doggedly follows all imports so wouldn't necessarily save much time:

https://mypy.readthedocs.io/en/latest/running_mypy.html#following-imports

If type checking speed is a concern the suggested approach is to use a daemon:

https://mypy.readthedocs.io/en/latest/mypy_daemon.html#mypy-daemon-mypy-server

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche yes you can do something like this: mypy pandas/core/something.py, or mypy pandas/core/generic
It saves a bit of time but not much, but adding how to do that in the contributing guide might not be a bad idea. Personally when I am working with typing I run mypy for just the single file I am working on. For whole project I run it only before committing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t plan on adding this - it’s not value added to do

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that's true, someone can easily figure out the command for a single file/folder by making a wise guess or going to mypy docs.

Co-Authored-By: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jorisvandenbossche jorisvandenbossche removed this from the 0.25.0 milestone Jul 3, 2019
class SomeClass2:
str = None # type: str_type

In some cases you may be tempted to use ``cast`` from the typing module when you know better than the analyzer. This occurs particularly when using custom inference functions. For example
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche let me know if this helps with understanding of cast

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it also an option to just "leave it" for some cases? Or does mypy error if it cannot infer a type?

Because I have the feeling we have a lot of such cases, we use those is_.. idioms almost everywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand what you mean by “leave it” but once the annotation is added in the signature here any op which assumes a string but which Mypy can’t narrow inference down to will raise (here it would say something like int/float has no attribute “upper”)

For sure though I think we will have a few places in the code base where cast would be required, at least unless the referenced Mypy enhancement is implemented

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I basically meant "leave it untyped" (as you can also leave complete functions untyped). So but you answered: once you type the signature of a function, mypy needs to understand the full body of that function.

I think we will have a few places in the code base where cast would be required

I don't have the feeling it is "a few". We do such is_ calls much more than actual isinstance checks.

at least unless the referenced Mypy enhancement is implemented

There doesn't seem to be much movement in that issue at the moment?

@WillAyd
Copy link
Member Author

WillAyd commented Jul 11, 2019

@jorisvandenbossche do you have any other edits you want to make to this? Would like to get this in dev guide for users to see

@WillAyd WillAyd mentioned this pull request Jul 22, 2019
5 tasks
@WillAyd
Copy link
Member Author

WillAyd commented Aug 8, 2019

Ping @jorisvandenbossche

@TomAugspurger
Copy link
Contributor

Let's merge this today and do the rest as followups? I suspect this is an improvement from master.

Style Guidelines
~~~~~~~~~~~~~~~~

Types imports should follow the ``from typing import ...`` convention. So rather than
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could add a codecheck for this?

@WillAyd WillAyd added this to the 1.0 milestone Aug 25, 2019
@WillAyd
Copy link
Member Author

WillAyd commented Aug 25, 2019

Merging as is from last comment. Anything else here can be done as follow ups

@WillAyd WillAyd merged commit 97f9bbf into pandas-dev:master Aug 25, 2019
@WillAyd WillAyd deleted the document-mypy branch August 25, 2019 16:05
@simonjayhawkins
Copy link
Member

Thanks @WillAyd

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants