Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STYLE pandas-dev flake8 plugin #40826

Closed
MarcoGorelli opened this issue Apr 7, 2021 · 11 comments · Fixed by #40906
Closed

STYLE pandas-dev flake8 plugin #40826

MarcoGorelli opened this issue Apr 7, 2021 · 11 comments · Fixed by #40906
Labels
Code Style Code style, linting, code_checks Needs Discussion Requires discussion from core team before further action
Milestone

Comments

@MarcoGorelli
Copy link
Member

Currently, there are a lot of style checks which are specific to this repo, the result of which is a ton of CI checks and occasionally quite a bit of extra code.

The main repo is already huge, so I'd like to suggest making a pandas-dev/pandas-style-guide repo, which would be a flake8 plugin, with all the custom linting rules we currently have.

Some advantages:

  • the current gigantic repo would become a bit smaller, as many style checks could be moved over to the pandas-dev/pandas-style-guide one
  • the plugin could easily be tested on multiple OSs and versions of Python, which is currently difficult with all the stylistic checks living in the main repo. The ast module often changes a bit between versions, so it would be good to test the ast-based checks on different Python versions
  • linting failures would be reported with the familiar flake8 syntax we're all accustomed to by now

One disadvantage I see is that there might be less visibility if this is in a separate repo.

@pandas-dev/pandas-core @pandas-dev/pandas-triage anyone have any thoughts/objections?

@MarcoGorelli MarcoGorelli added Code Style Code style, linting, code_checks Needs Discussion Requires discussion from core team before further action labels Apr 7, 2021
@WillAyd
Copy link
Member

WillAyd commented Apr 7, 2021 via email

@datapythonista
Copy link
Member

Can you be more specific on what are the code checks that you'd like to move?

For the grep commands in ci/code_checks.py I think having a separate repo is not worth.

For scripts/validate_* to me it makes more sense to move them as separate repos, but making them generic, so they can be reused by any other project. I think it should be doable. And customize in this repo what's specific to pandas. Not opposed to move the scripts as they are, but I don't personally see the advantage is they can't be reused. There are many other things that I'd move to separate repos first.

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Apr 7, 2021

Can you be more specific on what are the code checks that you'd like to move?

Mainly the ones in scripts - some of the grep ones may also be more robust as part of an AST traversal than with a regex

Also, future ast-based scripts. pandas is using more and more typing, and I expect that this will require futher checks that regex won't scale to but which won't be generic enough to be used outside of pandas

I don't personally see the advantage is they can't be reused

The big advantage I see is the second bullet-point I listed above about testing on multiple OSs and Python versions. Decluttering the main repo would also be nice.


I think this would be great to try out

Thanks - I'll put something together then and we can take the discussion from there, if people don't like it then that's fine, I can bin it 😄

@jbrockmendel
Copy link
Member

jbrockmendel commented Apr 7, 2021 via email

@jorisvandenbossche
Copy link
Member

Being able to test the style checking against different Python versions / OSs sounds like a good reason to use a separate repo.

Practical question: how do you see the syncing between both repo's (as right now I suppose often a change in the scripts and the related change in the pandas code itself happen in the same PR)? The separate repo has a version, and that version gets bumped from time to time in pandas' CI?

One naming related comment: I wouldn't call it like pandas-style-guide, as to me that sounds to be about a style guide for writing pandas code (i.e. using pandas), not about implementing pandas. Alternatively it could be something like pandas-dev-linting

@dsaxton
Copy link
Member

dsaxton commented Apr 7, 2021

Nice idea @MarcoGorelli. I'm not really sure how flake8 plugins work but would it still be possible to upstream the lint checks while keeping in the primary pandas-dev repo? In a vacuum keeping things centralized into more of a "monorepo" feels better to me as very often it would seem necessary to cross reference between the style checks and main code base. Also if there is a bug in the linter causing issues in pandas-dev might be more awkward to fix.

@jorisvandenbossche
Copy link
Member

Also, for testing the scripts for different Python / OS versions: I suppose those tests don't take very long? In that case it might be fine to actually include them in this repo as well. We could have a github actions workflow that eg only get triggered if something changes in the scripts directory?

@MarcoGorelli
Copy link
Member Author

The separate repo has a version, and that version gets bumped from time to time in pandas' CI?

Exactly, it'd get bumped by the (existing) autoupdate job

I suppose those tests don't take very long?

Less than a minute, probably. For now I'll make the plugin in my own repo, and if people like it we can decide on where to keep it / what to name it.

Thanks for the comments, will ping next week when I have something ready

@bashtage
Copy link
Contributor

bashtage commented Apr 8, 2021

I really like the pandas tools and it would be great if they had some ability to be easily used with other projects.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Apr 8, 2021

Practical question: how do you see the syncing between both repo's (as right now I suppose often a change in the scripts and the related change in the pandas code itself happen in the same PR)? The separate repo has a version, and that version gets bumped from time to time in pandas' CI?

We might want to look into making the pandas-style-guide a git submodule of the main pandas repo. That would allow the pandas-style-guide to be used in other projects that could include it as a submodule.

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Apr 12, 2021

I really like the pandas tools and it would be great if they had some ability to be easily used with other projects.

This'd be an argument for keeping it separate - if there's any check in the that's not relevant outside of pandas development, then a great thing about flake8 is that it's easy to configure, so people can disable any rules they don't want

I've opened #40906 to show how this would work, and the repo can be viewed here: https://github.com/MarcoGorelli/pandas-dev-flaker

Thanks for the comments so far!

UPDATE

moved to https://github.com/pandas-dev/pandas-dev-flaker

@jreback jreback added this to the 1.3 milestone Apr 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Code Style Code style, linting, code_checks Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants