Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEP Remove pytorch from environment.yml #49798

Merged
merged 13 commits into from
Nov 24, 2022

Conversation

MarcoGorelli
Copy link
Member

@MarcoGorelli
Copy link
Member Author

@mroeschke looks like you'd added pytorch in: https://github.com/pandas-dev/pandas/pull/47287/files

Would you be OK with this change?

@@ -69,7 +69,6 @@ dependencies:
- pandas-datareader
- pyyaml
- py
- pytorch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't you comment it out with a note?

@MarcoGorelli MarcoGorelli marked this pull request as ready for review November 20, 2022 10:38
@@ -114,6 +114,9 @@ jobs:
with:
fetch-depth: 0

- name: Uncomment pytorch dependency
run: sed -i 's/# - pytorch/- pytorch/g' environment.yml
Copy link
Member

@lithomas1 lithomas1 Nov 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed. asv pulls its dependencies into a separate environment based off of asv.conf.json.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't it does for this job because there's

asv run --quick --dry-run --strict --durations=30 --python=same

In the CI run of the commit which doesn't have this, the torch ASV check is skipped: https://github.com/pandas-dev/pandas/actions/runs/3507106640/jobs/5874585410

2022-11-20T08:22:57.2883873Z [  7.85%] ··· frame_ctor.From3rdParty.time_from_torch                        n/a

@MarcoGorelli MarcoGorelli added the Dependencies Required and optional dependencies label Nov 20, 2022
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the concern of pytorch being a heavy dependency, but I am also not the biggest fan of this patchy workaround.

In essence the root motivation of this seems related to having more targeted dependency files instead of an environmental.yml monolith. #48828

Unless there's an urgent need to remove this, I'd recommend working toward #48828 (e.g. a conda lock file for docs/required deps test/optional deps test/ etc) instead as it could also help out CI environments too

@MarcoGorelli
Copy link
Member Author

Thanks for taking a look

As far as I understand, a lock file would speed up the solving of the environment, but wouldn't affect its size

What I'm trying to tackle here is the size of the environment, which I think is bigger than that of practically any other open source project

If you're -1 on commenting and uncomment the pytorch dependency, would you be OK with having a separate environment for for asv checks (which would contain pytorch) and removing pytorch from environment.yml?

@MarcoGorelli MarcoGorelli force-pushed the no-torch branch 2 times, most recently from f77d4da to 95740a2 Compare November 21, 2022 08:53
@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Nov 21, 2022

I've tried this again, making a separate environment for the asv check

Like this:

  • the environment for ASV checks can be a lot smaller;
  • it avoids the comment / uncomment hack;
  • the environment.yml file which contributors use could be 2GB lighter, and this makes a real difference if you don't have a fast internet connection

Unless there's an urgent need to remove this

Not urgent, but the PyDataGlobal sprints are coming soon (early December), and I'd be pretty keen to remove 2GB of dependencies which are irrelevant to 99.9% of contributions (probably not an exaggeration) by then

@lithomas1
Copy link
Member

Thanks for taking a look

As far as I understand, a lock file would speed up the solving of the environment, but wouldn't affect its size

What I'm trying to tackle here is the size of the environment, which I think is bigger than that of practically any other open source project

If you're -1 on commenting and uncomment the pytorch dependency, would you be OK with having a separate environment for for asv checks (which would contain pytorch) and removing pytorch from environment.yml?

If python=same is the problem, can we just remove that part of the asv command, so that asv can create its own environment? Otherwise, we wouldn't have coverage for that in the CI(I've definitely forgot to update Cython before in the asv.conf.json and CI didn't fail).

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Nov 21, 2022

Yeah that's a good point - I tried it https://github.com/pandas-dev/pandas/actions/runs/3512450800/jobs/5884163088 but got:

ram [7110656]: · Unknown branch main in configuration
Error: Process completed with exit code 1.

with no further info. Any ideas? Looking further into this, debugging here #49814

@MarcoGorelli MarcoGorelli marked this pull request as draft November 21, 2022 17:34
@mroeschke
Copy link
Member

If you're -1 on commenting and uncomment the pytorch dependency, would you be OK with having a separate environment for for asv checks (which would contain pytorch) and removing pytorch from environment.yml?

I would generally be OK with splitting up environment.yml for different "work topics" in pandas like:

  • code-checks
  • docs
  • web
  • benchmarks
  • tests-required
  • tests-optional
  • tests-downstream

then an environment can be assembled based on install 1 or more of the dependency files depending on what is worked on.

(Right, conda-lock doesn't necessarily simplify the env size, but it greatly reduces the solve speed which can also be a limiting factor)

@MarcoGorelli MarcoGorelli marked this pull request as ready for review November 21, 2022 18:28
@MarcoGorelli MarcoGorelli marked this pull request as draft November 21, 2022 19:40
@MarcoGorelli MarcoGorelli marked this pull request as ready for review November 21, 2022 23:11
@MarcoGorelli MarcoGorelli marked this pull request as draft November 21, 2022 23:17
@MarcoGorelli MarcoGorelli marked this pull request as ready for review November 22, 2022 10:59
@MarcoGorelli
Copy link
Member Author

If python=same is the problem, can we just remove that part of the asv command, so that asv can create its own environment? Otherwise, we wouldn't have coverage for that in the CI(I've definitely forgot to update Cython before in the asv.conf.json and CI didn't fail).

🥳 🎉 finally got it, just needed to remove the default channel (which is consistent with the environment.yml file, which only has the conda-forge channel), and add a minimum pin for matplotlib, else it would fail to resolve. But at least now there's coverage for asv creating its own environment in CI. I also had to add git fetch origin main:main

So, now, we have:

  • lighter environment.yml file
  • asv checks let asv create its own environment, so that's covered in CI too
  • none of the initially-suggested hacks

@mroeschke would you be OK with this solution? (if not, no worries, hopefully there's something else we could work towards)

@WillAyd
Copy link
Member

WillAyd commented Nov 23, 2022

I think this is OK. That's a pretty big dependency to have, and doesn't seem like it is even tested in our CI?

@mroeschke
Copy link
Member

mroeschke commented Nov 24, 2022

I'd be okay removing pytorch (potentially for now) by:

  1. Still commenting it out of the environment.yml file (@jreback 's suggestion) hopefully moving toward more targeted environment files.
  2. Just removing the pytorch usage in the ASV benchmark and test in test_downstream.py all together

My main (stickler of a) motivation is that when I did a lot of dependency cleanup a while back for the CI, local dependencies in environment.yml were misaligned with the CI dependency files. And in theory, one should be able to have an environment that runs everything the CI does (test & docs & checks) since its the ultimate gatekeeper

@MarcoGorelli
Copy link
Member Author

Not sure I follow, wouldn't

https://github.com/pandas-dev/pandas/blob/0168e27843efb96b56202f22fbe5dd720c346368/asv_bench/asv.conf.json

count as a more targeted environment file? Because then asv would create its own environment, and so starting with what's in environment.yml you'd still be able to run everything

I've gone with your suggestion anyway

@mroeschke
Copy link
Member

count more targeted environment file?

Not sure if I completely follow now. That file only sets up the environment for ASV IIUC. If pytorch was removed from environment.yml, test_downstream.py:test_torch_frame_construction (now removed) wouldn't run

environment.yml Outdated
@@ -69,7 +69,7 @@ dependencies:
- pandas-datareader
- pyyaml
- py
- pytorch
# - pytorch https://github.com/pandas-dev/pandas/pull/49798
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the pytorch test and ASV are removed, I think it's okay to remove this too

@MarcoGorelli
Copy link
Member Author

count more targeted environment file?

Not sure if I completely follow now. That file only sets up the environment for ASV IIUC. If pytorch was removed from environment.yml, test_downstream.py:test_torch_frame_construction (now removed) wouldn't run

my bad, I was only thinking about the ASV

@MarcoGorelli
Copy link
Member Author

thanks for your review / approval! merging then

@MarcoGorelli MarcoGorelli added this to the 2.0 milestone Nov 24, 2022
@MarcoGorelli MarcoGorelli merged commit 3b09765 into pandas-dev:main Nov 24, 2022
mliu08 pushed a commit to mliu08/pandas that referenced this pull request Nov 27, 2022
* no torch

* set matplotlib to minimum version

* take mpl pin out again

* use pip for matplotlib

* remove default channel

* no pin needed if no default channel?

* turns out we gotta pin

* 🔥 remove torch

* undo asv.conf.json change

* comment out pytorch

* remove commented-out torch

Co-authored-by: MarcoGorelli <>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dependencies Required and optional dependencies
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DEP Remove pytorch from environment.yml
5 participants