Docker Hub Sample images #50046

Closed
wants to merge 9 commits into from

Conversation

WillAyd
Member

@WillAyd WillAyd commented Dec 3, 2022

Still working on the mamba images. I've opened an open source account inquiry with Docker, so the images will eventually live there. In the interim I can host them on my personal account for testing

@phofl
Member

phofl commented Dec 3, 2022

Just to understand: This is for development with different environment sizes?

@WillAyd
Member Author

WillAyd commented Dec 3, 2022

Yea, it could help target different development goals. Alpine would be a super small image (~300 MB total). The minimal variants for pip/mamba give you everything you need to compile and test most of the repository and come in at ~1 GB. The full variants provide all packages for a complete environment, at the cost of more disk space (pip ~3 GB, mamba ~6 GB)

I put the minimal images up at https://hub.docker.com/r/willayd/pandas-dev/tags if you'd like to try

@WillAyd
Member Author

WillAyd commented Dec 3, 2022

Not a goal at the moment but there might also be a future where instead of rebuilding a testing image for every CI run we have a pre-built docker image that we just download and run from. Might save a lot of build time

@@ -0,0 +1,16 @@
FROM python:3.10.8
Member

Could we use slim?

Member Author

Oh nice idea. Would have to do a bit more research though: the slim images don't include compilers. We could get them with apt install build-essential, but that adds 312 MB to the image size, so it defeats the purpose a bit

Member Author

OK this was an awesome idea. If you install just gcc and g++, the image goes from the current 1.06 GB down to 475 MB
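
A minimal sketch of the slim-based variant described in this thread. It assumes the Debian-based python:3.10.8-slim tag and that gcc and g++ alone are enough to build the extensions; the cleanup step is an illustrative assumption, not the PR's actual Dockerfile.

```dockerfile
# Sketch only: slim base image plus just the C/C++ compilers,
# instead of the larger build-essential metapackage.
FROM python:3.10.8-slim

# Install the compilers and remove the apt package lists in the same
# layer so the lists do not add to the final image size.
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc g++ \
    && rm -rf /var/lib/apt/lists/*
```

The resulting size will depend on the base image version and on which packages are installed afterwards, so the 475 MB figure above should be re-measured rather than assumed.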

RUN apk add build-base

RUN python -m pip install \
cython \
Member

Could we maintain the minimal set with files?

Member Author

Absolutely. One option is to create a requirements-dev-minimal.txt file in this CI folder, then just use a COPY command within the Dockerfile and install from that
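
Roughly, the COPY-and-install approach could look like the sketch below. The file name requirements-dev-minimal.txt comes from the comment above; its location relative to the build context and the /tmp destination are assumptions made for illustration.

```dockerfile
# Sketch only: install the minimal dependency set from a file
# instead of listing packages inline in the Dockerfile.
FROM python:3.10.8

# Assumes the build context contains requirements-dev-minimal.txt
# (hypothetical location; adjust to wherever the file ends up).
COPY requirements-dev-minimal.txt /tmp/requirements-dev-minimal.txt

# --no-cache-dir keeps pip's download cache out of the image layer.
RUN python -m pip install --no-cache-dir -r /tmp/requirements-dev-minimal.txt
```

Keeping the list in a file means the same dependency set can be reused outside of Docker, for example in a local virtual environment.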

@mroeschke
Member

I see that the pip and mamba minimal images are pretty close in size, while the full ones differ by ~1 GB.

Was the thought of providing pip vs mamba due to a size difference? If the size difference isn't significant, I would slightly favor just having the mamba environments, especially since this is for scientific Python development, for which mamba/conda is much better equipped than pip

@WillAyd
Member Author

WillAyd commented Dec 6, 2022

Is that the size you see after installing locally? Might have some artifacts left over locally that I'm not aware of but the mamba full install for me was almost double the size of the pip image.

The mamba image is also a little more complicated to deal with if you want to run pre-commit. Not sure if that would even work there, but it should with the pip images

@mroeschke
Member

Oh that's just the compressed sizes I'm seeing in https://hub.docker.com/r/willayd/pandas-dev/tags

@WillAyd
Member Author

WillAyd commented Dec 6, 2022

Ah gotcha. Yea would be curious if anyone does docker pull willayd/pandas-dev:mamba-all and docker pull willayd/pandas-dev:pip-all if you see a difference. Here's what mine yield

$ docker image list
REPOSITORY                      TAG                      IMAGE ID       CREATED         SIZE
willayd/pandas-dev              mamba-all                6336333277b7   2 hours ago     5.68GB
willayd/pandas-dev              pip-all                  b209d2a0b20b   2 days ago      2.63GB

There could be something simple missing from the mamba Dockerfile that would minimize that, but I think it will always use more disk space than pip
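
One commonly used way to trim conda-based images is to clear the package cache in the same layer that creates the environment. The sketch below assumes a base image that ships mamba (condaforge/mambaforge is an assumption, not necessarily what this PR uses) and an illustrative package list.

```dockerfile
# Sketch only: create the environment and clean the package cache
# in a single RUN so downloaded tarballs never persist in a layer.
FROM condaforge/mambaforge

# The package list here is illustrative, not the PR's actual list.
RUN mamba create -n pandas-dev -y python=3.10 cython \
    && mamba clean --all --yes
```

Whether this closes much of the gap with the pip image is untested here; the environment itself will likely still take more space than a pip install.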

RUN apt update && apt upgrade -y
RUN DEBIAN_FRONTEND=noninteractive apt install -y tzdata

RUN mamba create -n pandas-dev \
Member Author

This is disconnected from requirements-minimal.txt at the moment. Could create a separate env file, break this up into two steps with a pip install, or leave as is
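
The "two steps with a pip install" option might look roughly like this. It is a sketch under assumptions: the base image, the Python version, and the /tmp path are illustrative, and requirements-minimal.txt is assumed to sit in the build context.

```dockerfile
# Sketch only: create a bare environment with mamba, then install the
# shared minimal requirements into it with pip.
FROM condaforge/mambaforge

# Step 1: environment with just the interpreter and pip.
RUN mamba create -n pandas-dev -y python=3.10 pip \
    && mamba clean --all --yes

# Step 2: reuse the same requirements file as the pip image.
COPY requirements-minimal.txt /tmp/requirements-minimal.txt
RUN conda run -n pandas-dev python -m pip install --no-cache-dir \
    -r /tmp/requirements-minimal.txt
```

The trade-off is that packages installed this way come from PyPI rather than conda-forge, which may defeat part of the reason for offering a mamba image.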

@WillAyd WillAyd marked this pull request as ready for review December 29, 2022 03:25
@@ -0,0 +1,8 @@
cython
Member Author

This file has some overlap with #50339

@WillAyd
Member Author

WillAyd commented Dec 29, 2022

We now have an official pandas org set up on Docker Hub. I've updated the documentation to point to that

@WillAyd WillAyd mentioned this pull request Dec 30, 2022
3 tasks
@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@simonjayhawkins
Member

@WillAyd still active? also a conflict needs resolving.

@mroeschke
Member

Closing to clear the queue, but feel free to reopen when you have time to circle back

@mroeschke mroeschke closed this May 17, 2023