Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BLD: pandas should generate PEP 440 compliant dev versions #9518

Closed
shoyer opened this issue Feb 19, 2015 · 30 comments · Fixed by #10370
Closed

BLD: pandas should generate PEP 440 compliant dev versions #9518

shoyer opened this issue Feb 19, 2015 · 30 comments · Fixed by #10370
Labels
Build Library building on various platforms CI Continuous Integration
Milestone

Comments

@shoyer
Copy link
Member

shoyer commented Feb 19, 2015

Doing an inplace install with recent versions of pip (e.g., with pip install -e .) currently produces this warning:
UserWarning: The version specified ('0.15.2-224-geadfd92') is an invalid version, this may not work as expected with newer versions of setuptools, pip, and PyPI. Please see PEP 440 for more details.

We should update setup.py to produce PEP 440 compliant versions such as '0.15.2+224.geadfd92' instead. For an example of how to convert between, see python-versioneer/python-versioneer#45 (comment)

@jreback jreback added the Build Library building on various platforms label Feb 19, 2015
@shoyer shoyer added this to the 0.16.1 milestone Feb 20, 2015
@shoyer
Copy link
Member Author

shoyer commented Feb 20, 2015

I'll make a PR for this, which we can merge immediately before or after the release of v0.16.0. That way, dev versions will remain meaningfully ordered.

@kay1793
Copy link

kay1793 commented Feb 20, 2015

if you actually read the PEP it makes it clear pandas itself is not allowed to use local version strings. just so you know.

@shoyer
Copy link
Member Author

shoyer commented Feb 20, 2015

@kay1793 I did in fact read the PEP. My main intent here was to preserve the useful identifying nature of existing development versions. Git hashes are an important part of that.

We don't make or tag regular development releases of pandas (aside from release candidates), so the .dev and .post tags are not suitable for us. Because we don't distribute any releases with these versions, I think it's perfectly suitable to use local version identifiers in this context. If a user installs a development version of pandas, I think it is reasonable to consider that a "downstream" version of the upstream, released versions of pandas -- and in fact, this default format ensures that such downstream versions are sorted properly.

This usage is also exactly what versioneer settled on, and I think they provide a compelling justification in the PR adding this: python-versioneer/python-versioneer#67

@kay1793
Copy link

kay1793 commented Feb 20, 2015

Because we don't distribute any releases with these versions, I think it's perfectly suitable to use
local version identifiers in this context

and the PEP says you're wrong.

If a user installs a development version of pandas, I think it is reasonable to consider that a
"downstream" version of the upstream, released versions of pandas

no.

We don't make or tag regular development releases of pandas (aside from release candidates),
so the .dev and .post tags are not suitable for us

"developmental release" is not clearly defined in the document, so .devN is probably what the PEP wants you to use. Otherwise, the PEP only concerns itself with official release versions and because pandas makes no official dev releases it can do what it likes during development and still be compliant (and users should use python setup.py).
In any case, choosing to treat every commit as if it were a developmental relase for versioning is far more reasonable then your claim that pandas is its own downstream (!) between releases.

Git hashes are an important part of that.

sure, but not for the setuptools version (all you need there is sorting). git hashes matter in the internal version pandas keeps in __version__, mainly for bug reports. that doesn't need to be pep440 compliant and is actually fine right now.

This usage is also exactly what versioneer settled on, and I think they provide a compelling
justification in the PR adding this: python-versioneer/python-versioneer#67

Do you mean:

So in one sense, we're abusing this feature to add information that doesn't fit in the 
other part (which PEP440 calls the "public version identifier"). But in another 
sense, tagged sources are "upstream", and your development tree (e.g. 
basically any source that isn't tagged) is a "downstream". 

?? that's not compelling at all. a project can't be its own downstream just like a person can't be his/her own mother.

@shoyer
Copy link
Member Author

shoyer commented Feb 20, 2015

Do you really think that pandas should use different version identifiers in setup.py and __version__? That seems like a terrible idea to me.

Otherwise, the PEP only concerns itself with official release versions, and since pandasd makes no dev release, it can do what it likes during development and still be compliant

Yes, this is my point exactly. This is not a literal interpretation of PEP 440 guidelines but a pragmatic choice for a gray area that it doesn't cover (unreleased development versions).

The only reasonable alternative I've seen is '0.15.2.post.dev224+geadfd92', but that version string has a lot more noise.

Can you think of any actual situations where such usage would be problematic?

@shoyer
Copy link
Member Author

shoyer commented Feb 20, 2015

You'll also notice:

Many build tools integrate with distributed version control systems like Git and Mercurial in order to add an identifying hash to the version identifier. As hashes cannot be ordered reliably such versions are not permitted in the public version field.

As with semantic versioning, the public .devN suffix may be used to uniquely identify such releases for publication, while the original DVCS based label can be stored in the project metadata.

Identifying hash information may also be included in local version labels. (emphasis mine)

https://www.python.org/dev/peps/pep-0440/#dvcs-based-version-labels

Pandas does not actually use a build system for unreleased development versions, so I'm not convinced it even makes sense to have .devN in the public version identifier, given that N here does not refer to any particular version (e.g., tracking master), but just however many commits have been made since the last tagged release (which may be squashed away or whatever).

@kay1793
Copy link

kay1793 commented Feb 20, 2015

Do you really think that pandas should use different version identifiers in setup.py
and version? That seems like a terrible idea to me.

do you have a reason you'd like to share for thinking that?

I was thinking 0.15.2.dev242 and 0.15.2-242-geadfd92 are practically interchangable.

Yes, this is my point exactly. <...> this a pragmatic choice for a gray area that it doesn't cover ...

But it does define what an upstream project is and says you should not use local version strings. Whether you violate that in order to handle some other undefined need is beside the point.

Can you think of any actual situations where such usage would be problematic?

that's not how standards are supposed to work. violating them because "I don't see a problem" is not how standards should be treated. can you think of a reason to abuse a PEP when you don't need to?

@kay1793
Copy link

kay1793 commented Feb 20, 2015

Identifying hash information may also be included in local version labels. (emphasis mine)

you're not allowed to use local version labels.

An "upstream project" is a project that defines its own public versions. A "downstream project" is one which tracks and redistributes an upstream project, potentially backporting security and bug fixes from later versions of the upstream project.

Local version identifiers SHOULD NOT be used for upstream projects.

Pandas does not actually use a build system for unreleased development versions

what does that have to do with anything?

 the public .devN suffix may be used to uniquely identify such releases

"uniquely identify" is what you care about, whether a "build system" is involved doesn't matter.

but just however many commits have been made since the last tagged release (which may
be squashed away or whatever).

so you're squshing public history in master now??

assuming you don't (you don't really) then the number of commits since the last tag is a unique (ordering) identifier of a commit and that's all you need for setuptools. The convenience of git tag can live in the (or in a) internal version string. setuptools doesn't need to know about it.

@shoyer
Copy link
Member Author

shoyer commented Feb 20, 2015

so you're squshing public history in master now??

Assuming you don't (you don't really) then the number of commits since last tag can be used to uniquely identify a release, with sorting.

You can install development versions of pandas from source from any branch on any source, not just master. So IMO the number does not suffice.

Also, note that 0.15.2.devN < 0.15.2 as far as setup tools is concerned -- hence the need for .post.devN

@kay1793
Copy link

kay1793 commented Feb 20, 2015

You can install development versions of pandas from source from any branch on any source,
not just master. So IMO the number does not suffice.

I don't understand. your goal is to control version sorting wrt to other commits in other forks and branches now? ok, well good luck with that.

Also, note that 0.15.2.devN < 0.15.2 as far as setup tools is concerned -- hence the need for .post.devN

Yes I did miss that and you actually need .post0.devN in that case I think.

Still waiting to hear why you think having pd.__version__ append "-ga1b3cdf" (or something) to the .devN version string in setup.py is such a "terrible idea". unless you actually meant that "all branches all sources" bit that is.

@ncoghlan
Copy link

I think using local versions for this is fine - the PEP 440 restriction is intended to be against uploading them to PyPI. Generating them when building directly from a source checkout seems fine to me - in that case, you're not relying on version specifiers, you're giving the VCS URL explicitly.

However, you may still want to consider incrementing the version number and switching to "commits since the last tag" dev release numbering - once the previous release has been published, installing from source control is at least arguably a pre-release of the next version, rather than a post-release of the previous one.

@kay1793
Copy link

kay1793 commented Feb 20, 2015

@ncoghlan, The PEP (which you co-wrote... hi there) should say that then, because Local version identifiers SHOULD NOT be used for upstream projects. doesn't really. The PEP also strongly implies that local version strings are reserved for use by packagers and entities which distribute patched versions of an upstream package. The case of "version strings for installations from upstream git master" isn't covered which is why people are bending over backwards with silly arguments like "upstream is its own downstream" to work around what is written in there.

@tacaswell
Copy link
Contributor

I think part of what is being lost here is that if a user is building from source then they are their own packagers. Using the down-stream post-fix (with the git hash) to make it easy for the upstream devs to understand exactly what version some one is using when they report an issue is a first order good (because 'master' is a moving target).

At my day-job we package and distribute internally a number of python packages (at this point mostly our own stuff) and I am sure that we are not unique in this regard. Having the upstream projects set up to emit sensible local patch numbering also seems like a first order good (because I am lazy).

@ncoghlan As I understood the pep the post can take a number ex X.X.X.postN+gHASH. If you do this with out a version bump you can get all of the information you need out of git describe rather than having to introspect the python (which is nice for playing nice with tools like conda).

There is also existential discussion of what a 'release' is. One can make the case that every git commit that passes tests is a 'release', but this seems like the wrong place to start painting that bike shed.

@kay1793
Copy link

kay1793 commented Feb 20, 2015

make it easy for the upstream devs to understand exactly what version some one is using
... is a first order good

Yes, but no one is actually saying the git hash is not useful, only questioning what the PEP (as written) says about upstream using +foo to hold it.

if a user is building from source then they are their own packagers

I'm not so sure - if they're packaging a version that is simply an upstream commit with a version string that upstream decided on? what would be the difference between upstream shipping tarballs and users doing it in this case? it's a de-facto dev release by upstream.

I dislike the suggested .preN version string because semantic versioning means you're never sure what the next version will be and releasing 1.2.3.pre127 when you might never release 1.2.3 seems wrong.

@tacaswell
Copy link
Contributor

I'm not so sure - if they're packaging a version that is simply an upstream commit with a version string that upstream decided on? what would be the difference between upstream shipping tarballs and users doing it in this case? it's a de-facto dev release by upstream.

If a user checks out a commit and builds it (turning source -> binary) and then installs it they are doing exactly what a packager does. If upstream suggests a version string, already localized, to be something sensible, they are just being good OSS citizens 😈 .

If upstream did a full 'release' with the rigamarole of rebuilding the docs, announcements to the mailing list etc etc that included the +stuff that would be a problem but no one is suggesting that.

https://xkcd.com/386/

snapshot2

@ncoghlan
Copy link

If you're not touching PyPI (which is the case when installing directly from version control) then PEP 440's guidelines don't have to apply - we're specifically looking to control published version identifiers that need to interact with version specifiers, and hence need to define an unambiguous total ordering.

I've filed pypa/interoperability-peps#25 to propose clarifying this directly in the PEP, but in the meantime, if a local version identifier solves your problem, feel free to go that way. Don't tie yourselves in knots trying to stick to the letter of a PEP that doesn't currently cover your situation.

@jorisvandenbossche
Copy link
Member

@shoyer What is the exact format you propose?

We have now, depending on git being found or not, the formats v0.15.2-226-ge1aaf8c and v0.15.2.dev. Make this:

  • v0.15.2+226.ge1aaf8c (and v0.15.2+dev)
  • v0.15.2.post.dev226+ge1aaf8c
  • v0.16.dev226+ge1aaf8c

@tacaswell
Copy link
Contributor

I would advocate for v0.15.2.post226+ge1aaf8c as it does not require you to guess the next next release number, is more-or-less a valid version (modulo the extra +) and easier to parse than a pre-release of a post-release.

@jreback
Copy link
Contributor

jreback commented Feb 27, 2015

see here : https://github.com/ContinuumIO/blaze/blob/master/core.recipe/version.py

which generates tags like

0.15.2.post226.gelaaf8c

we use this now for conda recipes

@shoyer
Copy link
Member Author

shoyer commented Feb 27, 2015

My thought was to do v0.15.2+226.ge1aaf8c and v0.15.2+dev. That seemed like a little less noise than adding .post, which could be a bit confusing given that these releases never make it on pypi or other distribution channels.

On the other hand, maybe including .post is a good idea given that that makes it clear this is a version after rather than before v0.15.2. So I guess my vote is for @tacaswell's version: v0.15.2.post226+ge1aaf8c

@kay1793
Copy link

kay1793 commented Feb 28, 2015

I think @tacaswell's suggestion should work fine too, there's nothing confusing about adding .post and users installing these versions know where they came from anyway.

@jreback's suggstion is invalid according to the PEP so that doesn't help anything.

@tacaswell
Copy link
Contributor

@jreback Your link is dead, did you change your mind about using that method?

@jreback
Copy link
Contributor

jreback commented Mar 9, 2015

https://github.com/ContinuumIO/blaze/blob/master/core.recipe/meta.yaml

they r now doing it directly in the conda build (which is better)

@tacaswell
Copy link
Contributor

This makes it more difficult to feed back into the __version__ string of the module though.

@shoyer
Copy link
Member Author

shoyer commented Mar 9, 2015

@shoyer
Copy link
Member Author

shoyer commented Apr 15, 2015

I tried implemented this change locally. Unfortunately, something about the new version format broke with the old version of pip I was using:

$ pip install -e .
Obtaining file:///Users/shoyer/dev/pandas
  Running setup.py (path:/Users/shoyer/dev/pandas/setup.py) egg_info for package from file:///Users/shoyer/dev/pandas

    warning: no files found matching 'README.rst'
    warning: no directories found matching 'examples'
    warning: no previously-included files matching '*.so' found anywhere in distribution
    warning: no previously-included files matching '*.pyd' found anywhere in distribution
    warning: no previously-included files matching '*~' found anywhere in distribution
    warning: no previously-included files matching '#*' found anywhere in distribution
Cleaning up...
Exception:
Traceback (most recent call last):
  File "/Users/shoyer/miniconda/envs/pandas-dev/lib/python2.7/site-packages/pip/basecommand.py", line 122, in main
    status = self.run(options, args)
  File "/Users/shoyer/miniconda/envs/pandas-dev/lib/python2.7/site-packages/pip/commands/install.py", line 278, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "/Users/shoyer/miniconda/envs/pandas-dev/lib/python2.7/site-packages/pip/req.py", line 1145, in prepare_files
    req_to_install.run_egg_info()
  File "/Users/shoyer/miniconda/envs/pandas-dev/lib/python2.7/site-packages/pip/req.py", line 330, in run_egg_info
    "%(Name)s==%(Version)s" % self.pkg_info())
  File "/Users/shoyer/miniconda/envs/pandas-dev/lib/python2.7/site-packages/pip/_vendor/pkg_resources.py", line 2667, in parse
    reqs = list(parse_requirements(s))
  File "/Users/shoyer/miniconda/envs/pandas-dev/lib/python2.7/site-packages/pip/_vendor/pkg_resources.py", line 2605, in parse_requirements
    line, p, specs = scan_list(VERSION,LINE_END,line,p,(1,2),"version spec")
  File "/Users/shoyer/miniconda/envs/pandas-dev/lib/python2.7/site-packages/pip/_vendor/pkg_resources.py", line 2583, in scan_list
    "Expected ',' or end-of-list in",line,"at",line[p:]
ValueError: ("Expected ',' or end-of-list in", u'pandas==0.16.0.post151+gcef3c85', 'at', u'+gcef3c85')

Then I upgraded to the newest pip and things worked fine:

$ pip install -e .
DEPRECATION: --download-cache has been deprecated and will be removed in the future. Pip now automatically uses and configures its cache.
Obtaining file:///Users/shoyer/dev/pandas
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /Users/shoyer/miniconda/envs/pandas-dev/lib/python2.7/site-packages (from pandas==0.16.0.post151+gcef3c85)
Requirement already satisfied (use --upgrade to upgrade): pytz>=2011k in /Users/shoyer/miniconda/envs/pandas-dev/lib/python2.7/site-packages (from pandas==0.16.0.post151+gcef3c85)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.7.0 in /Users/shoyer/miniconda/envs/pandas-dev/lib/python2.7/site-packages (from pandas==0.16.0.post151+gcef3c85)
Installing collected packages: pandas
  Running setup.py develop for pandas
Successfully installed pandas-0.16.0.post151+gcef3c85

But the version I initially tested (1.5.6) was only released last May. I suspect we'll get some grief if we break lots of dev installs. Not sure if there is really any ideal solution here....

@ncoghlan
Copy link

Just following up here to say pypa/interoperability-peps#25 has been closed via this tweak to PEP 440: https://hg.python.org/peps/rev/bf4ffb364faf

@kay1793
Copy link

kay1793 commented Apr 15, 2015

Thanks @ncoghlan.

@shoyer PEP440-compliant version strings make recent-ish pip versions croak?
are you sure you were using a release and not some interim git version?

@shoyer
Copy link
Member Author

shoyer commented Apr 16, 2015

I definitely was not using a development version of pip. I can do some more experiments -- it's possible that there was something funny with the setup of that virtual environment...

@kay1793
Copy link

kay1793 commented Apr 16, 2015

I think you got it right. PEP-386 preceded 440 and sanctioned the form:

N.N[.N]+[{a|b|c|rc}N[.N]+][.postN][.devN]

also included are specific regexps for parsing which include a terminating "$" and so you can't have anything following a .postN. That means PEP-440 local versions are not back-compatible with the tools which obey the previous version standard of python.

While checking this I've discovered pip phones home without asking whenever you run an install to check for updates. that's always a bonus.

@jreback jreback modified the milestones: Next Major Release, 0.16.1 Apr 28, 2015
@jreback jreback modified the milestones: 0.17.0, Next Major Release Jun 15, 2015
@jreback jreback added the CI Continuous Integration label Jun 15, 2015
jreback added a commit to jreback/pandas that referenced this issue Jun 17, 2015
jreback added a commit to jreback/pandas that referenced this issue Jul 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Library building on various platforms CI Continuous Integration
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants