Add option to output UTC datetimes as "Z" in .isoformat() #90772

Open
pganssle opened this issue Feb 2, 2022 · 17 comments
Assignees
Labels
extension-modules: C modules in the Modules dir
stdlib: Python modules in the Lib dir
type-feature: A feature request or enhancement

Comments

@pganssle
Member

pganssle commented Feb 2, 2022

BPO 46614
Nosy @brettcannon, @abalkin, @merwok, @pganssle, @godlygeek
PRs
  • bpo-46614: Allow datetime.isoformat to use "Z" UTC designator #32041
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/pganssle'
    closed_at = None
    created_at = <Date 2022-02-02.17:31:46.548>
    labels = ['type-feature', 'library', '3.11']
    title = 'Add option to output UTC datetimes as "Z" in `.isoformat()`'
    updated_at = <Date 2022-04-04.01:30:42.788>
    user = 'https://github.com/pganssle'

    bugs.python.org fields:

    activity = <Date 2022-04-04.01:30:42.788>
    actor = 'godlygeek'
    assignee = 'p-ganssle'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2022-02-02.17:31:46.548>
    creator = 'p-ganssle'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 46614
    keywords = ['patch']
    message_count = 6.0
    messages = ['412384', '412876', '413102', '416622', '416638', '416648']
    nosy_count = 5.0
    nosy_names = ['brett.cannon', 'belopolsky', 'eric.araujo', 'p-ganssle', 'godlygeek']
    pr_nums = ['32041']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue46614'
    versions = ['Python 3.11']


    @pganssle
    Member Author

    pganssle commented Feb 2, 2022

    As part of bpo-35829, it was suggested that we add the ability to output the "Z" suffix in isoformat(), so that fromisoformat() can both be the exact functional inverse of isoformat() and parse datetimes with "Z" outputs. I think that that's not a particularly compelling motivation for this, but I also see plenty of examples of datetime.utcnow().isoformat() + "Z" out there, so it seems like this is a feature that we would want to have anyway, particularly if we want to deprecate and remove utcnow.
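
    For reference, the datetime.utcnow().isoformat() + "Z" pattern can already be written with an aware datetime. This is only a minimal sketch: the helper name to_z_string is hypothetical, and it assumes the input carries tzinfo=timezone.utc exactly.

```python
from datetime import datetime, timezone


def to_z_string(dt: datetime) -> str:
    """Format an aware UTC datetime with a trailing "Z" (hypothetical helper)."""
    if dt.tzinfo is not timezone.utc:
        raise ValueError("expected a datetime with tzinfo=timezone.utc")
    # isoformat() renders timezone.utc as "+00:00"; swap that suffix for "Z".
    return dt.isoformat().removesuffix("+00:00") + "Z"


# Pattern commonly seen in the wild, built on a naive datetime:
#     datetime.utcnow().isoformat() + "Z"
# An aware equivalent that does not need utcnow():
print(to_z_string(datetime.now(timezone.utc)))
```

    Note that str.removesuffix requires Python 3.9 or later.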

    I've spun this off into its own issue so that we can discuss how to implement the feature. The two obvious questions I see are:

    1. What do we call the option? use_utc_designator, allow_Z, utc_as_Z?
    2. What do we consider as "UTC"? Is it anything with +00:00? Just timezone.utc? Anything that seems like a fixed-offset zone with 0 offset?

    For example, do we want this?

    >>> LON = zoneinfo.ZoneInfo("Europe/London")
    >>> datetime(2022, 3, 1, tzinfo=LON).isoformat(utc_as_z=True)
    2022-03-01T00:00:00Z
    >>> datetime(2022, 6, 1, tzinfo=LON).isoformat(utc_as_z=True)
    2022-06-01T00:00:00+01:00

    Another possible definition might be if the tzinfo is a fixed-offset zone with offset 0:

    >>> datetime.timezone.utc.utcoffset(None)
    timedelta(0)
    >>> zoneinfo.ZoneInfo("UTC").utcoffset(None)
    timedelta(0)
    >>> dateutil.tz.UTC.utcoffset(None)
    timedelta(0)
    >>> pytz.UTC.utcoffset(None)
    timedelta(0)

    The only "odd man out" is dateutil.tz.tzfile objects representing fixed offsets, since all dateutil.tz.tzfile objects return None when utcoffset or dst are passed None. This can and will be changed in future versions.
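
    The "fixed-offset zone with offset 0" definition could be checked roughly as follows. This is a sketch under assumptions: is_fixed_zero_offset and the toy WinterZeroSummerOne class are hypothetical names, and the toy class only stands in for a zone whose offset varies over the year.

```python
from datetime import timedelta, timezone, tzinfo


def is_fixed_zero_offset(tz):
    """True if tz reports a zero offset even when given no datetime."""
    if tz is None:
        return False
    return tz.utcoffset(None) == timedelta(0)


class WinterZeroSummerOne(tzinfo):
    """Toy variable-offset zone: +00:00 before June, +01:00 from June on."""

    def utcoffset(self, dt):
        if dt is None:
            return None  # no single fixed offset to report
        return timedelta(hours=1 if dt.month >= 6 else 0)

    def dst(self, dt):
        return None

    def tzname(self, dt):
        return None


print(is_fixed_zero_offset(timezone.utc))
print(is_fixed_zero_offset(WinterZeroSummerOne()))
```

    Under this rule a zone that merely happens to sit at +00:00 for part of the year would not qualify, but a fixed zero-offset zone with a name other than "UTC" still would.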

    I feel like "If the offset is 00:00, use Z" is the wrong rule to use conceptually, but considering that people will be opting into this behavior, it is more likely that they will be surprised by datetime(2022, 3, 1, tzinfo=ZoneInfo("Europe/London")).isoformat(utc_as_z=True) returning 2022-03-01T00:00:00+00:00 than by alternation between Z and +00:00.

    Yet another option might be to add a completely separate function, utc_isoformat(*args, **kwargs), which is equivalent to (in the parlance of the other proposal) dt.astimezone(timezone.utc).isoformat(*args, **kwargs, utc_as_z=True). Basically, convert any datetime to UTC and append a Z to it. The biggest footgun there would be people using it on naïve datetimes and not realizing that it would interpret them as system local times.
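
    A standalone version of that idea might look like the sketch below. It is written as a free function rather than a method, and rejecting naïve datetimes is a design choice made here to sidestep the footgun, not something the proposal specifies.

```python
from datetime import datetime, timedelta, timezone


def utc_isoformat(dt, *args, **kwargs):
    """Convert an aware datetime to UTC and format it with a "Z" suffix."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        # Assumption of this sketch: refuse naive datetimes instead of letting
        # astimezone() silently interpret them as system local time.
        raise ValueError("naive datetimes are not supported")
    text = dt.astimezone(timezone.utc).isoformat(*args, **kwargs)
    # After conversion the offset is always rendered as "+00:00".
    return text.removesuffix("+00:00") + "Z"


summer = datetime(2022, 6, 1, tzinfo=timezone(timedelta(hours=1)))
print(utc_isoformat(summer))
```

    Unlike the utc_as_z flag, this changes the wall-clock fields in the output whenever the input is not already in UTC.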

    @pganssle pganssle added the 3.11 only security fixes label Feb 2, 2022
    @pganssle pganssle self-assigned this Feb 2, 2022
    @pganssle pganssle added stdlib Python modules in the Lib dir type-feature A feature request or enhancement 3.11 only security fixes labels Feb 2, 2022
    @pganssle pganssle added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Feb 2, 2022
    @merwok
    Member

    merwok commented Feb 8, 2022

    Would it be horrible to have the timezone instance control this?

    @godlygeek
    Mannequin

    godlygeek mannequin commented Feb 11, 2022

    > I feel like "If the offset is 00:00, use Z" is the wrong rule to use conceptually

    This is a really good point that I hadn't considered: +00:00 and Z are semantically different, and just because a datetime has a UTC offset of 0 doesn't mean it should get a Z; Z is reserved specifically for UTC.

    It seems like the most semantically correct thing would be to only use Z if tzname() returns exactly "UTC". That would do the right thing for your London example for every major timezone library I'm aware of:

    >>> datetime.datetime.now(zoneinfo.ZoneInfo("Europe/London")).tzname()
    'GMT'
    >>> datetime.datetime.now(zoneinfo.ZoneInfo("UTC")).tzname()
    'UTC'
    >>> datetime.datetime.now(datetime.timezone.utc).tzname()
    'UTC'
    
    >>> datetime.datetime.now(dateutil.tz.gettz("Europe/London")).tzname()
    'GMT'
    >>> datetime.datetime.now(dateutil.tz.UTC).tzname()
    'UTC'
    
    >>> datetime.datetime.now(pytz.timezone("Europe/London")).tzname()
    'GMT'
    >>> datetime.datetime.now(pytz.UTC).tzname()
    'UTC'

    I think the right rule to use conceptually is "if use_utc_designator is true and the timezone name is 'UTC' then use Z". We could also check the offset, but I'm not convinced we need to.
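
    As a rough illustration of that rule, here is a sketch written as a wrapper function. The name isoformat_utc_as_z is hypothetical, and the extra offset check is a defensive assumption rather than part of the rule as stated.

```python
from datetime import datetime, timedelta, timezone


def isoformat_utc_as_z(dt, *args, **kwargs):
    """Like dt.isoformat(), but emit "Z" when tzname() is exactly "UTC"."""
    text = dt.isoformat(*args, **kwargs)
    # tzname() is None for naive datetimes, so they pass through unchanged.
    if dt.tzname() == "UTC" and dt.utcoffset() == timedelta(0):
        return text.removesuffix("+00:00") + "Z"
    return text


utc_dt = datetime(2022, 3, 1, tzinfo=timezone.utc)
gmt_dt = datetime(2022, 3, 1, tzinfo=timezone(timedelta(0), "GMT"))
print(isoformat_utc_as_z(utc_dt))
print(isoformat_utc_as_z(gmt_dt))
```

    A zone named "GMT" keeps its +00:00 suffix here even though its offset is zero, which is the distinction the London example is about.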

    @pganssle
    Member Author

    pganssle commented Apr 3, 2022

    I think this approach is probably the best we can do, but I could also imagine that users might find it to be confusing behavior. I wonder if there's any informal user testing we can do?

    I guess the ISO 8601 spec does call "Z" the "UTC designator", so use_utc_designator seems like approximately the right name. My main hesitation with this name is that I suspect users may think that use_utc_designator means that they unconditionally want to use Z. Without reading the documentation (which we can assume 99% of users won't do), you might assume that dt.isoformat(use_utc_designator=True) would translate to dt.astimezone(timezone.utc).replace(tzinfo=None).isoformat() + "Z".

    A name like utc_as_z is definitely less... elegant, but conveys the concept a bit more clearly. Would be worth throwing it to a poll or something before merging.

    @merwok
    Member

    merwok commented Apr 3, 2022

    Bad idea: pass zulu=True

    It is short, memorable if you know about it, otherwise obscure enough to push people to read the docs and be clear about what it does.

    Also strange and far from obvious, so a bad idea. Unless… ?

    @godlygeek
    Mannequin

    godlygeek mannequin commented Apr 4, 2022

    > My main hesitation with this name is that I suspect users may think that use_utc_designator means that they unconditionally want to use Z, without reading the documentation (which we can assume 99% of users won't do)

    I was thinking along similar lines when I used use_utc_designator in the PR, but I drew a different conclusion. I was thinking that the name use_utc_designator is sufficiently abstruse that no one would even be able to guess that it's referring to "Z" without actually reading the documentation for the parameter. In particular, I worry that zulu=True or allow_Z=True might lead people to make the mistake of thinking that they'll always get "Z" instead of "+00:00".

    > A name like utc_as_z is definitely less... elegant, but conveys the concept a bit more clearly.

    This would definitely be more memorable and more approachable. If we stick with making it conditional on tzname() == "UTC", I definitely think we want to have "utc" in the name of the parameter, and utc_as_z satisfies that.

    utc_as_z seems reasonable to me. Let me know if you'd like me to update the PR.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @pganssle
    Member Author

    pganssle commented May 3, 2022

    After running a survey where I tried out 4 different keyword arguments at random with 972 participants (not all of them completed the survey all the way, admittedly) and asked people what they thought it does in various situations, I got the following results:

    Percentage getting the semantics question right, by kwarg:

    kwarg              | naïve  | utc   | nyc   | lon
    allow_z            | 50.495 | 89.47 | 80.90 | 42.35
    format_utc_as_z    | 41.304 | 88.68 | 54.88 | 40.00
    use_utc_designator | 22.857 | 73.68 | 47.87 | 39.29
    utc_as_z           | 38.542 | 93.27 | 57.30 | 42.86
    

    This is strange because I think none of us like allow_z, including the survey participants. When told the actual semantics of the keyword argument, they overwhelmingly prefer utc_as_z:

    utc_as_z                     165
    format_utc_as_z               78
    I don't like any of these     42
    allow_z                       19
    use_utc_designator            12
    

    I was hoping the results here would be less... ambiguous.

    @pganssle
    Member Author

    pganssle commented May 3, 2022

    Given these results and the fact that people don't seem to have a great grasp on what this is supposed to do, let's push this feature to 3.12 to try to come up with a better design.

    @Saphyel

    Saphyel commented May 6, 2022

    I have to disagree about this approach:

    >>> LON = zoneinfo.ZoneInfo("Europe/London")
    >>> datetime(2022, 3, 1, tzinfo=LON).isoformat(utc_as_z=True)
    2022-03-01T00:00:00Z
    >>> datetime(2022, 6, 1, tzinfo=LON).isoformat(utc_as_z=True)
    2022-06-01T00:00:00+01:00
    

    Mainly because you would expect your local time zone to always return the second format rather than the first (the shorthand for +00:00).
    I think this should be done only when the timezone is UTC itself rather than a local zone (or perhaps when the timezone is missing?).

    In the end, what we want is to avoid this => https://i.redd.it/ocpk67fp6tx81.jpg

    @merwok
    Member

    merwok commented Aug 8, 2023

    After seeing usage in the PR, I find use_utc_designator=True a bit unwieldy, especially as it will nearly always be a keyword param.

    @joaoe

    joaoe commented Aug 8, 2023

    I suggest this is controlled in part by the tzinfo or time instance.

    For instance, in datetime.py this code in class time

    class time:
        # [...]
        def _tzstr(self):
            """Return formatted timezone offset (+xx:xx) or an empty string."""
            off = self.utcoffset()
            return _format_offset(off)
    

    can be extended as something like

    class time:
        # [...]
        def _tzstr(self):
            """Return formatted timezone offset (+xx:xx) or an empty string."""
            off = self.utcoffset()
            if not off and self.tzname() == "UTC":
                return "Z"
            return _format_offset(off)
    

    A respective change would be needed in class datetime.

    The string UTC is produced by

    class timezone(tzinfo):
        # ...
        @staticmethod
        def _name_from_offset(delta):
            if not delta:
                return 'UTC'
            # ...
    

    Changing the default behavior of the datetime formatting functions to output Z instead of +00:00 for UTC timestamps might cause backward compatibility problems, so an argument such as utc_as_z=True might be needed.
    But I think the proposed utc_as_z=True is neither elegant nor particularly useful.

    So looking at the declaration of isoformat

    class datetime:
        #....
        def isoformat(self, sep='T', timespec='auto'):
            #....
    class time:
        #....
        def isoformat(self, timespec='auto'):
            # ....
    

    Perhaps we could use a new tzspec argument, named to match timespec, with values 'auto', 'utcz', 'hours', 'minutes', 'seconds', etc. 'auto' would be the current behavior. 'utcz' would be the new behavior of using 'Z' instead of '+00:00', and more values ('hours', 'minutes', 'seconds') could be reserved and defined later to control the behavior of the _format_offset() function.
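
    To make the tzspec idea concrete, here is a sketch as a wrapper function. The name isoformat_tzspec is hypothetical, only 'auto' and 'utcz' are handled, and treating "UTC" as tzname() == "UTC" with a zero offset is an assumption borrowed from the earlier discussion.

```python
from datetime import datetime, timedelta, timezone


def isoformat_tzspec(dt, sep="T", timespec="auto", tzspec="auto"):
    """Format dt like datetime.isoformat(), with a tzspec selector."""
    text = dt.isoformat(sep=sep, timespec=timespec)
    if tzspec == "auto":
        # Current behavior: offsets are always rendered numerically.
        return text
    if tzspec == "utcz":
        if dt.utcoffset() == timedelta(0) and dt.tzname() == "UTC":
            return text.removesuffix("+00:00") + "Z"
        return text
    # 'hours', 'minutes', 'seconds' are reserved and left undefined here.
    raise ValueError(f"unsupported tzspec: {tzspec!r}")


dt = datetime(2022, 3, 1, tzinfo=timezone.utc)
print(isoformat_tzspec(dt))
print(isoformat_tzspec(dt, tzspec="utcz"))
```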

    @mohd-akram

    It should just automatically do the right thing, i.e. use Z when the timezone is UTC. This is what happens in e.g. JavaScript. There's no need for a new option.

    @FeldrinH

    FeldrinH commented May 11, 2024

    > It should just automatically do the right thing, i.e. use Z when the timezone is UTC. This is what happens in e.g. JavaScript. There's no need for a new option.

    In principle, I support this. In practice, I suspect this won't be done because changing the default behaviour of isoformat would break code that depends on the UTC offset always being formatted the way it currently is.

    @pganssle
    Member Author

    I object to the idea that using Z unconditionally is "the right thing", but also there are several logistical problems that make the idea of changing the default behavior a non-starter:

    1. Backwards compatibility — this alone will scuttle the proposal, because there are absolutely people relying on the fact that this outputs +00:00 instead of Z
    2. What it means to be "UTC" is ill-defined, as mentioned in the first post. Non-UTC datetimes might incidentally have a +00:00 offset, and it isn't clear that it is appropriate to automatically give them Z, in which case we need to come up with a reliable heuristic for what it means to be "UTC". Making this behavior auto-magical will make it even harder for people to discover the edge cases.

    Whatever we do needs to be explicit, but it seems very hard to do that, since when we ran the survey people seemed to have conflicting understandings of what any of this stuff might do; there doesn't seem to be an unambiguous way to convey what the behavior might be.

    If this were a high priority and in high demand, I would suggest we might bite the bullet and have this be just one more thing that end users need to learn about datetime, but it seems like the main motivation for adding this option was to make it so that formats ending with Z would technically satisfy the pre-3.11 contract of fromisoformat (parsing any format that .isoformat can emit), which doesn't even apply anymore.
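
    For context on that last point, the parsing side can be sketched as follows. fromisoformat accepts a trailing "Z" on Python 3.11 and later, and the fallback branch is only an illustrative workaround for older versions.

```python
import sys
from datetime import datetime, timezone

text = "2022-03-01T00:00:00Z"

if sys.version_info >= (3, 11):
    parsed = datetime.fromisoformat(text)
else:
    # Older versions reject the "Z" suffix, so rewrite it first.
    parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))

# Parsing already handles "Z"; formatting still emits the numeric offset.
print(parsed.isoformat())
```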

    @mohd-akram

    It might be worth adding an isoformatutc method in that case. It would kill two birds with one stone - convert the time zone to UTC and provide it with the Z suffix - which is useful for machine processing. This is the same as JS's toISOString.

    @cabo

    cabo commented Nov 10, 2024

    JFYI:
    The Z vs. +00:00 issue has been debated extensively during the update to RFC 3339 that allows adding further information to a timestamp.
    RFC 3339 distinguishes the semantics of expressing a time zone offset from that of not expressing one.
    Unfortunately, it chose -00:00 for the latter, which is not allowed in ISO 8601.
    The fix we ultimately arrived at is:

    https://www.rfc-editor.org/rfc/rfc9557#name-updating-rfc-3339

    Note that almost all timestamps in reality do not carry the intention to express a time zone offset, so Z is now the normal way to write an RFC 3339 timestamp -- the spec now follows the established practice for instance in the JavaScript community.

    @StanFromIreland
    Contributor

    StanFromIreland commented Mar 22, 2025

    Sorry Paul, but you have been assigned for a few years with no PR, so I assume you have not been actively working on this. I have continued the work from the old PR. Making it mandatory is a discussion for later, as it's a backwards compatibility nightmare.

    @encukou encukou added the extension-modules C modules in the Modules dir label Mar 24, 2025

    10 participants