
[time] document that integer nanoseconds may be more precise than float #83665

Open
vxgmichel mannequin opened this issue Jan 29, 2020 · 31 comments
Labels
docs Documentation in the Doc dir extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@vxgmichel
Mannequin

vxgmichel mannequin commented Jan 29, 2020

BPO 39484
Nosy @malemburg, @rhettinger, @mdickinson, @vstinner, @larryhastings, @serhiy-storchaka, @eryksun, @vxgmichel
Files
  • Comparing_division_errors_over_10_us.png
  • comparing_errors.py
  • Comparing_conversions_over_5_us.png
  • comparing_conversions.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-01-29.13:02:45.749>
    labels = ['3.8', '3.9', '3.10', 'extension-modules', 'type-feature', 'docs']
    title = '[time] document that integer nanoseconds may be more precise than float'
    updated_at = <Date 2021-03-21.21:08:52.484>
    user = 'https://github.com/vxgmichel'

    bugs.python.org fields:

    activity = <Date 2021-03-21.21:08:52.484>
    actor = 'eryksun'
    assignee = 'docs@python'
    closed = False
    closed_date = None
    closer = None
    components = ['Documentation', 'Extension Modules']
    creation = <Date 2020-01-29.13:02:45.749>
    creator = 'vxgmichel'
    dependencies = []
    files = ['48880', '48881', '48882', '48883']
    hgrepos = []
    issue_num = 39484
    keywords = []
    message_count = 31.0
    messages = ['360958', '360980', '361139', '361140', '361141', '361142', '361143', '361144', '361286', '361288', '361292', '361296', '361300', '361301', '361304', '361306', '361308', '361309', '361314', '361316', '361319', '361351', '361650', '361651', '361652', '361653', '361654', '361665', '361666', '361668', '361669']
    nosy_count = 10.0
    nosy_names = ['lemburg', 'rhettinger', 'mark.dickinson', 'vstinner', 'larry', 'stutzbach', 'docs@python', 'serhiy.storchaka', 'eryksun', 'vxgmichel']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue39484'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    @vxgmichel
    Mannequin Author

    vxgmichel mannequin commented Jan 29, 2020

    On Windows, the timestamps produced by time.time() often end up being equal because of the 15 ms resolution:

        >>> time.time(), time.time()
        (1580301469.6875124, 1580301469.6875124)

    The problem I noticed is that a value produced by time_ns() might end up being higher than a value produced by time() even though time_ns() was called before:

        >>> a, b = time.time_ns(), time.time()
        >>> a, b
        (1580301619906185300, 1580301619.9061852)
        >>> a / 10**9 <= b
        False

    This break in causality can lead to very obscure bugs since timestamps are often compared to one another. Note that those timestamps can also come from non-Python sources, e.g. a C program using GetSystemTimeAsFileTime.

    This problem seems to be related to the conversion done in _PyTime_AsSecondsDouble:

    cpython/Python/pytime.c

    Lines 460 to 461 in f1c1903

    d = (double)t;
    d /= 1e9;

        # Float produced by `time.time()`
        >>> b.hex()
        '0x1.78c5f4cf9fef0p+30'
        # Basically what `_PyTime_AsSecondsDouble` does:
        >>> (float(a) / 10**9).hex()
        '0x1.78c5f4cf9fef0p+30'
        # What I would expect from `time.time()`
        >>> (a / 10**9).hex()
        '0x1.78c5f4cf9fef1p+30'

    However I don't know if this would be enough to fix all causality issues since, as Tim Peters noted in another thread:

    Just noting for the record that a C double (time.time() result) isn't quite enough to hold a full-precision Windows time regardless

    (https://bugs.python.org/issue19738#msg204112)

    @vxgmichel vxgmichel mannequin added 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir 3.8 (EOL) end of life labels Jan 29, 2020
    @vxgmichel
    Mannequin Author

    vxgmichel mannequin commented Jan 29, 2020

    I thought about it a bit more and I realized there is no way to recover the time in hundreds of nanoseconds from the float produced by time.time() (since the Windows time currently takes 54 bits and will take 55 bits in 2028).

    That means time() and time_ns() cannot be compared by converting time() to nanoseconds, but it might still make sense to compare them by converting time_ns() to seconds (which is apparently broken at the moment).

    If that makes sense, a possible roadmap to tackle this problem would be:

    • fix _PyTime_AsSecondsDouble so that time.time_ns() / 10**9 == time.time()
    • add a warning in the documentation that one should be careful when comparing the timestamps produced by time() and time_ns() (in particular, time() should not be converted to nanoseconds); a rough illustration follows below
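
    As a rough illustration of the second point (for demonstration only; the exact digits depend on when the clock is read):

        import time

        t_ns = time.time_ns()   # integer nanoseconds
        t = time.time()         # float seconds, sampled just afterwards

        # Scaling the float up to nanoseconds manufactures digits that the
        # float never actually held (its resolution here is roughly 238 ns):
        print(int(t * 10**9))

        # Converting the integer down to float seconds keeps the comparison
        # at the precision that time.time() can actually offer:
        print(t_ns / 10**9, t)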

    @vstinner
    Member

    vstinner commented Feb 1, 2020

    >>> a / 10**9 <= b
        False

    Try to use a/1e9 <= b.

    --

    The C code to get the system clock is the same for time.time() and time.time_ns(). It's only the conversion of the result which is different:

    static PyObject *
    time_time(PyObject *self, PyObject *unused)
    {
        _PyTime_t t = _PyTime_GetSystemClock();
        return _PyFloat_FromPyTime(t);
    }
    
    static PyObject *
    time_time_ns(PyObject *self, PyObject *unused)
    {
        _PyTime_t t = _PyTime_GetSystemClock();
        return _PyTime_AsNanosecondsObject(t);
    }

    where _PyTime_t is int64_t: a 64-bit signed integer.

    Conversions:

    static PyObject*
    _PyFloat_FromPyTime(_PyTime_t t)
    {
        double d = _PyTime_AsSecondsDouble(t);
        return PyFloat_FromDouble(d);
    }
    
    
    double
    _PyTime_AsSecondsDouble(_PyTime_t t)
    {
        /* volatile avoids optimization changing how numbers are rounded */
        volatile double d;
    
        if (t % SEC_TO_NS == 0) {
            _PyTime_t secs;
            /* Divide using integers to avoid rounding issues on the integer part.
               1e-9 cannot be stored exactly in IEEE 64-bit. */
            secs = t / SEC_TO_NS;
            d = (double)secs;
        }
        else {
            d = (double)t;
            d /= 1e9;
        }
        return d;
    }
    
    PyObject *
    _PyTime_AsNanosecondsObject(_PyTime_t t)
    {
        Py_BUILD_ASSERT(sizeof(long long) >= sizeof(_PyTime_t));
        return PyLong_FromLongLong((long long)t);
    }

    In short, time.time() = float(time.time_ns()) / 1e9.
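
    For readers following along in Python, a rough sketch of that conversion (a translation of the C code above, not the real implementation):

        SEC_TO_NS = 10**9

        def as_seconds_double(t_ns):
            # Mirrors _PyTime_AsSecondsDouble: whole seconds are divided as
            # integers to avoid rounding the integer part; otherwise the
            # nanosecond count is converted to float and divided by 1e9.
            if t_ns % SEC_TO_NS == 0:
                return float(t_ns // SEC_TO_NS)
            return float(t_ns) / 1e9

        print(as_seconds_double(1580301619906185300))  # 1580301619.9061852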

    --

    The problem can be reproduced in Python:

    >>> a=1580301619906185300
    >>> b=a/1e9
    >>> a / 10**9 <= b
    False

    I added time.time_ns() because we lose precision with such "large" numbers if you care about nanosecond resolution. float has a precision of around 238 nanoseconds here:

    >>> import math; ulp=math.ulp(b)
    >>> ulp
    2.384185791015625e-07
    >>> "%.0f +- %.0f" % (b*1e9, ulp*1e9)
    '1580301619906185216 +- 238'
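
    For context, that 238 ns figure is simply the spacing between consecutive doubles at this magnitude: timestamps around 1.58e9 seconds lie in [2**30, 2**31), so one ulp is 2**(30 - 52) seconds:

        >>> 2.0 ** (30 - 52)
        2.384185791015625e-07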

    int/int and int/float don't give the same result:

    >>> a/10**9
    1580301619.9061854
    >>> a/1e9
    1580301619.9061852

    I'm not sure which one is "correct". To understand the issue, you can use the math.nextafter() function to get the next floating point value towards -inf:

    >>> a/10**9
    1580301619.9061854
    >>> a/1e9
    1580301619.9061852
    >>> math.nextafter(a/10**9, -math.inf)
    1580301619.9061852
    >>> math.nextafter(a/1e9, -math.inf)
    1580301619.906185

    Handling floating point numbers is hard.

    Why not just use integers? :-)

    @vstinner
    Member

    vstinner commented Feb 1, 2020

    See also bpo-39277, which is a similar issue. I proposed a fix which uses nextafter():
    #17933

    @vstinner
    Member

    vstinner commented Feb 1, 2020

    Another way to understand the problem: the nanoseconds (int) => seconds (float) => nanoseconds (int) round trip loses precision.

    >>> a=1580301619906185300
    >>> a/1e9*1e9
    1.5803016199061852e+18
    >>> b=int(a/1e9*1e9)
    >>> b
    1580301619906185216
    >>> a - b
    84

    The best would be to add a round parameter to _PyTime_AsSecondsDouble(), but I'm not sure how to implement it.

    The following rounding mode is used to read a clock:

    /* Round towards minus infinity (-inf).
       For example, used to read a clock. */
    _PyTime_ROUND_FLOOR=0,
    

    _PyTime_ROUND_FLOOR is used in time.clock_settime(), time.gmtime(), time.localtime() and time.ctime() functions: to round input arguments.

    time.time(), time.monotonic() and time.perf_counter() convert _PyTime_t to float using _PyTime_AsSecondsDouble() (which currently has no round parameter) for their output.

    See also my rejected PEP-410 ;-)

    --

    One way to solve this issue is to document how to compare time.time() and time.time_ns() timestamps in a reliable way.

    @vstinner
    Member

    vstinner commented Feb 1, 2020

    By the way, I wrote an article about the history of how Python rounds time... https://vstinner.github.io/pytime.html

    I also wrote two articles about nanoseconds in Python:

    Oh and I forgot the main one: PEP-564 :-)

    "Example 2: compare times with different resolution" sounds like this issue.

    https://www.python.org/dev/peps/pep-0564/#example-2-compare-times-with-different-resolution

    @larryhastings
    Contributor

    I don't think this is fixable, because it's not exactly a bug. The problem is we're running out of bits. In converting the time around, we've lost some precision. So the times that come out of time.time() and time.time_ns() should not be considered directly comparable.

    Both functions, time.time() and time.time_ns(), call the same underlying function to get the current time. That function is _PyTime_GetSystemClock(); it returns nanoseconds since the 1970 epoch, stored in an int64. Each function then simply converts that time into its return format and returns that.

    In the case of time.time_ns(), it loses no precision whatsoever. In the case of time.time(), it (usually) converts to double and divides by 1e9, which is implicitly floor rounding.

    Back-of-the-envelope math here: An IEEE double has 53 bits of resolution for the mantissa, not counting the leading 1. The current time in seconds since the 1970 epoch uses about 29 bits of those 53 bits. That leaves 24 bits for the fractional second. But you'd need 30 bits to render all one billion fractional values. We're six bits short.
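
    For anyone who wants to redo that arithmetic, a quick sketch using the timestamp from the examples above (the exact bit counts depend on the date):

        import math

        t_ns = 1580301619906185300              # nanoseconds, from the examples above
        whole_seconds = t_ns // 10**9
        print(whole_seconds.bit_length())       # bits taken by the whole-second count
        print(math.ceil(math.log2(10**9)))      # bits needed for a full nanosecond fraction
        print(53 - whole_seconds.bit_length())  # significand bits left over for the fraction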

    Unless anybody has an amazing suggestion about how to ameliorate this situation, I think we should close this as wontfix.

    @larryhastings
    Contributor

    (Oh, wow, Victor, you wrote all that while I was writing my reply. ;-)

    @vxgmichel
    Mannequin Author

    vxgmichel mannequin commented Feb 3, 2020

    Thanks for your answers, that was very informative!

    >>> a/10**9
    1580301619.9061854
    >>> a/1e9
    1580301619.9061852

    I'm not sure which one is "correct".

    Originally, I thought a/10**9 was more precise because I ran into the following case while working with hundreds of nanoseconds (because Windows):

    r = 1580301619906185900
    print("Ref   :", r)
    print("10**7 :", int(r // 100 / 10**7 * 10 ** 7) * 100)
    print("1e7   :", int(r // 100 / 1e7 * 10 ** 7) * 100)
    print("10**9 :", int(r / 10**9 * 10**9))
    print("1e9   :", int(r / 1e9 * 10**9))
    [...]
    Ref   : 1580301619906185900
    10**7 : 1580301619906185800
    1e7   : 1580301619906186200
    10**9 : 1580301619906185984
    1e9   : 1580301619906185984
    

    I decided to plot the conversion errors for different division methods over a short period of time. It turns out that:

    • /1e9 is equally or more precise than /10**9 when working with nanoseconds
    • /10**7 is equally or more precise than /1e7 when working with hundreds of nanoseconds

    This result really surprised me; I have no idea what the reason behind it is.

    See the plots and code attached for more information.

    In any case, this means there is no reason to change the division in _PyTime_AsSecondsDouble, so closing this issue as wontfix sounds fine :)

    ---

    As a side note, the only place I could find something similar mentioned in the docs is in the os.stat_result.st_ctime_ns documentation:

    https://docs.python.org/3.8/library/os.html#os.stat_result.st_ctime_ns

    Similarly, although st_atime_ns, st_mtime_ns, and st_ctime_ns are
    always expressed in nanoseconds, many systems do not provide
    nanosecond precision. On systems that do provide nanosecond precision,
    the floating-point object used to store st_atime, st_mtime, and
    st_ctime cannot preserve all of it, and as such will be slightly
    inexact. If you need the exact timestamps you should always use
    st_atime_ns, st_mtime_ns, and st_ctime_ns.

    Maybe this kind of limitation should also be mentioned in the documentation of time.time_ns()?

    @vstinner
    Member

    vstinner commented Feb 3, 2020

    Yeah, time.time(), time.monotonic() and time.perf_counter() could benefit from a note suggesting the use of time.time_ns(), time.monotonic_ns() or time.perf_counter_ns() for better precision.

    @serhiy-storchaka
    Member

    The problem is that there is a double rounding in

        time = float(time_ns) / 1e9
    1. When converting time_ns to float.
    2. When dividing it by 1e9.

    The formula

        time = time_ns / 10**9

    may be more accurate.

    @vxgmichel
    Mannequin Author

    vxgmichel mannequin commented Feb 3, 2020

    The problem is that there is a double rounding in [...]

    Actually float(x) / 1e9 and x / 1e9 seem to produce the same results:

    import time
    import itertools
    now = time.time_ns()                                                                          
    for x in itertools.count(now):
        assert float(x) / 1e9 == x / 1e9
    

    The formula time = time_ns / 10**9 may be more accurate.

    Well that seems to not be the case, see the plots and the corresponding code. I might have made a mistake though, please let me know if I got something wrong :)

    @larryhastings
    Contributor

    The problem is that there is a double rounding in
    time = float(time_ns) / 1e9

    1. When converting time_ns to float.
    2. When dividing it by 1e9.

    I'm pretty sure that in Python 3, if you say
    c = a / b
    and a and b are both "single-digit" integers, it first converts them both into doubles and then performs the divide. See long_true_divide() in Objects/longobject.c, starting (currently) at line 3938.

    @serhiy-storchaka
    Member

    But they are not single-digit integers. What's more, int(float(a)) != a.

    @vstinner
    Member

    vstinner commented Feb 3, 2020

    I'm not sure which kind of problem you are trying to solve here. time.time() does lose precision because it uses the float type. Comparing time.time() and time.time_ns() is tricky because of that. If you care about nanosecond precision, avoid float whenever possible and only store time as an integer.

    I'm not sure how to compare the time.time() float with time.time_ns(). Maybe math.isclose() can help; a rough sketch follows.
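
    For illustration only, here is one shape such a comparison could take (a sketch, not a recommendation; the one-ulp tolerance is my own assumption):

        import math

        def about_equal(t_float, t_ns):
            # Treat the timestamps as equal if the integer value, converted
            # down to float seconds, is within one ulp of the float value.
            return math.isclose(t_float, t_ns / 10**9,
                                rel_tol=0.0, abs_tol=math.ulp(t_float))

        # Using the values from earlier in this thread:
        print(about_equal(1580301619.9061852, 1580301619906185300))  # True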

    I don't think that Python is wrong here: time.time() and time.time_ns() work as expected, and I don't think that the time.time() result can be magically more accurate: 1580301619906185300 nanoseconds (int) cannot be stored exactly as a floating point number of seconds.

    I suggest to simply document that time.time() is less accurate than time.time_ns().

    @serhiy-storchaka
    Member

    >>> 1580301619906185300/10**9
    1580301619.9061854
    >>> 1580301619906185300/1e9
    1580301619.9061852
    >>> float(F(1580301619906185300/10**9) * 10**9 - 1580301619906185300)
    88.5650634765625
    >>> float(F(1580301619906185300/1e9) * 10**9 - 1580301619906185300)
    -149.853515625

    1580301619906185300/10**9 is more accurate than 1580301619906185300/1e9.

    @vstinner
    Member

    vstinner commented Feb 3, 2020

    I compare nanoseconds (int):

    >>> t=1580301619906185300

    # int/int: int.__truediv__(int)
    >>> abs(t - int(t/10**9 * 1e9))
    172
    
    # int/float: float.__rtruediv__(int)
    >>> abs(t - int(t/1e9 * 1e9))
    84
    
    # float/int: float.__truediv__(int)
    >>> abs(t - int(float(t)/10**9 * 1e9))
    84
    
    # float/float: float.__truediv__(float)
    >>> abs(t - int(float(t)/1e9 * 1e9))
    84

    => int/int is less accurate than float/float for t=1580301619906185300

    You compare seconds (float/Fraction):

    >>> from fractions import Fraction as F
    >>> t=1580301619906185300

    # int / int
    >>> float(F(t/10**9) * 10**9 - t)
    88.5650634765625
    
    # int / float
    >>> float(F(t/1e9) * 10**9 - t)
    -149.853515625

    => here int/int looks more accurate than int/float

    And we get different conclusion :-)

    @vxgmichel
    Mannequin Author

    vxgmichel mannequin commented Feb 3, 2020

    @serhiy.storchaka

    1580301619906185300/10**9 is more accurate than 1580301619906185300/1e9.

    I don't know exactly what F represents in your example but here is what I get:

    >>> r = 1580301619906185300                                                              
    >>> int(r / 10**9 * 10**9) - r                                                           
    172
    >>> int(r / 1e9 * 10**9) - r                                                             
    -84

    @vstinner

    I suggest to simply document that time.time() is less accurate than time.time_ns().

    Sounds good!

    @mdickinson
    Member

    int/int is less accurate than float/float for t=1580301619906185300

    No, int/int is more accurate here. If a and b are ints, a / b is always correctly rounded on an IEEE 754 system, while float(a) / float(b) will not necessarily give a correctly rounded result. So for an integer a, a / 10**9 will always be at least as accurate as a / 1e9.

    @mdickinson
    Member

    To be clear: the following is flawed as an accuracy test, because the *multiplication* by 1e9 introduces additional error.

    # int/int: int.__truediv__(int)
    >>> abs(t - int(t/10**9 * 1e9))
    172

    Try this instead, which uses the fractions module to get the exact error. (The error is converted to a float before printing, for convenience, to show the approximate size of the errors.)

    >>> from fractions import Fraction as F
    >>> t = 1580301619906185300
    >>> exact = F(t, 10**9)
    >>> int_int = t / 10**9
    >>> float_float = t / 1e9
    >>> int_int_error = F(int_int) - exact
    >>> float_float_error = F(float_float) - exact
    >>> print(float(int_int_error))
    8.85650634765625e-08
    >>> print(float(float_float_error))
    -1.49853515625e-07

    @vxgmichel
    Mannequin Author

    vxgmichel mannequin commented Feb 3, 2020

    @mark.dickinson

    To be clear: the following is flawed as an accuracy test, because the *multiplication* by 1e9 introduces additional error.

    Interesting, I completely missed that!

    But did you notice that the full conversion might still perform better when using only floats?

    >>> from fractions import Fraction as F                                                   
    >>> r = 1580301619906185300                                                              
    >>> abs(int(r / 1e9 * 1e9) - r)                                                          
    84
    >>> abs(round(F(r / 10**9) * 10**9) - r)                                               
    89
    

    I wanted to figure out how often that happens so I updated my plotting, you can find the code and plot attached.

    Notice how both methods seem to perform equally well (the difference of the absolute errors seems to average to zero). I have no idea why that happens though.

    @vstinner
    Member

    vstinner commented Feb 4, 2020

    No, int/int is more accurate here.

    Should the _PyFloat_FromPyTime() implementation be modified to reuse long_true_divide()?

    @eryksun
    Contributor

    eryksun commented Feb 10, 2020

    A binary float has the form (-1)**sign * (1 + frac) * 2**exp, where sign is 0 or 1, frac is a rational value in the range [0, 1), and exp is a signed integer (but stored in non-negative, biased form). The smallest value of frac is epsilon, so the smallest increment for a given power of two is epsilon * 2**exp.

    To get exp for a given value, we have log2(abs(value)) == log2((1 + frac) * 2**exp) == log2(1 + frac) + log2(2**exp) == log2(1 + frac) + exp. Thus exp == log2(abs(value)) - log2(1 + frac). We know log2(1 + frac) is in the range [0, 1), so exp is the floor of the log2 result.

    For a binary64, epsilon is 2**-52, but we can leave it up to the floating point implementation by using sys.float_info:

        >>> exp = math.floor(math.log2(time.time()))
        >>> sys.float_info.epsilon * 2**exp
        2.384185791015625e-07

    Anyway, it's better to leave it to the experts:

        >>> t = time.time()
        >>> math.nextafter(t, math.inf) - t
        2.384185791015625e-07

    @larryhastings
    Contributor

    Anyway, it's better to leave it to the experts:

    I'm not sure what you're suggesting here. I shouldn't try to understand how floating-point numbers are stored?

    @eryksun
    Contributor

    eryksun commented Feb 10, 2020

    I'm not sure what you're suggesting here. I shouldn't try to understand
    how floating-point numbers are stored?

    No, that's the furthest thought from my mind. I meant only that I would not recommend using one's own understanding of floating-point numbers instead of something like math.nextafter. Even if I correctly understand the general case, there are probably corner cases that I'm not aware of.

    @eryksun eryksun added type-feature A feature request or enhancement 3.9 only security fixes 3.10 only security fixes docs Documentation in the Doc dir extension-modules C modules in the Modules dir and removed type-bug An unexpected behavior, bug, or error 3.7 (EOL) end of life stdlib Python modules in the Lib dir labels Mar 21, 2021
    @eryksun eryksun changed the title time_ns() and time() cannot be compared on windows [time] document that integer nanoseconds may be more precise than float Mar 21, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @encukou encukou removed 3.10 only security fixes 3.9 only security fixes 3.8 (EOL) end of life labels Mar 17, 2025