-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Timestamp.round errors #22802
Fix Timestamp.round errors #22802
Conversation
Hello @miccoli! Thanks for submitting the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #22802 +/- ##
==========================================
+ Coverage 92.18% 92.19% +<.01%
==========================================
Files 169 169
Lines 50821 50827 +6
==========================================
+ Hits 46851 46860 +9
+ Misses 3970 3967 -3
Continue to review full report at Codecov.
|
pandas/_libs/tslibs/timestamps.pyx
Outdated
return _floor_int64(v + u//2, u) | ||
|
||
|
||
def round_nsint64(values, mode: RoundTo, freq): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think cython recognizes RoundTwo as a type unless it is a cdef class. Not entirely sure on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I meant this more as a comment than an actual directive. Fixed in 4490269
pandas/_libs/tslibs/timestamps.pyx
Outdated
d[mask] += 1 | ||
return d * unit | ||
|
||
raise NotImplementedError(mode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this reachable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is unreachable and the NotImplementedError
was cruft from a previous implementation.
In 4490269 I inserted an assert False
to avoid the code returning None
if a bug is introduced by changing the RoundTo
class.
pandas/pandas/_libs/tslibs/timestamps.pyx
Lines 134 to 135 in 5232948
# this should be unreachable | |
assert False, "should never arrive here" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fairly good:
- Nit: Could you rename the single variable names to something a little more descriptive (e.g.
d
todivisor
, etc) - Could you add similar
DatetimeIndex
tests as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a whatsnew note for 0.24.0
unit = to_offset(freq).nanos | ||
# test floor | ||
result = dt.floor(freq) | ||
assert result.value % unit == 0, "floor not a %s multiple" % (freq, ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a blank line between cases
assert result.value % unit == 0, "floor not a %s multiple" % (freq, ) | ||
assert 0 <= dt.value - result.value < unit, "floor error" | ||
# test ceil | ||
result = dt.ceil(freq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there round errors in timedelta ops as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, also timedeltas
are affected by rounding errors, but it is very unlikely that these will be found in real world applications.
BTW I found #22591 processing real scientific data, in which I had timestamps for 2018 measurements with about 1 µs resolution. On the contrary I cannot imagine of an application in which you have 50 years time deltas with such a small resolution.
Of course you could create a round-trip example:
Timedate -> Timedelta -> round() -> Timedate
which gives different results with respect to
Timedate -> round().
Nevertheless I would suggest to postpone the resolution of the timedelta issue after this PR is merged.
(Please note also that the current complex machinery, which tries to improve rounding errors, is implemented only for Timedate and not for Timedelta.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no i get it, that's why I asked. Ok to fix later, esp if its not very common.
|
demonstrates loss of precision and requires no UserWarning is raised for non standard frequencies
The old `round_ns` is replaced by `round_nsint64`; `round_nsint64` is based on integer arithmetic while `round_ns` was based on floating point numbers. Rounding mode is explicitly defined by RoundTo enum class: - RoundTo.MINUS_INFTY rounds to -∞ (floor) - RountTo.PLUS_INFTY rounds to +∞ (ceil) - RoundTo.NEAREST_HALF_MINUS_INFTY rounds to nearest multiple, and breaks tie to -∞ - RoundTo.NEAREST_HALF_PLUS_INFTY rounds to nearest multiple, and breaks tie to +∞ - RoundTo.NEAREST_HALF_EVEN rounds to nearest multiple, and breaks tie to even multiple
rebase done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. @mroeschke can comment as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Agreed that .format
is preferable over %
for the string error formatting.
the ci/circleci: py27_compat failure for 53c8c9b is a false negative, due to a networking problem during the build phase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny typo. ping on green.
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -646,6 +646,7 @@ Datetimelike | |||
- Bug in :class:`DatetimeIndex` subtraction that incorrectly failed to raise ``OverflowError`` (:issue:`22492`, :issue:`22508`) | |||
- Bug in :class:`DatetimeIndex` incorrectly allowing indexing with ``Timedelta`` object (:issue:`20464`) | |||
- Bug in :class:`DatetimeIndex` where frequency was being set if original frequency was ``None`` (:issue:`22150`) | |||
- Bug in rounding methods of :class:`DatetimeIndex` (:meth:`~DatetimeIndex.round`, :meth:`~DatetimeIndex.ceil`, :meth:`~DatetimeIndex.floor`) and :class:`Timestamp` (:meth:`~Timestamp.round`, :meth:`~Timestamp.ceil`, :meth:`~Timestamp.floor`) could give raise to loss of precision (:issue:`22591`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise -> rise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for catching this typo
pandas/_libs/tslibs/timestamps.pyx
Outdated
quotient[mask] += 1 | ||
return quotient * unit | ||
|
||
# this should be unreachable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could say undefined mode
hope everything is ready for merge (green light). |
lgtm. waiting till CI is un-borked to merge. |
thanks @miccoli |
thanks for merging |
pandas-dev#22802 imports `enum`, which was added to the built-ins in Python 3.4. The `enum34` package backports `enum` to earlier Python versions, so this should be a dependency as long as Python 2.7 is still supported.
* upstream/master: pct change bug issue 21200 (pandas-dev#21235) DOC: Fix summaries not ending with a period (pandas-dev#24190) DOC: Add back currentmodule to api.rst (pandas-dev#24232) DOC: Fixed implicit imports for whatsnew (v >= version 20.0) (pandas-dev#24199) remove enum import for PY2 compat, xref pandas-dev#22802 (pandas-dev#24170) CI: Simplify Azure Pipelines configuration files (pandas-dev#23111) REF/TST: Add more pytest idiom to indexing/multiindex/test_getitem.py (pandas-dev#24053)
git diff upstream/master -u -- "*.py" | flake8 --diff
ambiguous
kwarg, resolve conflicts.