Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Timestamp.round errors #22802

Merged
merged 6 commits into from
Oct 1, 2018
Merged

Fix Timestamp.round errors #22802

merged 6 commits into from
Oct 1, 2018

Conversation

miccoli
Copy link
Contributor

@miccoli miccoli commented Sep 21, 2018

Sorry, something went wrong.

@pep8speaks
Copy link

Hello @miccoli! Thanks for submitting the PR.

@miccoli miccoli changed the title Gh#22591 Fix Timestamp.round errors Sep 21, 2018
@codecov
Copy link

codecov bot commented Sep 21, 2018

Codecov Report

Merging #22802 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22802      +/-   ##
==========================================
+ Coverage   92.18%   92.19%   +<.01%     
==========================================
  Files         169      169              
  Lines       50821    50827       +6     
==========================================
+ Hits        46851    46860       +9     
+ Misses       3970     3967       -3
Flag Coverage Δ
#multiple 90.61% <100%> (ø) ⬆️
#single 42.37% <33.33%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimelike.py 98.1% <100%> (ø) ⬆️
pandas/core/strings.py 98.63% <0%> (ø) ⬆️
pandas/core/indexes/multi.py 95.45% <0%> (ø) ⬆️
pandas/core/frame.py 97.2% <0%> (ø) ⬆️
pandas/core/generic.py 96.67% <0%> (ø) ⬆️
pandas/io/formats/style.py 96.43% <0%> (ø) ⬆️
pandas/core/series.py 93.74% <0%> (ø) ⬆️
pandas/core/nanops.py 95.14% <0%> (+0.01%) ⬆️
pandas/io/parsers.py 95.6% <0%> (+0.05%) ⬆️
pandas/core/internals/managers.py 96.66% <0%> (+0.1%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 302d43a...988e5e3. Read the comment docs.

return _floor_int64(v + u//2, u)


def round_nsint64(values, mode: RoundTo, freq):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think cython recognizes RoundTwo as a type unless it is a cdef class. Not entirely sure on that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I meant this more as a comment than an actual directive. Fixed in 4490269

d[mask] += 1
return d * unit

raise NotImplementedError(mode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this reachable?

Copy link
Contributor Author

@miccoli miccoli Sep 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is unreachable and the NotImplementedError was cruft from a previous implementation.

In 4490269 I inserted an assert False to avoid the code returning None if a bug is introduced by changing the RoundTo class.

# this should be unreachable
assert False, "should never arrive here"

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fairly good:

  1. Nit: Could you rename the single variable names to something a little more descriptive (e.g. d to divisor, etc)
  2. Could you add similar DatetimeIndex tests as well?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a whatsnew note for 0.24.0

unit = to_offset(freq).nanos
# test floor
result = dt.floor(freq)
assert result.value % unit == 0, "floor not a %s multiple" % (freq, )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a blank line between cases

@jreback jreback added Datetime Datetime data dtype Numeric Operations Arithmetic, Comparison, and Logical operations Timezones Timezone data dtype labels Sep 23, 2018
assert result.value % unit == 0, "floor not a %s multiple" % (freq, )
assert 0 <= dt.value - result.value < unit, "floor error"
# test ceil
result = dt.ceil(freq)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there round errors in timedelta ops as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, also timedeltas are affected by rounding errors, but it is very unlikely that these will be found in real world applications.

BTW I found #22591 processing real scientific data, in which I had timestamps for 2018 measurements with about 1 µs resolution. On the contrary I cannot imagine of an application in which you have 50 years time deltas with such a small resolution.

Of course you could create a round-trip example:

Timedate -> Timedelta -> round() -> Timedate 

which gives different results with respect to

Timedate -> round().

Nevertheless I would suggest to postpone the resolution of the timedelta issue after this PR is merged.

(Please note also that the current complex machinery, which tries to improve rounding errors, is implemented only for Timedate and not for Timedelta.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no i get it, that's why I asked. Ok to fix later, esp if its not very common.

@jreback
Copy link
Contributor

jreback commented Sep 23, 2018

just merged #22647, @miccoli will need to rebase to accomadate the ambiguous kwarg

@miccoli
Copy link
Contributor Author

miccoli commented Sep 23, 2018

  • @mroeschke : I think I addressed your requests
    1. Nit: 4490269
    2. DatetimeIndex tests: 5232948
  • @jreback : sorry I had the above two commits ready in my pipeline, but I had not yet read your recent comments. I will address them later.

demonstrates loss of precision and requires no UserWarning is
raised for non standard frequencies
The old `round_ns` is replaced by `round_nsint64`;
`round_nsint64` is based on integer arithmetic
while `round_ns` was based on floating point numbers.

Rounding mode is explicitly defined by RoundTo enum class:

- RoundTo.MINUS_INFTY rounds to -∞ (floor)
- RountTo.PLUS_INFTY rounds to +∞ (ceil)
- RoundTo.NEAREST_HALF_MINUS_INFTY rounds to nearest multiple,
  and breaks tie to -∞
- RoundTo.NEAREST_HALF_PLUS_INFTY rounds to nearest multiple,
  and breaks tie to +∞
- RoundTo.NEAREST_HALF_EVEN rounds to nearest multiple,
  and breaks tie to even multiple
@miccoli
Copy link
Contributor Author

miccoli commented Sep 25, 2018

rebase done

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. @mroeschke can comment as well.

@jreback jreback added this to the 0.24.0 milestone Sep 25, 2018
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Agreed that .format is preferable over % for the string error formatting.

@miccoli
Copy link
Contributor Author

miccoli commented Sep 26, 2018

the ci/circleci: py27_compat failure for 53c8c9b is a false negative, due to a networking problem during the build phase.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny typo. ping on green.

@@ -646,6 +646,7 @@ Datetimelike
- Bug in :class:`DatetimeIndex` subtraction that incorrectly failed to raise ``OverflowError`` (:issue:`22492`, :issue:`22508`)
- Bug in :class:`DatetimeIndex` incorrectly allowing indexing with ``Timedelta`` object (:issue:`20464`)
- Bug in :class:`DatetimeIndex` where frequency was being set if original frequency was ``None`` (:issue:`22150`)
- Bug in rounding methods of :class:`DatetimeIndex` (:meth:`~DatetimeIndex.round`, :meth:`~DatetimeIndex.ceil`, :meth:`~DatetimeIndex.floor`) and :class:`Timestamp` (:meth:`~Timestamp.round`, :meth:`~Timestamp.ceil`, :meth:`~Timestamp.floor`) could give raise to loss of precision (:issue:`22591`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise -> rise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching this typo

quotient[mask] += 1
return quotient * unit

# this should be unreachable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could say undefined mode

@miccoli
Copy link
Contributor Author

miccoli commented Sep 26, 2018

hope everything is ready for merge (green light).

@jreback
Copy link
Contributor

jreback commented Sep 28, 2018

lgtm. waiting till CI is un-borked to merge.

@jreback jreback merged commit f021bbc into pandas-dev:master Oct 1, 2018
@jreback
Copy link
Contributor

jreback commented Oct 1, 2018

thanks @miccoli

@miccoli
Copy link
Contributor Author

miccoli commented Oct 1, 2018

thanks for merging

effigies added a commit to effigies/pandas that referenced this pull request Dec 7, 2018

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
pandas-dev#22802 imports `enum`, which was added to the built-ins in Python 3.4. The `enum34` package backports `enum` to earlier Python versions, so this should be a dependency as long as Python 2.7 is still supported.
jreback added a commit to jreback/pandas that referenced this pull request Dec 9, 2018
jreback added a commit to jreback/pandas that referenced this pull request Dec 11, 2018
jreback added a commit that referenced this pull request Dec 11, 2018

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
thoo added a commit to thoo/pandas that referenced this pull request Dec 12, 2018
* upstream/master:
  pct change bug issue 21200 (pandas-dev#21235)
  DOC: Fix summaries not ending with a period (pandas-dev#24190)
  DOC: Add back currentmodule to api.rst (pandas-dev#24232)
  DOC: Fixed implicit imports for whatsnew (v >= version 20.0) (pandas-dev#24199)
  remove enum import for PY2 compat, xref pandas-dev#22802 (pandas-dev#24170)
  CI: Simplify Azure Pipelines configuration files (pandas-dev#23111)
  REF/TST: Add more pytest idiom to indexing/multiindex/test_getitem.py (pandas-dev#24053)
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Numeric Operations Arithmetic, Comparison, and Logical operations Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent behaviour in Timestamp.round
5 participants