Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: array_to_datetime catch overflows in one place #24049

Merged
merged 3 commits into from
Dec 2, 2018

Conversation

jbrockmendel
Copy link
Member

The diff is big, but all this is doing is changing:

if A:
    try:
        foo(...)
    except OutOfBoundsDatetime:
        handle(...)
elif B:
    try:
        bar(...)
    except OutOfBoundsDatetime:
        handle(...)
...

to:

try:
    if A:
        foo(...)
    elif B:
        bar(...)
    ...
except OutOfBoundsDatetime:
    handle(...)
...

Orthogonal to #24032, will need rebasing following #24031. All three of these should go before #24006.

@codecov
Copy link

codecov bot commented Dec 2, 2018

Codecov Report

Merging #24049 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24049      +/-   ##
==========================================
- Coverage   42.45%   42.45%   -0.01%     
==========================================
  Files         161      161              
  Lines       51562    51561       -1     
==========================================
- Hits        21892    21888       -4     
- Misses      29670    29673       +3
Flag Coverage Δ
#single 42.45% <ø> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/datetimelike.py 50.43% <0%> (-1.36%) ⬇️
pandas/core/indexes/datetimes.py 52.18% <0%> (ø) ⬆️
pandas/core/indexes/timedeltas.py 39.82% <0%> (ø) ⬆️
pandas/core/arrays/datetimes.py 63.48% <0%> (ø) ⬆️
pandas/core/indexes/datetimelike.py 53.54% <0%> (+1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 26d7c52...1004c9e. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 2, 2018

Codecov Report

Merging #24049 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #24049   +/-   ##
=======================================
  Coverage   42.45%   42.45%           
=======================================
  Files         161      161           
  Lines       51562    51562           
=======================================
  Hits        21892    21892           
  Misses      29670    29670
Flag Coverage Δ
#single 42.45% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 26d7c52...1004c9e. Read the comment docs.

@jreback jreback added Datetime Datetime data dtype Clean labels Dec 2, 2018
@jreback
Copy link
Contributor

jreback commented Dec 2, 2018

so what i think you should really do, is instead of nesting this further, move all that you have inside the try/except (in order to catch OutOfBounds), and put it in another function, then have array_to_datetime call it. will be much easier on the eyes i think. (see line 712)

maybe something like

try:
    return array_to_datetime(...)
except OutOfBounds:
    return array_to_object(...)
else:
    raise TypeError

@jbrockmendel
Copy link
Member Author

and put it in another function

Works for me. This hypothetical function would look a lot like convert_to_tsobject. Hopefully we'll be able to iron out the handful of differences between these and de-duplicate a bunch of code.

else:
    raise TypeError

I'm not wild about using TypeError for flow-control here because it obscures cases when a TypeError is raised naturally. #24032 is an orthogonal refactor that I think improves the situation. Following that I'll take a look at implementing your suggestion here.

@jreback
Copy link
Contributor

jreback commented Dec 2, 2018

Works for me. This hypothetical function would look a lot like convert_to_tsobject. Hopefully we'll be able to iron out the handful of differences between these and de-duplicate a bunch of code.

yep

I'm not wild about using TypeError for flow-control here because it obscures cases when a TypeError is raised naturally. #24032 is an orthogonal refactor that I think improves the situation. Following that I'll take a look at implementing your suggestion here

oh, that's what the original does, didn't actually mean to change anything.

@jbrockmendel
Copy link
Member Author

oh, that's what the original does, didn't actually mean to change anything.

Yah, its 24032 that does change the behavior. I'm pretty sure the suggestion you made here will need tweaking after 24032, and I think that's a more important change, so should go first.

@jreback
Copy link
Contributor

jreback commented Dec 2, 2018

ok, #24032 merged. rebase this and will look.

@jbrockmendel
Copy link
Member Author

I'm putting this together now and think the suggestion will work, but there are a bunch of state variables (seen_datetime, seen_integer, seen_string, out_tzoffset_vals) to keep track of, so it is getting pretty pointer-y. We should consider whether this refactor is worthwhile in its current form without separating out a new function.

@jreback
Copy link
Contributor

jreback commented Dec 2, 2018

right it’s nice and clean but might involve some duplicated code

but i think reducing nesting is worth that it becomes easier to reason about

@jbrockmendel
Copy link
Member Author

So I've gotten a version working with the new function separated out, but it adds enough non-pythonic complication that I'd rather leave the status quo than push it. If there's consensus that this PR as-is is an improvement on master then let's go for that, otherwise I'll close. #23675 and #23885 are much higher priorities.

@jreback jreback added this to the 0.24.0 milestone Dec 2, 2018
@jreback
Copy link
Contributor

jreback commented Dec 2, 2018

ok this is a net improvment. that function is a bear. but if you have time / maybe open an issue to refactor even more.

@jreback jreback merged commit e4d494b into pandas-dev:master Dec 2, 2018
@jbrockmendel jbrockmendel deleted the a2d-pre6 branch December 2, 2018 21:39
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants