Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Adding origin parameter in pd.to_datetime #11470

Conversation

sumitbinnani
Copy link

Fixes #11276
Fixes #11745

@sumitbinnani sumitbinnani changed the title Added unit julian to to_datetime ENH: Added unit julian to to_datetime Oct 29, 2015
@jreback
Copy link
Contributor

jreback commented Oct 29, 2015

tests!

@sumitbinnani
Copy link
Author

On it. Any boundary conditions that I must take into consideration?

@jreback
Copy link
Contributor

jreback commented Oct 29, 2015

what do you mean?

@jreback
Copy link
Contributor

jreback commented Oct 29, 2015

also this would need to be added in Timestamp as that accepts a unit as well

@jreback jreback added Enhancement Datetime Datetime data dtype labels Oct 29, 2015
@jreback jreback changed the title ENH: Added unit julian to to_datetime ENH: Added julian unit in pd.to_datetime Oct 29, 2015
@sumitbinnani
Copy link
Author

Apologies for the late reply. I am not able to understand which file to edit for Timestamp. Kindly guide, @jreback?

@jreback
Copy link
Contributor

jreback commented Nov 2, 2015

@sumitbinnani
Copy link
Author

Thanks @jreback. Will go through the file asap :-)

@sumitbinnani
Copy link
Author

Hi, I am bit confused regarding the implementation. As far as I could understand, following might help. Kindly correct me if I am wrong.

## Constructor in Timestamp Class.
def __new__(cls, object ts_input, object offset=None, tz=None, unit=None):
        cdef _TSObject ts
        cdef _Timestamp ts_base

        if unit == 'julian':
            unit = 'D'
            ts_input = ts_input - Timestamp(0).to_julian_date()

        ts = convert_to_tsobject(ts_input, tz, unit)

        if ts.value == NPY_NAT:
            return NaT

        if util.is_string_object(offset):
            from pandas.tseries.frequencies import to_offset
            offset = to_offset(offset)

        # make datetime happy
        ts_base = _Timestamp.__new__(cls, ts.dts.year, ts.dts.month,
                                     ts.dts.day, ts.dts.hour, ts.dts.min,
                                     ts.dts.sec, ts.dts.us, ts.tzinfo)

        # fill out rest of data
        ts_base.value = ts.value
        ts_base.offset = offset
        ts_base.nanosecond = ts.dts.ps / 1000

        return ts_base

Does it seem okay, @jreback?

@jreback
Copy link
Contributor

jreback commented Nov 13, 2015

@sumitbinnani this should be implemented in cast_from_unit

@jreback
Copy link
Contributor

jreback commented Nov 25, 2015

this needs tests

@jreback
Copy link
Contributor

jreback commented Dec 2, 2015

see #11745; let's call this origin (and in this case I think it would be a string, e.g. 'julian', rather than a specific date)

@sumitbinnani
Copy link
Author

Have been a bit occupied for last a few weeks owing to applications and submission deadlines. Will make the necessary changes for #11745 asap. Facing a bit difficulty writing test cases (going through documentations and will mostly be done by this weekend).

@sumitbinnani
Copy link
Author

Have changed the definition of to_datetime to include a new parameter origin, and have added following lines in the code:

    ...
    if origin is not None:
        import datetime
        from pandas.core.api import Timestamp
        if origin == 'julian':
            arg = arg - Timestamp(0).to_julian_date()
        elif isinstance(origin, datetime.date):
            arg = arg + Timestamp(origin).to_julian_date() - Timestamp(0).to_julian_date()

   return ...

Should I remove the changes made to the class Timestamp as unit is no longer being used for the required functionality? @jreback?

@jreback
Copy link
Contributor

jreback commented Dec 9, 2015

@sumitbinnani yes, remove the original unit changes and go with origin as discussed

@sumitbinnani sumitbinnani force-pushed the ENH-Adding-unit-``julian``-to-``to_datetime`` branch from 86681d2 to 45037a8 Compare December 9, 2015 19:11
@sumitbinnani sumitbinnani changed the title ENH: Added julian unit in pd.to_datetime ENH: Adding origin parameter in pd.to_datetime Dec 9, 2015
@sumitbinnani sumitbinnani force-pushed the ENH-Adding-unit-``julian``-to-``to_datetime`` branch 2 times, most recently from 1deddb0 to 86681d2 Compare December 9, 2015 20:53
@sumitbinnani
Copy link
Author

Hi @jreback,

Have made the changes as followed:

  • Added origin as a new parameter with default value of None to to_datetime
  • Updated documentation for to_datetime
  • Added test case for the patch (wrote test case for the first time and might require a bit of changes)

Let me know if any additional changes are required.

@@ -743,6 +743,22 @@ def test_to_datetime_unit(self):
result = to_datetime(s,unit='s')
expected = Series([ Timestamp('2013-06-09 02:42:28') + timedelta(seconds=t) for t in range(20) ] + [NaT])
assert_series_equal(result,expected)

def test_to_datetime_origin(self):
# for origin as julian
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number as a comment

@jreback
Copy link
Contributor

jreback commented Dec 10, 2015

pls add whatsnew note in Enhancements for 0.18.0

0 1960-01-01
1 1960-01-02
...
98 1960-04-08
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output here of the example does not yet match the example code

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opps. My Bad. Updated the documentation with proper output.

Rectified mistake in documentation.
@sumitbinnani
Copy link
Author

Have rectified the mistake in example. Unsure about how to proceed for the aforementioned invalid-origin :-(

@jreback
Copy link
Contributor

jreback commented Aug 19, 2016

In [13]: pd.to_datetime([2000*365], unit='D', origin='0001-01-01')
...
ValueError: Invalid 'origin' or 'origin' Out of Bound

I think this is fine for now. We could make this happen, but the code then gets convoluted, but it is so rare as not to be important.

I would just assert that this example raises.

@@ -24,6 +24,16 @@ New features

.. _whatsnew_0190.dev_api:

to_datetime can be used with Offset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_datetime has gained an origin kwarg.

don't call this Offset which is a very specific meaning

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the doc-string this is a 'reference date', so I would use that here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jreback
Copy link
Contributor

jreback commented Sep 9, 2016

can you rebase / update?

@sumitbinnani sumitbinnani force-pushed the ENH-Adding-unit-``julian``-to-``to_datetime`` branch from 763a620 to ffd9b28 Compare October 4, 2016 04:55
@sumitbinnani sumitbinnani force-pushed the ENH-Adding-unit-``julian``-to-``to_datetime`` branch from a99fdb3 to 4396d81 Compare October 4, 2016 05:57
@jorisvandenbossche
Copy link
Member

@sumitbinnani Can you move the whatsnew notice to the v0.20.0.txt file? As 0.19.0 has been releases in the meantime (also, all the other changes in the 0.19.0.txt file should not be included in here)

@sumitbinnani
Copy link
Author

@jorisvandenbossche The v0.19.0 had been reverted to the master. Changes added to v0.20.0.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Oct 5, 2016

Some examples that now go wrong:

In [1]: pd.to_datetime("2005-01-01", origin="1960-01-01")
Out[1]: Timestamp('1995-01-01 00:00:00')

In [2]: pd.to_datetime(1, origin="julian", unit='D')
...
OutOfBoundsDatetime: cannot convert input with unit 'D'

In [3]: pd.to_datetime(2456658, origin='julian', unit='D')
Out[3]: Timestamp('2013-12-31 00:00:00')

In [4]: pd.to_datetime(2456658, origin='0001-01-01', unit='D')
...
OutOfBoundsDatetime: cannot convert input with unit 'D'

So in In [1], this should raise, as the origin should only be used when having numeric input. In [2] is correct I think, but we should add a test for this that it indeed raises an OutOfBoundsDatetime error. In [4] should work (In [3] is just to show this is a number that is not out of bounds).

EDIT: regarding In [4], we decided above that it is OK that this does not work for now (as it is difficult to parse the out of bounds origin), but, it should have a better error message, and you should add a test for it.

@jorisvandenbossche
Copy link
Member

@sumitbinnani what I think you could do to simplify the code (having all handling of origin in one place) is the following: start with processing the origin keyword (and only do this for numerical input), convert it to an offset (according the the unit), and adapt the numerical input with this offset before further parsing it.

So something like (pseudocode)

if origin != 'epoch':
    if origin == 'julian':
        ... (what you already have in this case)
        arg = arg - tslib.Timestamp(0).to_julian_date()
    else:
        ... (parse origin and convert to correct number according to unit)
        arg = arg - offset

... (original code to parse `arg` (what you now put in the 'intermediate_result' function)

A way to deal with the out of bounds origin would be to use np.datetime64(dateutil.parser.parse(origin)), but it's OK to leave that for now.

@jreback
Copy link
Contributor

jreback commented Nov 25, 2016

can you rebase / update according to comments

@jreback
Copy link
Contributor

jreback commented Dec 26, 2016

@sumitbinnani I know this has been open a long time. can you rebase and we'll see if can get it in?

pls change release note to 0.20.0

@sumitbinnani
Copy link
Author

@jreback Sure. Would go through the code this weekend and rebase :-)

@sumitbinnani
Copy link
Author

@jreback There are quite a many changes. I would start clean and update asap. Sorry for the delay.

@bashtage
Copy link
Contributor

@sumitbinnani Useful feature so would be good to get this in

return _convert_listlike(arg, box, format, name=arg.name)
elif is_list_like(arg):
return _convert_listlike(arg, box, format)
def intermediate_result(arg):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some reason this is a function? It looks like it should just be an if, else if, else

Define reference date. The numeric values would be parsed as number
of units (defined by `unit`) since this reference date.

- If 'epoch', origin is set to 1970-01-01.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could consider other standard offsets: excel, Jan 0 1900, stata Jan 0, 1960, matlab, Jan 0, 0000? Just an idea

@jreback
Copy link
Contributor

jreback commented Mar 28, 2017

superseded by #15828

@sumitbinnani I rebase and updated your PR. thanks!

@jreback jreback closed this Mar 28, 2017
@jreback jreback added this to the 0.20.0 milestone Mar 28, 2017
jreback added a commit that referenced this pull request Apr 2, 2017
closes #11276
closes #11745
superseded #11470

Author: Jeff Reback <jeff@reback.net>
Author: Sumit Binnani <sumit.binnani@gmail.com>

Closes #15828 from jreback/datetime-unit and squashes the following commits:

ebb4acd [Jeff Reback] doc fixes & cleanup
209591a [Jeff Reback] bug fix
56663a5 [Jeff Reback] add Timedelta floordiv ops
a24e88c [Jeff Reback] rename epoch -> unix
6a8a779 [Jeff Reback] update docs / tests
ad7356e [Sumit Binnani] BUG: Series creation with datetime64 with non-ns unit as object dtype
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants