-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: preserve object-dtype index when accessing DataFrame column / PERF: improve perf of Series fastpath constructor #42950
BUG: preserve object-dtype index when accessing DataFrame column / PERF: improve perf of Series fastpath constructor #42950
Conversation
pandas/core/series.py
Outdated
# labels may exceeds datetime bounds, | ||
# or not be a DatetimeIndex | ||
pass | ||
if labels._is_all_dates: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently this is being relied upon in the pytables code though ..
And looking into it further, the change I made here of course changes behaviour, as it no longer try to infer object dtype as datetime in the fastpath.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although that should probably be considered as a bug fix. With master:
In [1]: df = pd.DataFrame({'a': [1, 2, 3]}, index=pd.date_range("2012", periods=3).astype(object))
In [2]: df.index
Out[2]: Index([2012-01-01 00:00:00, 2012-01-02 00:00:00, 2012-01-03 00:00:00], dtype='object')
In [3]: df['a'].index
Out[3]: DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'], dtype='datetime64[ns]', freq=None)
where the index of the column-as-Series becomes different ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC we deprecated this, then un-deprecated bc we didn't have good warning handling/suppression, and were going to try to re-deprecate before long
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a reference (issue/PR) for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is #36697, but that's for the non-fastpath way, I think? While here it's specifically the fastpath that I am changing (which is not used when a user creates a Series directly themselves).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we did that i dont remember any distinction being made between fastpath vs non-fastpath
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
Appears this PR has been dormant for a while closing to clear the queue. Feel free to reopen when you have time to work on this PR and address the comment. |
@jorisvandenbossche not averse to these PRs but can you merge master & address any comments esp on the older ones |
Current thought: the if |
I suppose you mean as alternative to passing First, the Series(..) fastpath still sets a name as well (which is not included in Looking at where
So from that, I don't think it is super easy to change |
BTW, I still need to add a test case for this (the output is on master, with this PR both will be object dtype):
And we need to decide if we see that as a bug fix. |
definitely looks like a bugfix, nice catch |
can you rebase i think we merged some parts of this elsewhere |
Any more specific pointer? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM cc @jreback
@jorisvandenbossche can you rebase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls also rebase
doc/source/whatsnew/v1.4.0.rst
Outdated
@@ -869,6 +869,7 @@ Indexing | |||
- Bug in :meth:`DataFrame.loc.__setitem__` changing dtype when indexer was completely ``False`` (:issue:`37550`) | |||
- Bug in :meth:`IntervalIndex.get_indexer_non_unique` returning boolean mask instead of array of integers for a non unique and non monotonic index (:issue:`44084`) | |||
- Bug in :meth:`IntervalIndex.get_indexer_non_unique` not handling targets of ``dtype`` 'object' with NaNs correctly (:issue:`44482`) | |||
- Bug in getting a column from a DataFrame with an object-dtype row index with datetime-like values: the resulting Series now preserves the exact object-dtype Index from the parent DataFrame (:issue:`42950`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to 1.5
…RF: improve perf of Series fastpath constructor (pandas-dev#42950)
This came up while discussing the CoW proof of concept (and need for item_cache / speed of creating a Series of a column). Those small changes gives a decent reduction of some overhead (~30% reduction):