
Commit fe29123

TomAugspurger authored and jreback committed
DEPR: __array__ for tz-aware Series/Index (#24596)
1 parent 8506e66 commit fe29123


18 files changed (+329 additions, -38 deletions)


doc/source/api/series.rst

+3
@@ -26,6 +26,7 @@ Attributes
 .. autosummary::
    :toctree: generated/
 
+   Series.array
    Series.values
    Series.dtype
    Series.ftype
@@ -58,10 +59,12 @@ Conversion
    Series.convert_objects
    Series.copy
    Series.bool
+   Series.to_numpy
    Series.to_period
    Series.to_timestamp
    Series.to_list
    Series.get_values
+   Series.__array__
 
 Indexing, iteration
 -------------------

doc/source/whatsnew/v0.24.0.rst

+69 -1

@@ -1227,7 +1227,7 @@ Deprecations
 .. _whatsnew_0240.deprecations.datetimelike_int_ops:
 
 Integer Addition/Subtraction with Datetimes and Timedeltas is Deprecated
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 In the past, users could—in some cases—add or subtract integers or integer-dtype
 arrays from :class:`Timestamp`, :class:`DatetimeIndex` and :class:`TimedeltaIndex`.
@@ -1265,6 +1265,74 @@ the object's ``freq`` attribute (:issue:`21939`, :issue:`23878`).
    dti = pd.date_range('2001-01-01', periods=2, freq='7D')
    dti + pd.Index([1 * dti.freq, 2 * dti.freq])
 
+
+.. _whatsnew_0240.deprecations.tz_aware_array:
+
+Converting Timezone-Aware Series and Index to NumPy Arrays
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The conversion from a :class:`Series` or :class:`Index` with timezone-aware
+datetime data will change to preserve timezones by default (:issue:`23569`).
+
+NumPy doesn't have a dedicated dtype for timezone-aware datetimes.
+In the past, converting a :class:`Series` or :class:`DatetimeIndex` with
+timezone-aware datetimes to a NumPy array worked by:
+
+1. converting the tz-aware data to UTC
+2. dropping the timezone info
+3. returning a :class:`numpy.ndarray` with ``datetime64[ns]`` dtype
+
+Future versions of pandas will preserve the timezone information by returning an
+object-dtype NumPy array where each value is a :class:`Timestamp` with the correct
+timezone attached:
+
+.. ipython:: python
+
+   ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
+   ser
+
+The default behavior remains the same, but now issues a warning:
+
+.. code-block:: python
+
+   In [8]: np.asarray(ser)
+   /bin/ipython:1: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive
+         ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray
+         with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'.
+
+           To accept the future behavior, pass 'dtype=object'.
+           To keep the old behavior, pass 'dtype="datetime64[ns]"'.
+     #!/bin/python3
+   Out[8]:
+   array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
+         dtype='datetime64[ns]')
+
+The previous or future behavior can be obtained, without any warnings, by specifying
+the ``dtype``:
+
+*Previous Behavior*
+
+.. ipython:: python
+
+   np.asarray(ser, dtype='datetime64[ns]')
+
+*Future Behavior*
+
+.. ipython:: python
+
+   # New behavior
+   np.asarray(ser, dtype=object)
+
+
+Or by using :meth:`Series.to_numpy`:
+
+.. ipython:: python
+
+   ser.to_numpy()
+   ser.to_numpy(dtype="datetime64[ns]")
+
+All the above applies to a :class:`DatetimeIndex` with tz-aware values as well.
+
 .. _whatsnew_0240.prior_deprecations:
 
 Removal of prior version deprecations/changes
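The three steps of the old conversion can be mimicked outside pandas with only the standard library and NumPy. This is an illustrative sketch, not pandas' code path. It assumes a fixed UTC+01:00 offset for "CET", which matches these January dates but ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

import numpy as np

# Assumption: model "CET" as a fixed UTC+01:00 offset (no DST handling).
cet = timezone(timedelta(hours=1), "CET")
aware = [datetime(2000, 1, 1, tzinfo=cet), datetime(2000, 1, 2, tzinfo=cet)]

# 1. convert the tz-aware values to UTC
in_utc = [ts.astimezone(timezone.utc) for ts in aware]

# 2. drop the timezone info, keeping the UTC wall-clock time
naive_utc = [ts.replace(tzinfo=None) for ts in in_utc]

# 3. build an ndarray with datetime64[ns] dtype (the old default result)
old_behavior = np.array(naive_utc, dtype="datetime64[ns]")

# Future behavior: an object array whose elements keep their timezone.
future_behavior = np.array(aware, dtype=object)
```

`old_behavior` holds the same UTC instants as the `Out[8]` array in the whatsnew entry, while each element of `future_behavior` still reports its `+01:00` offset.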

pandas/core/arrays/datetimes.py

+1 -1

@@ -524,7 +524,7 @@ def _resolution(self):
     # Array-Like / EA-Interface Methods
 
     def __array__(self, dtype=None):
-        if is_object_dtype(dtype):
+        if is_object_dtype(dtype) or (dtype is None and self.tz):
             return np.array(list(self), dtype=object)
         elif is_int64_dtype(dtype):
             return self.asi8
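The patched branch order can be illustrated with a toy array-like. `ToyDatetimeArray` below is hypothetical (it is not pandas' `DatetimeArray`): it stores UTC nanoseconds like `asi8`, plus a stdlib `tzinfo`, and its `__array__` returns boxed, tz-aware objects whenever an object dtype is requested *or* no dtype is given on tz-aware data. The extra `copy` parameter is only there so the sketch also works with NumPy 2's `__array__` protocol.

```python
from datetime import datetime, timedelta, timezone

import numpy as np

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class ToyDatetimeArray:
    """Hypothetical stand-in for DatetimeArray (not the pandas class).

    Stores UTC nanoseconds since the epoch (like ``asi8``) and an
    optional ``datetime.tzinfo``.
    """

    def __init__(self, utc_ns, tz=None):
        self.asi8 = np.asarray(utc_ns, dtype="int64")
        self.tz = tz

    def __iter__(self):
        # Box each value; microsecond precision is enough for this sketch.
        for ns in self.asi8:
            ts = _EPOCH + timedelta(microseconds=int(ns) // 1000)
            if self.tz is not None:
                yield ts.astimezone(self.tz)
            else:
                yield ts.replace(tzinfo=None)

    def __array__(self, dtype=None, copy=None):
        # Object dtype, or no dtype on tz-aware data: keep the timezone.
        if (dtype is not None and np.dtype(dtype) == object) or (
            dtype is None and self.tz is not None
        ):
            return np.array(list(self), dtype=object)
        if dtype is not None and np.dtype(dtype) == np.int64:
            return self.asi8.copy()
        # Otherwise: naive UTC datetime64[ns] values.
        values = self.asi8.view("M8[ns]")
        return values if dtype is None else values.astype(dtype)
```

With this order, a plain `np.asarray(...)` on tz-aware data yields tz-aware objects, while callers who explicitly ask for `int64` or `datetime64[ns]` still get the UTC-based values.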

pandas/core/dtypes/cast.py

+1 -1

@@ -1020,7 +1020,7 @@ def maybe_cast_to_datetime(value, dtype, errors='raise'):
                             # datetime64tz is assumed to be naive which should
                             # be localized to the timezone.
                             is_dt_string = is_string_dtype(value)
-                            value = to_datetime(value, errors=errors)
+                            value = to_datetime(value, errors=errors).array
                             if is_dt_string:
                                 # Strings here are naive, so directly localize
                                 value = value.tz_localize(dtype.tz)
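The changed line can be reproduced with public API: `.array` hands back the `DatetimeArray` behind the `DatetimeIndex`, and naive strings are then localized directly to the target zone. `maybe_cast_to_datetime` itself is internal; the `strings` and `dtype` below are illustrative values, not taken from the commit.

```python
import pandas as pd

# Naive datetime strings paired with a tz-aware target dtype (illustrative).
strings = ["2000-01-01 00:00", "2000-01-02 00:00"]
dtype = pd.DatetimeTZDtype(tz="CET")

# `.array` returns the DatetimeArray backing the DatetimeIndex, so later
# steps work on the array itself rather than on an Index.
values = pd.to_datetime(strings).array

# The strings are treated as wall times in the target zone and localized.
localized = values.tz_localize(dtype.tz)
```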

pandas/core/dtypes/dtypes.py

+6
@@ -403,6 +403,7 @@ def _hash_categories(categories, ordered=True):
         from pandas.core.util.hashing import (
             hash_array, _combine_hash_arrays, hash_tuples
         )
+        from pandas.core.dtypes.common import is_datetime64tz_dtype, _NS_DTYPE
 
         if len(categories) and isinstance(categories[0], tuple):
             # assumes if any individual category is a tuple, then all our. ATM
@@ -420,6 +421,11 @@ def _hash_categories(categories, ordered=True):
                     # find a better solution
                     hashed = hash((tuple(categories), ordered))
                     return hashed
+
+            if is_datetime64tz_dtype(categories.dtype):
+                # Avoid future warning.
+                categories = categories.astype(_NS_DTYPE)
+
             cat_array = hash_array(np.asarray(categories), categorize=False)
         if ordered:
             cat_array = np.vstack([
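The idea behind the new branch is to hash tz-aware categories through their UTC `datetime64[ns]` values so that `np.asarray` never sees tz-aware data without a dtype. The diff uses `astype(_NS_DTYPE)` for that step; the sketch below uses `np.asarray` with an explicit dtype instead, because `astype` conversions between tz-aware and naive dtypes have changed across pandas releases. `hash_tz_aware` is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd


def hash_tz_aware(index):
    """Hypothetical helper: hash tz-aware datetimes via their UTC values."""
    # Explicit dtype: naive UTC datetime64[ns] values, tz info discarded,
    # and no implicit-conversion warning.
    utc_values = np.asarray(index, dtype="datetime64[ns]")
    return pd.util.hash_array(utc_values, categorize=False)


cet = pd.date_range("2000-01-01", periods=3, freq="D", tz="CET")
utc = cet.tz_convert("UTC")
```

Because the timezone is dropped before hashing, the same instants expressed in CET and in UTC produce identical hash arrays.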

pandas/core/groupby/groupby.py

+2 -3

@@ -1271,8 +1271,8 @@ def f(self, **kwargs):
         def first_compat(x, axis=0):
 
             def first(x):
+                x = x.to_numpy()
 
-                x = np.asarray(x)
                 x = x[notna(x)]
                 if len(x) == 0:
                     return np.nan
@@ -1286,8 +1286,7 @@ def first(x):
         def last_compat(x, axis=0):
 
             def last(x):
-
-                x = np.asarray(x)
+                x = x.to_numpy()
                 x = x[notna(x)]
                 if len(x) == 0:
                     return np.nan
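A standalone sketch of the patched `first`/`last` helpers follows. `Series.to_numpy()` returns tz-aware values as `Timestamp` objects, whereas the `np.asarray(x)` call it replaces would, after this commit, warn and return naive UTC values. `first_valid` and `last_valid` are hypothetical names; in pandas these helpers are nested inside the groupby machinery.

```python
import numpy as np
import pandas as pd


def first_valid(values):
    """Return the first non-missing element of a Series, or NaN."""
    arr = values.to_numpy()  # tz-aware data -> object array of Timestamps
    arr = arr[pd.notna(arr)]
    if len(arr) == 0:
        return np.nan
    return arr[0]


def last_valid(values):
    """Return the last non-missing element of a Series, or NaN."""
    arr = values.to_numpy()
    arr = arr[pd.notna(arr)]
    if len(arr) == 0:
        return np.nan
    return arr[-1]


ser = pd.Series([
    pd.NaT,
    pd.Timestamp("2000-01-01", tz="CET"),
    pd.Timestamp("2000-01-02", tz="CET"),
])
```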

pandas/core/indexes/datetimes.py

+15 -1

@@ -339,6 +339,21 @@ def _simple_new(cls, values, name=None, freq=None, tz=None, dtype=None):
 
     # --------------------------------------------------------------------
 
+    def __array__(self, dtype=None):
+        if (dtype is None and isinstance(self._data, DatetimeArray)
+                and getattr(self.dtype, 'tz', None)):
+            msg = (
+                "Converting timezone-aware DatetimeArray to timezone-naive "
+                "ndarray with 'datetime64[ns]' dtype. In the future, this "
+                "will return an ndarray with 'object' dtype where each "
+                "element is a 'pandas.Timestamp' with the correct 'tz'.\n\t"
+                "To accept the future behavior, pass 'dtype=object'.\n\t"
+                "To keep the old behavior, pass 'dtype=\"datetime64[ns]\"'."
+            )
+            warnings.warn(msg, FutureWarning, stacklevel=3)
+            dtype = 'M8[ns]'
+        return np.asarray(self._data, dtype=dtype)
+
     @property
     def dtype(self):
         return self._data.dtype
@@ -1114,7 +1129,6 @@ def slice_indexer(self, start=None, end=None, step=None, kind=None):
 
     strftime = ea_passthrough(DatetimeArray.strftime)
     _has_same_tz = ea_passthrough(DatetimeArray._has_same_tz)
-    __array__ = ea_passthrough(DatetimeArray.__array__)
 
     @property
     def offset(self):
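The deprecation shim follows a common pattern: when no dtype is requested on tz-aware data, emit a `FutureWarning` but force the old `'M8[ns]'` result; any explicit dtype bypasses the warning. `LegacyTzIndex` is a hypothetical wrapper written only to show that pattern (its message is shortened, and `tz` is just a label here).

```python
import warnings

import numpy as np


class LegacyTzIndex:
    """Hypothetical wrapper illustrating the shim (not pandas' class)."""

    def __init__(self, utc_values, tz=None):
        self._utc = np.asarray(utc_values, dtype="datetime64[ns]")
        self.tz = tz

    def __array__(self, dtype=None, copy=None):
        if dtype is None and self.tz is not None:
            # Implicit conversion of tz-aware data: warn, keep old result.
            warnings.warn(
                "Converting timezone-aware data to a timezone-naive ndarray "
                "with 'datetime64[ns]' dtype. Pass an explicit dtype to "
                "silence this warning.",
                FutureWarning,
                stacklevel=2,
            )
            dtype = "M8[ns]"
        return np.asarray(self._utc, dtype=dtype)


idx = LegacyTzIndex(["1999-12-31T23:00", "2000-01-01T23:00"], tz="CET")
```

Because the old result is still returned, existing callers keep working and only see the warning; those who pass a dtype get the same values silently.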

pandas/core/indexing.py

+6 -1

@@ -581,7 +581,12 @@ def can_do_equal_len():
                     setter(item, v)
 
             # we have an equal len ndarray/convertible to our labels
-            elif np.array(value).ndim == 2:
+            # hasattr first, to avoid coercing to ndarray without reason.
+            # But we may be relying on the ndarray coercion to check ndim.
+            # Why not just convert to an ndarray earlier on if needed?
+            elif ((hasattr(value, 'ndim') and value.ndim == 2)
+                  or (not hasattr(value, 'ndim') and
+                      np.array(value).ndim) == 2):
 
                 # note that this coerces the dtype if we are mixed
                 # GH 7551
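The new condition checks `ndim` before coercing, so objects that already know their dimensionality (for example a tz-aware Series, whose coercion would now emit the new `FutureWarning`) are never passed through `np.array`. In the committed expression, `== 2` sits outside the second parenthesized group; since `False == 2` is `False`, it evaluates the same as the restatement below. `is_two_dimensional` and `GuardedArray` are hypothetical names used only for illustration.

```python
import numpy as np


def is_two_dimensional(value):
    """Hypothetical helper restating the condition added in the diff."""
    if hasattr(value, "ndim"):
        # Array-likes that already know their dimensionality: no coercion.
        return value.ndim == 2
    # Anything else (lists, tuples, ...): let NumPy work out the shape.
    return np.array(value).ndim == 2


class GuardedArray:
    """Demo object that fails loudly if anything coerces it to an ndarray."""

    ndim = 2

    def __array__(self, dtype=None, copy=None):
        raise AssertionError("unexpected ndarray coercion")
```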

pandas/core/internals/blocks.py

+20 -6

@@ -1447,8 +1447,20 @@ def quantile(self, qs, interpolation='linear', axis=0):
         -------
         Block
         """
-        values = self.get_values()
-        values, _ = self._try_coerce_args(values, values)
+        if self.is_datetimetz:
+            # TODO: cleanup this special case.
+            # We need to operate on i8 values for datetimetz
+            # but `Block.get_values()` returns an ndarray of objects
+            # right now. We need an API for "values to do numeric-like ops on"
+            values = self.values.asi8
+
+            # TODO: NonConsolidatableMixin shape
+            # Usual shape inconsistencies for ExtensionBlocks
+            if self.ndim > 1:
+                values = values[None, :]
+        else:
+            values = self.get_values()
+            values, _ = self._try_coerce_args(values, values)
 
         is_empty = values.shape[axis] == 0
         orig_scalar = not is_list_like(qs)
@@ -2055,10 +2067,6 @@ def _na_value(self):
     def fill_value(self):
         return tslibs.iNaT
 
-    def to_dense(self):
-        # TODO(DatetimeBlock): remove
-        return np.asarray(self.values)
-
     def get_values(self, dtype=None):
         """
         return object dtype as boxed values, such as Timestamps/Timedelta
@@ -2330,6 +2338,12 @@ def get_values(self, dtype=None):
             values = values.reshape(1, -1)
         return values
 
+    def to_dense(self):
+        # we request M8[ns] dtype here, even though it discards tzinfo,
+        # as lots of code (e.g. anything using values_from_object)
+        # expects that behavior.
+        return np.asarray(self.values, dtype=_NS_DTYPE)
+
     def _slice(self, slicer):
         """ return a slice of my values """
         if isinstance(slicer, tuple):
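The quantile change computes on the int64 (`asi8`) form of the UTC instants instead of on boxed `Timestamp` objects, and reshapes to `(1, n)` to match the 2-D block layout. The NumPy-only sketch below shows the same idea; `datetime_quantile` is a hypothetical helper, and boxing the result back with the block's timezone is left out.

```python
import numpy as np


def datetime_quantile(utc_values, q):
    """Hypothetical helper: quantile of datetime64[ns] data via int64 ns.

    np.quantile returns float64, which can lose sub-microsecond
    precision for present-day timestamps.
    """
    i8 = np.asarray(utc_values, dtype="datetime64[ns]").view("i8")
    result = np.quantile(i8, q)
    return np.datetime64(int(round(float(result))), "ns")


utc = np.array(
    ["2000-01-01T00:00", "2000-01-02T00:00", "2000-01-04T00:00"],
    dtype="datetime64[ns]",
)

# Block-style 2-D layout (one row per column), like values[None, :].
as_block = utc.view("i8")[None, :]
```

With linear interpolation, the 0.25 quantile of these three values falls halfway between the first two, at noon on 2000-01-01.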

pandas/core/internals/construction.py

+4 -2

@@ -34,6 +34,7 @@
 from pandas.core.indexes import base as ibase
 from pandas.core.internals import (
     create_block_manager_from_arrays, create_block_manager_from_blocks)
+from pandas.core.internals.arrays import extract_array
 
 # ---------------------------------------------------------------------
 # BlockManager Interface
@@ -539,7 +540,6 @@ def sanitize_array(data, index, dtype=None, copy=False,
     Sanitize input data to an ndarray, copy if specified, coerce to the
     dtype if specified.
     """
-
     if dtype is not None:
         dtype = pandas_dtype(dtype)
 
@@ -552,8 +552,10 @@ def sanitize_array(data, index, dtype=None, copy=False,
         else:
             data = data.copy()
 
+    data = extract_array(data, extract_numpy=True)
+
     # GH#846
-    if isinstance(data, (np.ndarray, Index, ABCSeries)):
+    if isinstance(data, np.ndarray):
 
         if dtype is not None:
             subarr = np.array(data, copy=False)
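After `extract_array(data, extract_numpy=True)`, a Series or Index has already been unwrapped, so only genuine ndarrays take the `isinstance(data, np.ndarray)` branch, and tz-aware data stays an extension array instead of being coerced. The helper below approximates that unwrapping with public API only. `unwrap` is hypothetical and does not reproduce every case handled by the internal `extract_array`.

```python
import numpy as np
import pandas as pd


def unwrap(data):
    """Hypothetical public-API analogue of extract_array(..., extract_numpy=True)."""
    if isinstance(data, (pd.Series, pd.Index)):
        if isinstance(data.dtype, np.dtype):
            # Plain NumPy-backed data: hand back an ndarray.
            return data.to_numpy()
        # Extension data (e.g. tz-aware datetimes): keep the array as-is.
        return data.array
    # Anything else passes through untouched.
    return data
```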

pandas/core/nanops.py

+3 -1

@@ -144,7 +144,9 @@ def f(values, axis=None, skipna=True, **kwds):
 
 def _bn_ok_dtype(dt, name):
     # Bottleneck chokes on datetime64
-    if (not is_object_dtype(dt) and not is_datetime_or_timedelta_dtype(dt)):
+    if (not is_object_dtype(dt) and
+            not (is_datetime_or_timedelta_dtype(dt) or
+                 is_datetime64tz_dtype(dt))):
 
         # GH 15507
         # bottleneck does not properly upcast during the sum
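The updated guard keeps object, naive datetime/timedelta, and now tz-aware datetime dtypes away from bottleneck. A compact restatement of just that guard is sketched below using NumPy dtype kinds (`"O"` object, `"M"` datetime64, `"m"` timedelta64); `bottleneck_ok` is hypothetical, and the later GH 15507 checks in `_bn_ok_dtype` are not reproduced.

```python
import numpy as np
import pandas as pd


def bottleneck_ok(dtype):
    """Hypothetical restatement of the dtype guard shown in the diff."""
    if isinstance(dtype, pd.DatetimeTZDtype):
        return False  # tz-aware datetimes: never hand them to bottleneck
    kind = np.dtype(dtype).kind
    return kind not in ("O", "M", "m")  # object, datetime64, timedelta64
```

The tz-aware check has to come first, because `np.dtype()` cannot interpret a pandas `DatetimeTZDtype`.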

pandas/core/reshape/tile.py

+5 -2

@@ -8,7 +8,7 @@
 from pandas._libs.lib import infer_dtype
 
 from pandas.core.dtypes.common import (
-    ensure_int64, is_categorical_dtype, is_datetime64_dtype,
+    _NS_DTYPE, ensure_int64, is_categorical_dtype, is_datetime64_dtype,
     is_datetime64tz_dtype, is_datetime_or_timedelta_dtype, is_integer,
     is_scalar, is_timedelta64_dtype)
 from pandas.core.dtypes.missing import isna
@@ -226,7 +226,10 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3,
             raise ValueError('Overlapping IntervalIndex is not accepted.')
 
     else:
-        bins = np.asarray(bins)
+        if is_datetime64tz_dtype(bins):
+            bins = np.asarray(bins, dtype=_NS_DTYPE)
+        else:
+            bins = np.asarray(bins)
         bins = _convert_bin_to_numeric_type(bins, dtype)
         if (np.diff(bins) < 0).any():
             raise ValueError('bins must increase monotonically.')
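For tz-aware bin edges, `cut` now requests `datetime64[ns]` explicitly, so the edges become naive UTC values without the warning before the monotonicity check. The sketch below mirrors that step and converts the edges to int64 nanoseconds, as the numeric conversion that follows in `cut` presumably does. `prepare_bins` is hypothetical, and it uses an `isinstance` check on `pd.DatetimeTZDtype` in place of `is_datetime64tz_dtype`, which newer pandas versions deprecate.

```python
import numpy as np
import pandas as pd


def prepare_bins(bins):
    """Hypothetical helper: turn bin edges into a monotonic numeric array."""
    if isinstance(getattr(bins, "dtype", None), pd.DatetimeTZDtype):
        # tz-aware edges: UTC datetime64[ns] (tz dropped), then int64 ns.
        values = np.asarray(bins, dtype="datetime64[ns]").view("i8")
    else:
        values = np.asarray(bins)
    if (np.diff(values) < 0).any():
        raise ValueError("bins must increase monotonically.")
    return values
```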

pandas/core/series.py

+61 -5

@@ -21,7 +21,8 @@
     is_extension_array_dtype, is_extension_type, is_hashable, is_integer,
     is_iterator, is_list_like, is_scalar, is_string_like, is_timedelta64_dtype)
 from pandas.core.dtypes.generic import (
-    ABCDataFrame, ABCDatetimeIndex, ABCSeries, ABCSparseArray, ABCSparseSeries)
+    ABCDataFrame, ABCDatetimeArray, ABCDatetimeIndex, ABCSeries,
+    ABCSparseArray, ABCSparseSeries)
 from pandas.core.dtypes.missing import (
     isna, na_value_for_dtype, notna, remove_na_arraylike)
 
@@ -658,11 +659,66 @@ def view(self, dtype=None):
     # ----------------------------------------------------------------------
     # NDArray Compat
 
-    def __array__(self, result=None):
+    def __array__(self, dtype=None):
         """
-        The array interface, return my values.
-        """
-        return self.get_values()
+        Return the values as a NumPy array.
+
+        Users should not call this directly. Rather, it is invoked by
+        :func:`numpy.array` and :func:`numpy.asarray`.
+
+        Parameters
+        ----------
+        dtype : str or numpy.dtype, optional
+            The dtype to use for the resulting NumPy array. By default,
+            the dtype is inferred from the data.
+
+        Returns
+        -------
+        numpy.ndarray
+            The values in the series converted to a :class:`numpy.ndarray`
+            with the specified `dtype`.
+
+        See Also
+        --------
+        pandas.array : Create a new array from data.
+        Series.array : Zero-copy view to the array backing the Series.
+        Series.to_numpy : Series method for similar behavior.
+
+        Examples
+        --------
+        >>> ser = pd.Series([1, 2, 3])
+        >>> np.asarray(ser)
+        array([1, 2, 3])
+
+        For timezone-aware data, the timezones may be retained with
+        ``dtype='object'``:
+
+        >>> tzser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
+        >>> np.asarray(tzser, dtype="object")
+        array([Timestamp('2000-01-01 00:00:00+0100', tz='CET', freq='D'),
+               Timestamp('2000-01-02 00:00:00+0100', tz='CET', freq='D')],
+              dtype=object)
+
+        Or the values may be localized to UTC and the tzinfo discarded with
+        ``dtype='datetime64[ns]'``:
+
+        >>> np.asarray(tzser, dtype="datetime64[ns]")  # doctest: +ELLIPSIS
+        array(['1999-12-31T23:00:00.000000000', ...],
+              dtype='datetime64[ns]')
+        """
+        if (dtype is None and isinstance(self.array, ABCDatetimeArray)
+                and getattr(self.dtype, 'tz', None)):
+            msg = (
+                "Converting timezone-aware DatetimeArray to timezone-naive "
+                "ndarray with 'datetime64[ns]' dtype. In the future, this "
+                "will return an ndarray with 'object' dtype where each "
+                "element is a 'pandas.Timestamp' with the correct 'tz'.\n\t"
+                "To accept the future behavior, pass 'dtype=object'.\n\t"
+                "To keep the old behavior, pass 'dtype=\"datetime64[ns]\"'."
+            )
+            warnings.warn(msg, FutureWarning, stacklevel=3)
+            dtype = 'M8[ns]'
+        return np.asarray(self.array, dtype)
 
     def __array_wrap__(self, result, context=None):
         """

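Since the default result of `np.asarray` on tz-aware data is announced to change, passing the `dtype` explicitly is the way to get a predictable result without a `FutureWarning`. The snippet below exercises the documented options on the `tzser` example from the docstring. It is a usage sketch rather than a test from the commit, and it only checks for the absence of `FutureWarning`, ignoring unrelated warnings.

```python
import warnings

import numpy as np
import pandas as pd

tzser = pd.Series(pd.date_range("2000", periods=2, tz="CET"))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = np.asarray(tzser, dtype=object)                  # Timestamps with tz
    utc = np.asarray(tzser, dtype="datetime64[ns]")         # naive UTC values
    via_method = tzser.to_numpy(dtype="datetime64[ns]")     # same, via to_numpy

future_warnings = [w for w in caught if issubclass(w.category, FutureWarning)]
```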