Add default repr for EAs #23601

Merged
merged 54 commits into from
Dec 4, 2018
Changes from 49 commits
Commits
0fdbfd3
wip
TomAugspurger Nov 9, 2018
ace62aa
Deprecate formatting_values
TomAugspurger Nov 9, 2018
6e76b51
test for warning
TomAugspurger Nov 9, 2018
fef04e6
compat
TomAugspurger Nov 9, 2018
1885a97
na formatter
TomAugspurger Nov 9, 2018
ecfcd72
clean
TomAugspurger Nov 9, 2018
4e0d91f
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 9, 2018
37638cc
wip
TomAugspurger Nov 9, 2018
6e64b7b
more cleanup
TomAugspurger Nov 9, 2018
193747e
update docs, type
TomAugspurger Nov 9, 2018
5a2e1e4
format
TomAugspurger Nov 9, 2018
1635b73
try this
TomAugspurger Nov 9, 2018
e2b1941
updates
TomAugspurger Nov 9, 2018
48e55cc
fixup interval
TomAugspurger Nov 10, 2018
d8e7ba4
py2 compat
TomAugspurger Nov 10, 2018
b312fe4
revert interval
TomAugspurger Nov 10, 2018
445736d
unicode, bytes
TomAugspurger Nov 10, 2018
60e0d02
isort
TomAugspurger Nov 10, 2018
5b07906
py3 fixup
TomAugspurger Nov 10, 2018
ff0c998
fixup
TomAugspurger Nov 10, 2018
2fd3d5d
unicode
TomAugspurger Nov 10, 2018
5d8d2fc
unicode
TomAugspurger Nov 10, 2018
baee6b2
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 10, 2018
4d343ea
unicode
TomAugspurger Nov 10, 2018
5b291d5
lint
TomAugspurger Nov 10, 2018
1b93bf0
update repr tests
TomAugspurger Nov 11, 2018
708dd75
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 12, 2018
0f4083e
remove periodarray
TomAugspurger Nov 12, 2018
9116930
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 12, 2018
ebadf6f
FutureWarning -> DeprecationWarning
TomAugspurger Nov 12, 2018
e5f6976
wip
TomAugspurger Nov 12, 2018
221cee9
use repr
TomAugspurger Nov 12, 2018
439f2f8
fixup! use repr
TomAugspurger Nov 12, 2018
2364546
fixup! fixup! use repr
TomAugspurger Nov 12, 2018
62b1e2f
remove bytes
TomAugspurger Nov 12, 2018
a926dca
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 14, 2018
fc4279d
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 15, 2018
27db397
simplify formatter
TomAugspurger Nov 15, 2018
5c253a4
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 19, 2018
ef390fc
Updates: misc
TomAugspurger Nov 19, 2018
2b5fe25
BUG: Fixed SparseArray formatter
TomAugspurger Nov 19, 2018
d84cc02
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 20, 2018
d9df6bf
correct boxing
TomAugspurger Nov 20, 2018
a35399e
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 20, 2018
740f9e5
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 28, 2018
e7cc2ac
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 28, 2018
c79ba0b
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Dec 2, 2018
3825aeb
Use Array formatter in PeriodIndex
TomAugspurger Dec 2, 2018
2a60c15
Use repr / str
TomAugspurger Dec 2, 2018
bccf40d
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Dec 3, 2018
a7ef104
Update for review
TomAugspurger Dec 3, 2018
a3b1c92
REF: removed trailing_comma argument
TomAugspurger Dec 3, 2018
e080023
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Dec 3, 2018
6ad113b
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Dec 3, 2018
3 changes: 3 additions & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -1000,6 +1000,7 @@ update the ``ExtensionDtype._metadata`` tuple to match the signature of your
- :meth:`DataFrame.stack` no longer converts to object dtype for DataFrames where each column has the same extension dtype. The output Series will have the same dtype as the columns (:issue:`23077`).
- :meth:`Series.unstack` and :meth:`DataFrame.unstack` no longer convert extension arrays to object-dtype ndarrays. Each column in the output ``DataFrame`` will now have the same dtype as the input (:issue:`23077`).
- Bug when grouping :meth:`Dataframe.groupby()` and aggregating on ``ExtensionArray`` it was not returning the actual ``ExtensionArray`` dtype (:issue:`23227`).
- A default repr for :class:`ExtensionArray` is now provided (:issue:`23601`).

.. _whatsnew_0240.api.incompatibilities:

@@ -1114,6 +1115,7 @@ Deprecations
- The methods :meth:`Series.str.partition` and :meth:`Series.str.rpartition` have deprecated the ``pat`` keyword in favor of ``sep`` (:issue:`22676`)
- Deprecated the `nthreads` keyword of :func:`pandas.read_feather` in favor of
`use_threads` to reflect the changes in pyarrow 0.11.0. (:issue:`23053`)
- :meth:`ExtensionArray._formatting_values` is deprecated. Use ``ExtensionArray._formatter`` instead. (:issue:`23601`)
- :func:`pandas.read_excel` has deprecated accepting ``usecols`` as an integer. Please pass in a list of ints from 0 to ``usecols`` inclusive instead (:issue:`23527`)
- Constructing a :class:`TimedeltaIndex` from data with ``datetime64``-dtyped data is deprecated, will raise ``TypeError`` in a future version (:issue:`23539`)
- Constructing a :class:`DatetimeIndex` from data with ``timedelta64``-dtyped data is deprecated, will raise ``TypeError`` in a future version (:issue:`23675`)
@@ -1279,6 +1281,7 @@ Datetimelike
- Bug in rounding methods of :class:`DatetimeIndex` (:meth:`~DatetimeIndex.round`, :meth:`~DatetimeIndex.ceil`, :meth:`~DatetimeIndex.floor`) and :class:`Timestamp` (:meth:`~Timestamp.round`, :meth:`~Timestamp.ceil`, :meth:`~Timestamp.floor`) could give rise to loss of precision (:issue:`22591`)
- Bug in :func:`to_datetime` with an :class:`Index` argument that would drop the ``name`` from the result (:issue:`21697`)
- Bug in :class:`PeriodIndex` where adding or subtracting a :class:`timedelta` or :class:`Tick` object produced incorrect results (:issue:`22988`)
- Bug in the :class:`Series` repr with period-dtype data missing a space before the data (:issue:`23601`)
- Bug in :func:`date_range` when decrementing a start date to a past end date by a negative frequency (:issue:`23270`)
- Bug in :meth:`Series.min` which would return ``NaN`` instead of ``NaT`` when called on a series of ``NaT`` (:issue:`23282`)
- Bug in :func:`DataFrame.combine` with datetimelike values raising a TypeError (:issue:`23079`)
66 changes: 61 additions & 5 deletions pandas/core/arrays/base.py
@@ -46,10 +46,12 @@ class ExtensionArray(object):
* copy
* _concat_same_type

An additional method is available to satisfy pandas' internal,
private block API.
A default repr displaying the type, (truncated) data, length,
and dtype is provided. It can be customized or replaced by
overriding:

* _formatting_values
* __repr__ : A default repr for the ExtensionArray.
* _formatter : Print scalars inside a Series or DataFrame.

Some methods require casting the ExtensionArray to an ndarray of Python
objects with ``self.astype(object)``, which may be expensive. When
@@ -655,15 +657,69 @@ def copy(self, deep=False):
raise AbstractMethodError(self)

# ------------------------------------------------------------------------
# Block-related methods
# Printing
# ------------------------------------------------------------------------
def __repr__(self):
from pandas.io.formats.printing import format_object_summary

template = (
u'{class_name}'
u'{data}\n'
u'Length: {length}, dtype: {dtype}'
Contributor

lowercase Length

Contributor Author

Why lower? It's uppercase in the Series repr.

Contributor

it's lowercase everywhere (else)

Contributor Author

Everywhere else being index? Categorical uses a capital. I'd like to eventually use pieces of this for the categorical repr, and would rather not break that repr, so I think a capital makes more sense here.

Member

In index it is as part of the "keywords" in the constructor-resembling repr. There it indeed makes sense to have it lowercase. But here, I think capitalized is much more logical (it's the first item on that line), and consistent with Series and Categorical.

Contributor

ok, I guess if we match the EA's to Series, except for the quoting (e.g. quote in EA, but no change in Series, meaning no quoting), then Index is separate. I am still concerned with these slight differences however.

E.g. even here, dtype is lowercase and Length is uppercase

Contributor Author (TomAugspurger, Nov 21, 2018)

Keep in mind that whether or not to quote is up to the individual array.

Are you specifically talking about quoting within PeriodArray here? Do we plan to quote within DatetimeArray and TimedeltaArray?

)
# the short repr has no trailing newline, while the truncated
# repr does. So we include a newline in our template, and strip
# any trailing newlines from format_object_summary
data = format_object_summary(self, self._formatter(), name=False,
trailing_comma=False).rstrip()
class_name = u'<{}>\n'.format(self.__class__.__name__)
return template.format(class_name=class_name, data=data,
length=len(self),
dtype=self.dtype)

def _formatter(self, boxed=False):
# type: (bool) -> Callable[[Any], Optional[str]]
"""Formatting function for scalar values.

This is used in the default '__repr__'. The returned formatting
function receives instances of your scalar type.

Parameters
----------
boxed: bool, default False
An indicator for whether or not your array is being printed
within a Series, DataFrame, or Index (True), or just by
Member

Doesn't Index also use False like Array? (as I thought this was to satisfy the difference between Series/DataFrame vs Index)

Contributor Author

Index is hypothetical at this point, until we support EA-backed indexes. This isn't currently called by our index reprs. So we can choose to define it how we want. Or we can make this a box=None/Series/Index argument.

Member

For external EAs, yes. But eg PeriodArray is stored in an Index, and there box is used for this purpose, no?

Contributor Author

It was not. Done in 3825aeb

Contributor

I would add this arg to the Index formatters as well for compatibility.

Contributor Author

I'm not sure I follow the need for adding it to the index formatters, since indexes are boxed by definition.

itself (False). This may be useful if you want scalar values
to appear differently within a Series versus on its own (e.g.
quoted or not).

Returns
-------
Callable[[Any], str]
A callable that gets instances of the scalar type and
returns a string. By default, :func:`repr` is used
when ``boxed=False`` and :func:`str` is used when
``boxed=True``.
"""
if boxed:
return str
return repr

def _formatting_values(self):
# type: () -> np.ndarray
# At the moment, this has to be an array since we use result.dtype
"""An array of values to be printed in, e.g. the Series repr"""
"""An array of values to be printed in, e.g. the Series repr

.. deprecated:: 0.24.0

Use :meth:`ExtensionArray._formatter` instead.
"""
return np.array(self)

# ------------------------------------------------------------------------
# Reshaping
# ------------------------------------------------------------------------

@classmethod
def _concat_same_type(cls, to_concat):
# type: (Sequence[ExtensionArray]) -> ExtensionArray
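
To illustrate how the two printing hooks added to base.py fit together, here is a minimal sketch in plain Python. `ToyArray` and its `dtype` string are hypothetical stand-ins, not pandas classes, and the data line is assembled by hand, whereas the patch delegates to `format_object_summary` (which also truncates long arrays). It only aims to show the shape of the template and the effect of the `boxed` flag.

```python
class ToyArray(object):
    """Hypothetical stand-in for an ExtensionArray subclass (not pandas API)."""

    dtype = "toy"

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __iter__(self):
        return iter(self._values)

    def _formatter(self, boxed=False):
        # Same default as in the diff: str when boxed, repr otherwise.
        if boxed:
            return str
        return repr

    def __repr__(self):
        fmt = self._formatter()
        # Simplified data line; no truncation is attempted here.
        data = "[" + ", ".join(fmt(x) for x in self) + "]"
        return "<{}>\n{}\nLength: {}, dtype: {}".format(
            type(self).__name__, data, len(self), self.dtype)


arr = ToyArray(["a", "b"])
text = repr(arr)                                      # unboxed: scalars are quoted
boxed = [arr._formatter(boxed=True)(x) for x in arr]  # boxed: plain str
```

In this sketch `text` follows the three-line layout from the template (class name, data, then `Length: ..., dtype: ...`), while `boxed` holds the unquoted strings that a Series-style container would display.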
11 changes: 8 additions & 3 deletions pandas/core/arrays/categorical.py
@@ -494,6 +494,10 @@ def _constructor(self):
def _from_sequence(cls, scalars, dtype=None, copy=False):
return Categorical(scalars, dtype=dtype)

def _formatter(self, boxed=False):
# backwards compat with old printing.
return None

def copy(self):
""" Copy constructor. """
return self._constructor(values=self._codes.copy(),
@@ -1987,6 +1991,10 @@ def __unicode__(self):

return result

def __repr__(self):
# We want PandasObject.__repr__, which dispatches to __unicode__
return super(ExtensionArray, self).__repr__()

def _maybe_coerce_indexer(self, indexer):
""" return an indexer coerced to the codes dtype """
if isinstance(indexer, np.ndarray) and indexer.dtype.kind == 'i':
@@ -2335,9 +2343,6 @@ def _concat_same_type(self, to_concat):

return _concat_categorical(to_concat)

def _formatting_values(self):
return self

def isin(self, values):
"""
Check whether `values` are contained in Categorical.
35 changes: 8 additions & 27 deletions pandas/core/arrays/integer.py
@@ -5,7 +5,7 @@
import numpy as np

from pandas._libs import lib
from pandas.compat import range, set_function_name, string_types, u
from pandas.compat import range, set_function_name, string_types
from pandas.util._decorators import cache_readonly

from pandas.core.dtypes.base import ExtensionDtype
@@ -20,9 +20,6 @@
from pandas.core import nanops
from pandas.core.arrays import ExtensionArray, ExtensionOpsMixin

from pandas.io.formats.printing import (
default_pprint, format_object_attrs, format_object_summary)


class _IntegerDtype(ExtensionDtype):
"""
@@ -263,6 +260,13 @@ def _from_sequence(cls, scalars, dtype=None, copy=False):
def _from_factorized(cls, values, original):
return integer_array(values, dtype=original.dtype)

def _formatter(self, boxed=False):
def fmt(x):
if isna(x):
return 'NaN'
Member

should NaN have been hardcoded here?

Member

Why not? This is used for the Integer data display, where we currently use NaN.

(whether we should use rather 'NA' instead of 'NaN', that's another question)

Member

just wondering whether it would be a problem with to_string(na_rep=...). will do some tests.

Member

Ah, yes, that's a good reason. But in general, this _formatter does not follow display options at all, is that correct?
In which case this is something to think about in general.

return str(x)
return fmt

def __getitem__(self, item):
if is_integer(item):
if self._mask[item]:
@@ -296,10 +300,6 @@ def __iter__(self):
else:
yield self._data[i]

def _formatting_values(self):
# type: () -> np.ndarray
return self._coerce_to_ndarray()

def take(self, indexer, allow_fill=False, fill_value=None):
from pandas.api.extensions import take

@@ -349,25 +349,6 @@ def __setitem__(self, key, value):
def __len__(self):
return len(self._data)

def __repr__(self):
"""
Return a string representation for this object.

Invoked by unicode(df) in py2 only. Yields a Unicode String in both
py2/py3.
"""
klass = self.__class__.__name__
data = format_object_summary(self, default_pprint, False)
attrs = format_object_attrs(self)
space = " "

prepr = (u(",%s") %
space).join(u("%s=%s") % (k, v) for k, v in attrs)

res = u("%s(%s%s)") % (klass, data, prepr)

return res

@property
def nbytes(self):
return self._data.nbytes + self._mask.nbytes
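
The review question about the hardcoded 'NaN' in the new `IntegerArray._formatter` can be made concrete with a small sketch. `is_missing` and `make_formatter` are hypothetical names, and `is_missing` is only a rough stand-in for pandas' `isna`. The point is that the returned closure never sees a caller-supplied `na_rep`; whether pandas substitutes `na_rep` elsewhere in the formatting pipeline is not settled in this thread.

```python
import math


def is_missing(x):
    # Hypothetical stand-in for pandas' isna; handles only None and float NaN.
    return x is None or (isinstance(x, float) and math.isnan(x))


def make_formatter(boxed=False):
    # Same shape as the _formatter in the diff: the missing-value label is
    # fixed inside the closure, and `boxed` is accepted but unused.
    def fmt(x):
        if is_missing(x):
            return 'NaN'
        return str(x)
    return fmt


fmt = make_formatter()
rendered = [fmt(x) for x in [1, None, 3]]
```

One possible alternative, not proposed in the PR, would be for the closure to accept the label as a parameter so that display options could flow through.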
3 changes: 0 additions & 3 deletions pandas/core/arrays/interval.py
@@ -690,9 +690,6 @@ def copy(self, deep=False):
# TODO: Could skip verify_integrity here.
return type(self).from_arrays(left, right, closed=closed)

def _formatting_values(self):
return np.asarray(self)

def isna(self):
return isna(self.left)

11 changes: 4 additions & 7 deletions pandas/core/arrays/period.py
@@ -337,13 +337,10 @@ def to_timestamp(self, freq=None, how='start'):
# --------------------------------------------------------------------
# Array-like / EA-Interface Methods

def __repr__(self):
return '<{}>\n{}\nLength: {}, dtype: {}'.format(
self.__class__.__name__,
[str(s) for s in self],
len(self),
self.dtype
)
def _formatter(self, boxed=False):
if boxed:
return str
return "'{}'".format

def __setitem__(
self,
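
As a rough illustration of the quoting behaviour that the new `PeriodArray._formatter` selects, the sketch below applies the same two callables to a plain string. The string `"2018-01"` is only a stand-in for a `Period` scalar, and `period_formatter` is a hypothetical free function mirroring the method body.

```python
def period_formatter(boxed=False):
    # Mirrors the method in the diff: unquoted inside a container,
    # single-quoted when the array is printed on its own.
    if boxed:
        return str
    return "'{}'".format


unboxed = period_formatter()("2018-01")
boxed = period_formatter(boxed=True)("2018-01")
```

Returning the bound method `"'{}'".format` avoids defining a lambda while still giving a one-argument callable.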
3 changes: 3 additions & 0 deletions pandas/core/arrays/sparse.py
@@ -1735,6 +1735,9 @@ def __unicode__(self):
fill=printing.pprint_thing(self.fill_value),
index=printing.pprint_thing(self.sp_index))

def _formatter(self, boxed=False):
return None


SparseArray._add_arithmetic_ops()
SparseArray._add_comparison_ops()
2 changes: 1 addition & 1 deletion pandas/core/indexes/period.py
@@ -529,7 +529,7 @@ def __array_wrap__(self, result, context=None):

@property
def _formatter_func(self):
return lambda x: "'%s'" % x
return self.array._formatter(boxed=False)
Contributor

why is this the only index subclass that you need to do this for?

Contributor Author

It's the only extension-array backed index that's using the formatting right now.

  • CategoricalIndex always uses the default formatter for the underlying categories (since it can be a container for any type, it dispatches the formatting).
  • IntervalIndex / IntervalArray don't use this formatting

Datetime / Timedelta will use this too, so this will be pushed up into DatetimeIndexOpsMixin later.


def asof_locs(self, where, mask):
"""
16 changes: 14 additions & 2 deletions pandas/core/internals/blocks.py
@@ -33,7 +33,7 @@
_isna_compat, array_equivalent, is_null_datelike_scalar, isna, notna)

import pandas.core.algorithms as algos
from pandas.core.arrays import Categorical
from pandas.core.arrays import Categorical, ExtensionArray
from pandas.core.base import PandasObject
import pandas.core.common as com
from pandas.core.indexes.datetimes import DatetimeIndex
@@ -1915,7 +1915,19 @@ def _slice(self, slicer):
return self.values[slicer]

def formatting_values(self):
return self.values._formatting_values()
# Deprecating the ability to override _formatting_values.
# Do the warning here, it's only used in pandas, since we
# have to check if the subclass overrode it.
fv = getattr(type(self.values), '_formatting_values', None)
if fv and fv != ExtensionArray._formatting_values:
msg = (
"'ExtensionArray._formatting_values' is deprecated. "
"Specify 'ExtensionArray._formatter' instead."
)
warnings.warn(msg, DeprecationWarning, stacklevel=10)
return self.values._formatting_values()

return self.values
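
The override check used in `formatting_values` above can be tried in isolation. The following is a minimal sketch assuming Python 3 (where a function looked up on a class is a plain function object); `BaseArray`, `LegacyArray` and the free function `formatting_values` are hypothetical stand-ins for `ExtensionArray`, a third-party subclass and the block method.

```python
import warnings


class BaseArray(object):
    """Hypothetical base class standing in for ExtensionArray."""

    def __init__(self, values):
        self.values = list(values)

    def _formatting_values(self):
        return self.values


class LegacyArray(BaseArray):
    # A subclass that still overrides the deprecated hook.
    def _formatting_values(self):
        return [str(v) for v in self.values]


def formatting_values(array):
    # Look the attribute up on the class so an override can be compared
    # against the base implementation.
    fv = getattr(type(array), '_formatting_values', None)
    if fv and fv != BaseArray._formatting_values:
        warnings.warn("'_formatting_values' is deprecated. "
                      "Specify '_formatter' instead.",
                      DeprecationWarning, stacklevel=2)
        return array._formatting_values()
    # No override: hand back the array itself, as the diff does.
    return array


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_result = formatting_values(LegacyArray([1, 2]))
    plain_result = formatting_values(BaseArray([1, 2]))
```

Only the subclass that overrides the hook triggers the warning and has its override honoured; the base class passes through untouched. The `stacklevel=2` here is chosen for the sketch and is unrelated to the `stacklevel=10` in the diff.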

def concat_same_type(self, to_concat, placement=None):
"""
67 changes: 23 additions & 44 deletions pandas/io/formats/format.py
@@ -16,11 +16,12 @@
from pandas.compat import StringIO, lzip, map, u, zip

from pandas.core.dtypes.common import (
is_categorical_dtype, is_datetime64_dtype, is_datetime64tz_dtype, is_float,
is_float_dtype, is_integer, is_integer_dtype, is_interval_dtype,
is_list_like, is_numeric_dtype, is_period_arraylike, is_scalar,
is_categorical_dtype, is_datetime64_dtype, is_datetime64tz_dtype,
is_extension_array_dtype, is_float, is_float_dtype, is_integer,
is_integer_dtype, is_list_like, is_numeric_dtype, is_scalar,
is_timedelta64_dtype)
from pandas.core.dtypes.generic import ABCMultiIndex, ABCSparseArray
from pandas.core.dtypes.generic import (
ABCIndexClass, ABCMultiIndex, ABCSeries, ABCSparseArray)
from pandas.core.dtypes.missing import isna, notna

from pandas import compat
@@ -29,7 +30,6 @@
from pandas.core.config import get_option, set_option
from pandas.core.index import Index, ensure_index
from pandas.core.indexes.datetimes import DatetimeIndex
from pandas.core.indexes.period import PeriodIndex

from pandas.io.common import _expand_user, _stringify_path
from pandas.io.formats.printing import adjoin, justify, pprint_thing
@@ -842,22 +842,18 @@ def _get_column_name_list(self):
def format_array(values, formatter, float_format=None, na_rep='NaN',
digits=None, space=None, justify='right', decimal='.'):

if is_categorical_dtype(values):
fmt_klass = CategoricalArrayFormatter
elif is_interval_dtype(values):
fmt_klass = IntervalArrayFormatter
if is_datetime64_dtype(values.dtype):
fmt_klass = Datetime64Formatter
elif is_timedelta64_dtype(values.dtype):
fmt_klass = Timedelta64Formatter
elif is_extension_array_dtype(values.dtype):
fmt_klass = ExtensionArrayFormatter
elif is_float_dtype(values.dtype):
fmt_klass = FloatArrayFormatter
elif is_period_arraylike(values):
fmt_klass = PeriodArrayFormatter
elif is_integer_dtype(values.dtype):
fmt_klass = IntArrayFormatter
elif is_datetime64tz_dtype(values):
fmt_klass = Datetime64TZFormatter
elif is_datetime64_dtype(values.dtype):
fmt_klass = Datetime64Formatter
elif is_timedelta64_dtype(values.dtype):
fmt_klass = Timedelta64Formatter
else:
fmt_klass = GenericArrayFormatter

@@ -1121,39 +1117,22 @@ def _format_strings(self):
return fmt_values.tolist()


class IntervalArrayFormatter(GenericArrayFormatter):

def __init__(self, values, *args, **kwargs):
GenericArrayFormatter.__init__(self, values, *args, **kwargs)

def _format_strings(self):
formatter = self.formatter or str
fmt_values = np.array([formatter(x) for x in self.values])
return fmt_values


class PeriodArrayFormatter(IntArrayFormatter):

class ExtensionArrayFormatter(GenericArrayFormatter):
def _format_strings(self):
from pandas.core.indexes.period import IncompatibleFrequency
try:
values = PeriodIndex(self.values).to_native_types()
except IncompatibleFrequency:
# periods may contains different freq
values = Index(self.values, dtype='object').to_native_types()

formatter = self.formatter or (lambda x: '{x}'.format(x=x))
fmt_values = [formatter(x) for x in values]
return fmt_values

values = self.values
if isinstance(values, (ABCIndexClass, ABCSeries)):
values = values._values

class CategoricalArrayFormatter(GenericArrayFormatter):
formatter = values._formatter(boxed=True)

def __init__(self, values, *args, **kwargs):
GenericArrayFormatter.__init__(self, values, *args, **kwargs)
if is_categorical_dtype(values.dtype):
# Categorical is special for now, so that we can preserve tzinfo
Contributor

do we need a TODO here? this is until DatetimeArray is fully pushed?

Contributor Author (TomAugspurger, Dec 4, 2018)

That depends on whether we're willing to change __array__ for datetime-backed series / index (right now . I'm writing up an issue now to discuss that specific point.)

Contributor Author

#23569 (comment) for that.

array = values.get_values()
else:
array = np.asarray(values)

def _format_strings(self):
fmt_values = format_array(self.values.get_values(), self.formatter,
fmt_values = format_array(array,
Member

@TomAugspurger: I'm struggling to resolve some formatting issues. What is the reason for calling format_array here? As far as I can tell, it loops back round to create a GenericArrayFormatter instance with a formatter specified to pick up the display options.

Member

I guess, to be more succinct: why is super()._format_strings() not used?

Member

I am not that familiar with this code, but from a quick look: calling super()._format_strings() would be different, as this would call GenericArrayFormatter._format_strings, while the generic format_array can still result in using custom formatters like Datetime64(TZ)Formatter or Timedelta64Formatter, depending on what the values of the underlying EA are.

Member

Although most of those custom Formatter classes don't do much special if formatter is specified.

Eg Datetime64Formatter has this in _format_strings:

if self.formatter is not None and callable(self.formatter):
return [self.formatter(x) for x in values]

Member

so if ExtensionArrayFormatter is not inheriting from GenericArrayFormatter but calling format_array to dispatch to another ...ArrayFormatter class, why wouldn't the logic in ExtensionArrayFormatter be in format_array?

formatter,
float_format=self.float_format,
na_rep=self.na_rep, digits=self.digits,
space=self.space, justify=self.justify)
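
The distinction drawn in the thread above, re-dispatching through `format_array` versus calling `super()._format_strings()`, can be sketched with toy classes. Everything here is hypothetical and much reduced: `format_values` stands in for `format_array`, `WrapperFormatter` for `ExtensionArrayFormatter`, and `Wrapped` for an extension array whose `_formatter` returned `None` (as the Categorical and SparseArray overrides in this diff do).

```python
def format_values(values, formatter=None):
    # Reduced stand-in for format_array: choose a formatter class from the
    # element types, then delegate to it.
    if all(isinstance(v, float) for v in values):
        klass = FloatFormatter
    else:
        klass = GenericFormatter
    return klass(values, formatter).format_strings()


class GenericFormatter(object):
    def __init__(self, values, formatter=None):
        self.values = values
        self.formatter = formatter

    def format_strings(self):
        fmt = self.formatter or str
        return [fmt(v) for v in self.values]


class FloatFormatter(GenericFormatter):
    def format_strings(self):
        # A type-specific default that the generic path would not apply.
        fmt = self.formatter or "{:.2f}".format
        return [fmt(v) for v in self.values]


class Wrapped(object):
    def __init__(self, data):
        self.data = data


class WrapperFormatter(GenericFormatter):
    # Unwrap the container, then re-dispatch on the underlying values.
    def format_strings(self):
        return format_values(list(self.values.data), self.formatter)


redispatched = WrapperFormatter(Wrapped([1.0, 2.5])).format_strings()
via_generic = GenericFormatter([1.0, 2.5]).format_strings()
with_formatter = WrapperFormatter(Wrapped([1.0, 2.5]), repr).format_strings()
```

With no formatter supplied, re-dispatching picks up the type-specific default, which the generic path would miss. Once an explicit formatter is passed, both routes give the same strings, which matches the observation in the thread that the specialised classes do little extra when `formatter` is set.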