Add default repr for EAs #23601
Lowercase `Length`.
Why lower? It's uppercase in the Series repr.
It's lowercase everywhere (else).
Everywhere else being index? Categorical uses a capital. I'd like to eventually use pieces of this for the categorical repr, and would rather not break that repr, so I think a capital makes more sense here.
In index it is as part of the "keywords" in the constructor-resembling repr. There it indeed makes sense to have it lowercase. But here, I think capitalized is much more logical (it's the first item on that line), and consistent with Series and Categorical.
OK, I guess if we match the EAs to Series, except for the quoting (e.g. quoting in the EA, but no change in Series, meaning no quoting), then Index is separate. I am still concerned about these slight differences, however. E.g. even here, `dtype` is lowercase and `Length` is uppercase.
Keep in mind that whether or not to quote is up to the individual array. Are you specifically talking about quoting within PeriodArray here? Do we plan to quote within DatetimeArray and TimedeltaArray?
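The capitalization question concerns a three-part layout. Below is a minimal sketch of it, assuming a header with the class name, the formatted values in brackets, and a footer in which `Length` is capitalized (as in the Series repr) while `dtype` stays lowercase. `default_array_repr` and its parameters are hypothetical, not pandas API.

```python
from typing import Callable, Sequence


def default_array_repr(
    class_name: str,
    values: Sequence[object],
    dtype_name: str,
    formatter: Callable[[object], str] = repr,
) -> str:
    """Hypothetical sketch of an ExtensionArray-style repr.

    Layout: a ``<ClassName>`` header, the bracketed values, then a footer where
    ``Length`` is capitalized (first item on the line) and ``dtype`` is not.
    """
    body = ", ".join(formatter(v) for v in values)
    footer = f"Length: {len(values)}, dtype: {dtype_name}"
    return f"<{class_name}>\n[{body}]\n{footer}"


print(default_array_repr("MyArray", ["a", "b"], "my_dtype"))
# <MyArray>
# ['a', 'b']
# Length: 2, dtype: my_dtype
```

Whether the values are quoted is left to the `formatter` argument, which matches the point that quoting is a per-array decision.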
Doesn't Index also use `False` like Array? (I thought this was meant to handle the difference between Series/DataFrame and Index.)
Index is hypothetical at this point, until we support EA-backed indexes. This isn't currently called by our index reprs, so we can choose to define it however we want. Or we can make this a `box=None/Series/Index` argument.
For external EAs, yes. But e.g. PeriodArray is stored in an Index, and there `box` is used for this purpose, no?
It was not. Done in 3825aeb
I would add this arg to the Index formatters as well, for compatibility.
I'm not sure I follow the need for adding it to the index formatters, since indexes are boxed by definition.
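In pandas, the base `ExtensionArray._formatter(boxed=False)` returns `str` when `boxed` is true and `repr` otherwise. That split is where "quoted in the array repr, unquoted in the Series repr" comes from. The sketch below mimics the hook on a stand-alone toy class so it runs without pandas. `ToyArray` and `render` are hypothetical names.

```python
from typing import Any, Callable, List


class ToyArray:
    """Hypothetical stand-in for an ExtensionArray (no pandas required)."""

    def __init__(self, values: List[Any]) -> None:
        self._values = values

    def _formatter(self, boxed: bool = False) -> Callable[[Any], str]:
        # Mirrors the default hook: ``str`` when the values sit inside a
        # container (Series/DataFrame), ``repr`` when the array prints itself.
        if boxed:
            return str
        return repr

    def render(self, boxed: bool = False) -> List[str]:
        fmt = self._formatter(boxed=boxed)
        return [fmt(v) for v in self._values]


arr = ToyArray(["a", "b"])
print(arr.render(boxed=False))  # ["'a'", "'b'"]  (quoted, as in the array repr)
print(arr.render(boxed=True))   # ['a', 'b']      (unquoted, as in a Series)
```

Under this convention, any container holding the array, including an Index backed by it, would ask for `boxed=True`.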
Should `NaN` have been hard-coded here?
Why not? This is used for the Integer data display, where we currently use NaN. (Whether we should rather use 'NA' instead of 'NaN' is another question.)
Just wondering whether it would be a problem with `to_string(na_rep=...)`. Will do some tests.
Ah, yes, that's a good reason. But in general, this `_formatter` does not follow display options at all, is that correct? If so, this is something to think about in general.
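The `na_rep` concern is easiest to see in a small sketch. If the missing-value marker is a parameter rather than a hard-coded `"NaN"`, an option like `to_string(na_rep=...)` has somewhere to pass it. `make_formatter` is hypothetical, and treating `None` as the missing value is an assumption of the sketch.

```python
from typing import Callable, Optional


def make_formatter(na_rep: str = "NaN") -> Callable[[Optional[int]], str]:
    """Hypothetical formatter factory for a nullable-integer array.

    Missing values (``None`` in this sketch) are rendered with ``na_rep``
    instead of a hard-coded ``"NaN"``, so a caller could thread its option in.
    """

    def fmt(value: Optional[int]) -> str:
        if value is None:
            return na_rep
        return str(value)

    return fmt


values = [1, None, 3]
print([make_formatter()(v) for v in values])        # ['1', 'NaN', '3']
print([make_formatter("<NA>")(v) for v in values])  # ['1', '<NA>', '3']
```

The default keeps today's `NaN` output. Only callers that pass an explicit marker see a change.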
Why is this the only Index subclass that you need to do this for?
It's the only extension-array-backed index that's using the formatting right now. Datetime / Timedelta will use this too, so this will be pushed up into `DatetimeIndexOpsMixin` later.
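Pushing the override up into `DatetimeIndexOpsMixin` is the usual mixin refactor: write the method once on a shared base and let every array-backed index inherit it. The pandas-free sketch below assumes, following the remark that indexes are boxed by definition, that an index asks its array for the boxed formatter. `ArrayBackedIndexMixin`, `_formatter_func`, `format_values`, and the `Toy*` classes are all hypothetical names, not the pandas code.

```python
from typing import Any, Callable, List


class ToyArray:
    """Hypothetical array exposing a ``_formatter(boxed)`` hook."""

    def __init__(self, values: List[Any]) -> None:
        self._values = values

    def _formatter(self, boxed: bool = False) -> Callable[[Any], str]:
        return str if boxed else repr


class ArrayBackedIndexMixin:
    """Shared formatting for any index that wraps a ``ToyArray``."""

    _data: ToyArray

    def _formatter_func(self) -> Callable[[Any], str]:
        # Assumption: an index is a container, so it requests boxed output.
        return self._data._formatter(boxed=True)

    def format_values(self) -> List[str]:
        fmt = self._formatter_func()
        return [fmt(v) for v in self._data._values]


class ToyPeriodIndex(ArrayBackedIndexMixin):
    def __init__(self, data: ToyArray) -> None:
        self._data = data


class ToyDatetimeIndex(ArrayBackedIndexMixin):
    def __init__(self, data: ToyArray) -> None:
        self._data = data


print(ToyPeriodIndex(ToyArray(["2018-01"])).format_values())       # ['2018-01']
print(ToyDatetimeIndex(ToyArray(["2018-01-01"])).format_values())  # ['2018-01-01']
```

Adding a Timedelta-style index then needs no formatting code of its own, only the mixin in its bases.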
Do we need a TODO here? Is this only until DatetimeArray is fully pushed?
That depends on whether we're willing to change `__array__` for datetime-backed Series / Index (right now ...). I'm writing up an issue to discuss that specific point.
See #23569 (comment) for that.
@TomAugspurger: I'm struggling to resolve some formatting issues. What is the reason for calling `format_array` here? As far as I can tell, it loops back round to create a `GenericArrayFormatter` instance with a `formatter` specified, to pick up the display options.
I guess, to be more succinct: why is `super()._format_strings()` not used?
I am not that familiar with this code, but from a quick look: calling `super()._format_strings()` would be different, as it would call `GenericArrayFormatter._format_strings`, while the generic `format_array` can still end up using custom formatters like `Datetime64(TZ)Formatter` or `Timedelta64Formatter`, depending on what the values of the underlying EA are.
That said, most of those custom Formatter classes don't do much special if `formatter` is specified. E.g. `Datetime64Formatter` has this in `_format_strings`: see pandas/io/formats/format.py, lines 1174 to 1175 in 181f972.
So if `ExtensionArrayFormatter` is not inheriting from `GenericArrayFormatter` but calling `format_array` to dispatch to another `...ArrayFormatter` class, why wouldn't the logic in `ExtensionArrayFormatter` be in `format_array`?
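The replies above make two points. First, a dispatcher like `format_array` picks a specialised formatter from the values, so it is not equivalent to calling the parent's `_format_strings`. Second, a specialised formatter that simply maps a caller-supplied `formatter` behaves like the generic one. The toy, pandas-free model below illustrates both points. `toy_format_array`, `ToyGenericFormatter`, and `ToyDatetimeFormatter` are hypothetical stand-ins, not the pandas classes, and the "dates only" specialisation is an assumption chosen for illustration.

```python
import datetime
from typing import Any, Callable, List, Optional, Sequence

Formatter = Optional[Callable[[Any], str]]


class ToyGenericFormatter:
    """Fallback: apply ``formatter`` if one is given, otherwise ``str``."""

    def __init__(self, values: Sequence[Any], formatter: Formatter = None) -> None:
        self.values = values
        self.formatter = formatter

    def format_strings(self) -> List[str]:
        fmt = self.formatter if self.formatter is not None else str
        return [fmt(v) for v in self.values]


class ToyDatetimeFormatter(ToyGenericFormatter):
    """Specialised path, chosen only when every value is a datetime."""

    def format_strings(self) -> List[str]:
        # Once a caller supplies ``formatter``, nothing datetime-specific happens.
        if self.formatter is not None:
            return [self.formatter(v) for v in self.values]
        # Toy specialisation (assumption): show the date part only.
        return [v.strftime("%Y-%m-%d") for v in self.values]


def toy_format_array(values: Sequence[Any], formatter: Formatter = None) -> List[str]:
    """Pick a formatter class based on the values, then format them."""
    if values and all(isinstance(v, datetime.datetime) for v in values):
        cls = ToyDatetimeFormatter
    else:
        cls = ToyGenericFormatter
    return cls(values, formatter=formatter).format_strings()


stamps = [datetime.datetime(2018, 1, 1), datetime.datetime(2018, 1, 2)]
print(toy_format_array(stamps))                         # ['2018-01-01', '2018-01-02']
print(ToyGenericFormatter(stamps).format_strings())     # ['2018-01-01 00:00:00', '2018-01-02 00:00:00']
# With a formatter supplied, the dispatch step makes no difference:
print(toy_format_array(stamps, repr) == ToyGenericFormatter(stamps, repr).format_strings())  # True
```

In this model, dispatch only matters when no `formatter` is passed. If an extension array always supplies one through its `_formatter`, the specialised classes add little, and that is the motivation for asking where the logic should live.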