Change MultiIndex repr ? #13480

jorisvandenbossche · 2016-06-18T09:49:19Z

The current MultiIndex representation looks like this:

In [15]: mi = pd.MultiIndex.from_tuples([('A', 1), ('A', 2),
    ...:                                 ('B', 3), ('B', 4)])

In [16]: mi
Out[16]:
MultiIndex(levels=[[u'A', u'B'], [1, 2, 3, 4]],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])

So this shows the underlying labels and levels. Personally, I don't find this a very good repr, because:

The shown levels and labels are actually (internal) concepts of MultiIndex that a lot of users do not have to think about.
It also does not give you the best idea of what the MultiIndex looks like at a glance IMO (since you have to look at the values of the labels to know what the order of the levels will be).
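
As a rough illustration of the lookup a reader has to do by hand with the current repr, here is a minimal pure-Python sketch. It is not how pandas does this internally, and `to_tuples` is a hypothetical helper name used only for this example:

```python
# Pure-Python illustration; no pandas required.
# These are the levels and labels from the repr shown above.
levels = [["A", "B"], [1, 2, 3, 4]]
labels = [[0, 0, 1, 1], [0, 1, 2, 3]]


def to_tuples(levels, labels):
    """Look up each integer label in its level, then zip the levels row by row."""
    columns = [
        [level[code] for code in codes]
        for level, codes in zip(levels, labels)
    ]
    return list(zip(*columns))


print(to_tuples(levels, labels))
```

The result is the list of tuples the index was built from, which is what the tuple-based alternatives would show directly.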

So the question is: is there a better alternative?

We could show tuples:

MultiIndex([('A', 1), ('A', 2), ('B', 3), ('B', 4)])

Or the individual levels:

MultiIndex([['A', 'A', 'B', 'B'], [1, 2, 3, 4]])

or ..

Historical note: in older versions, there was a difference between the repr and str of a MultiIndex:

In [1]: mi = pd.MultiIndex.from_tuples([('A', 1), ('A', 2),
   ...:                                         ('B', 3), ('B', 4)])

In [2]: mi
Out[2]:
MultiIndex(levels=[[u'A', u'B'], [1, 2, 3, 4]],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])

In [3]: print mi

A  1
   2
B  3
   4

In [6]: pd.__version__
Out[6]: '0.14.1'

but this seems to have disappeared (on purpose or not).

J535D165 · 2017-12-28T10:56:21Z

It would be nice to have the possibility to change the representation through pandas.options.

J535D165 · 2018-02-21T21:10:40Z

Any thoughts on the desired behaviour @jorisvandenbossche? I think that tuples are a good choice.

The tuple output is also close to the workaround many people use:

In [15]: x = numpy.arange(10000)

In [16]: pandas.MultiIndex.from_arrays([x, x]).values
Out[16]: 
array([(0, 0), (1, 1), (2, 2), ..., (9997, 9997), (9998, 9998),
       (9999, 9999)], dtype=object)

@rockg @hoechenberger

jorisvandenbossche · 2018-02-21T21:25:37Z

I personally also think that tuples would be good representation.

The main 'problem' with it is that something like MultiIndex([('A', 1), ('A', 2), ('B', 1)]) looks like code that can be run as well, while this is not a valid construction. That might be confusing (and it also goes against the Python idiom of the repr being unambiguous, but that's something we are already violating with other reprs as well).

topper-123 · 2018-08-13T21:24:37Z

I've made a suggestion that works locally but would appreciate input as to the exact format.

I too like the tuple format the best, but the tuples should IMO be vertically stacked, so the user at a glance can see both the individual level (vertically) and each row (horizontally).

Various attributes should also be shown at the bottom, to mirror CategoricalIndex.

An example with a reasonably complex MultiIndex:

>>> n = 100
>>> c = pd.CategoricalIndex(list('abcdefghijklmnop' * n), name='a')
>>> i = pd.Index(c.codes, name='b')
>>> pd.MultiIndex.from_arrays([c, i])
MultiIndex([ ('a', 0),
             ('b', 1),
             ('c', 2),
             ('d', 3),
             ('e', 4),
             ('f', 5),
             ('g', 6),
             ('h', 7),
             ('i', 8),
             ('j', 9),
            ...
             ('g', 6),
             ('h', 7),
             ('i', 8),
             ('j', 9),
            ('k', 10),
            ('l', 11),
            ('m', 12),
            ('n', 13),
            ('o', 14),
            ('p', 15)],
           levels=[
             CategoricalIndex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l',
                  'm', 'n', 'o', 'p'],
                 categories=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', ...], ordered=False, name='a', dtype='category'),
             Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype='int64', name='b'),
           ],
           names=['a', 'b'],
           length=1600)

I like that levels has type information. The individual levels should concatenate their value on the first line, but otherwise I like this. Thoughts?

topper-123 · 2018-08-15T10:19:20Z

Any comment on the proposed format?

h-vetinari · 2018-08-15T10:32:26Z

What do you do if not all combinations are present? Probably still need to show labels=, no?

topper-123 · 2018-08-15T11:21:13Z

Not IMO. I see labels as an implementation detail, similar to CategoricalIndex.codes. There is even an approved issue (#13443) to change the name of labels to codes.

So, IMO we should absolutely not show labels in the repr, it just confuses.

h-vetinari · 2018-08-15T11:38:19Z

Fair enough. Plus, there'll be the repr of the tuples already.

jreback · 2018-08-15T11:51:03Z

@topper-123 a note on impl

we already have all of the machinery to do this
pls DO NOT write much custom code

see what we are doing for all other Indexes and follow the pattern / subclassing

this is a hard problem because of display wrapping and indentation - but it is already solved

topper-123 · 2018-08-15T12:01:59Z

Yes, I did that mostly, though I had a problem with wrapping each value in a new line, rather than wrapping several values, as other indexes do. I'll look into it and see if I can get that part unified with the rest as well.

topper-123 · 2018-08-25T15:30:42Z

Current proposal looks like this:

>>> n = 1_000_000
>>> ci = pd.CategoricalIndex(list('a' * n) + (['bcd'] * n), categories=['a', 'bcd'], ordered=True)
>>> dti = pd.date_range('2000-01-01', freq='s', periods=2000000)
>>> mi = pd.MultiIndex.from_arrays([ci, ci.codes + 9, dti, dti, dti], names=['a', 'b', 'x', 'x2', 'x3'])
>>> mi
MultiIndex([(  'a',  9, '2000-01-01 00:00:00', '2000-01-01 00:00:00', ...),
            (  'a',  9, '2000-01-01 00:00:01', '2000-01-01 00:00:01', ...),
            (  'a',  9, '2000-01-01 00:00:02', '2000-01-01 00:00:02', ...),
            (  'a',  9, '2000-01-01 00:00:03', '2000-01-01 00:00:03', ...),
            (  'a',  9, '2000-01-01 00:00:04', '2000-01-01 00:00:04', ...),
            (  'a',  9, '2000-01-01 00:00:05', '2000-01-01 00:00:05', ...),
            (  'a',  9, '2000-01-01 00:00:06', '2000-01-01 00:00:06', ...),
            (  'a',  9, '2000-01-01 00:00:07', '2000-01-01 00:00:07', ...),
            (  'a',  9, '2000-01-01 00:00:08', '2000-01-01 00:00:08', ...),
            (  'a',  9, '2000-01-01 00:00:09', '2000-01-01 00:00:09', ...),
            ...
            ('bcd', 10, '2000-01-24 03:33:10', '2000-01-24 03:33:10', ...),
            ('bcd', 10, '2000-01-24 03:33:11', '2000-01-24 03:33:11', ...),
            ('bcd', 10, '2000-01-24 03:33:12', '2000-01-24 03:33:12', ...),
            ('bcd', 10, '2000-01-24 03:33:13', '2000-01-24 03:33:13', ...),
            ('bcd', 10, '2000-01-24 03:33:14', '2000-01-24 03:33:14', ...),
            ('bcd', 10, '2000-01-24 03:33:15', '2000-01-24 03:33:15', ...),
            ('bcd', 10, '2000-01-24 03:33:16', '2000-01-24 03:33:16', ...),
            ('bcd', 10, '2000-01-24 03:33:17', '2000-01-24 03:33:17', ...),
            ('bcd', 10, '2000-01-24 03:33:18', '2000-01-24 03:33:18', ...),
            ('bcd', 10, '2000-01-24 03:33:19', '2000-01-24 03:33:19', ...)],
           dtype='object', names=['a', 'b', 'x', 'x2', 'x3'], length=2000000)

So we now have:

- item formatting according to each level's formatting rule,
- right-justification for each tuple column,
- row-wise truncation according to pd.options.display.max_seq_items,
- column-wise truncation according to pd.options.display.width.
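
For readers who want to see the first three points in isolation, here is a minimal pure-Python sketch. It is not the code from the PR: `format_tuples` and its parameters `max_seq_items` and `n_edge` are hypothetical names chosen for this example, items are formatted with plain `repr()` rather than per-level formatting rules, and column-wise (width) truncation is left out.

```python
def format_tuples(rows, max_seq_items=100, n_edge=10):
    """Return one formatted line per tuple, right-justifying every column.

    If there are more than `max_seq_items` rows, only the first and last
    `n_edge` rows are kept, with a "..." line between them.
    """
    if len(rows) > max_seq_items:
        head, tail = rows[:n_edge], rows[-n_edge:]
    else:
        head, tail = rows, []

    # repr() each item so that strings keep their quotes.
    cells = [[repr(item) for item in row] for row in head + tail]
    # One width per tuple position, taken over all rows that are shown.
    widths = [max(len(cell) for cell in column) for column in zip(*cells)]

    lines = [
        "(" + ", ".join(cell.rjust(width) for cell, width in zip(row, widths)) + ")"
        for row in cells
    ]
    if tail:
        lines.insert(len(head), "...")
    return lines


rows = [("a", 9), ("a", 9), ("bcd", 10), ("bcd", 10)]
print("\n".join(format_tuples(rows)))
```

Computing the widths per tuple position is what lines up `'a'` under `'bcd'` and `9` under `10`, so each level can be read vertically.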

Comments on the look of this?

jorisvandenbossche · 2018-12-06T10:27:56Z

@pandas-dev/pandas-core I would like to draw some attention to this.

TL;DR: There is a PR implementing a new MultiIndex repr (#22511), which is kind of stuck because I want more feedback on the proposed repr.

The MultiIndex repr can use some improvement, see top post (#13480 (comment)) for some reasons.

@topper-123 made a concrete proposal 3 months ago (see the post above this one for details), and has a PR implementing it (#22511):

MultiIndex([(  'a',  9, '2000-01-01 00:00:00', '2000-01-01 00:00:00', ...),
            (  'a',  9, '2000-01-01 00:00:01', '2000-01-01 00:00:01', ...),
            (  'a',  9, '2000-01-01 00:00:02', '2000-01-01 00:00:02', ...),
            (  'a',  9, '2000-01-01 00:00:03', '2000-01-01 00:00:03', ...),
            (  'a',  9, '2000-01-01 00:00:04', '2000-01-01 00:00:04', ...),
            (  'a',  9, '2000-01-01 00:00:05', '2000-01-01 00:00:05', ...),
            (  'a',  9, '2000-01-01 00:00:06', '2000-01-01 00:00:06', ...),
            (  'a',  9, '2000-01-01 00:00:07', '2000-01-01 00:00:07', ...),
            (  'a',  9, '2000-01-01 00:00:08', '2000-01-01 00:00:08', ...),
            (  'a',  9, '2000-01-01 00:00:09', '2000-01-01 00:00:09', ...),
            ...
            ('bcd', 10, '2000-01-24 03:33:10', '2000-01-24 03:33:10', ...),
            ('bcd', 10, '2000-01-24 03:33:11', '2000-01-24 03:33:11', ...),
            ('bcd', 10, '2000-01-24 03:33:12', '2000-01-24 03:33:12', ...),
            ('bcd', 10, '2000-01-24 03:33:13', '2000-01-24 03:33:13', ...),
            ('bcd', 10, '2000-01-24 03:33:14', '2000-01-24 03:33:14', ...),
            ('bcd', 10, '2000-01-24 03:33:15', '2000-01-24 03:33:15', ...),
            ('bcd', 10, '2000-01-24 03:33:16', '2000-01-24 03:33:16', ...),
            ('bcd', 10, '2000-01-24 03:33:17', '2000-01-24 03:33:17', ...),
            ('bcd', 10, '2000-01-24 03:33:18', '2000-01-24 03:33:18', ...),
            ('bcd', 10, '2000-01-24 03:33:19', '2000-01-24 03:33:19', ...)],
           dtype='object', names=['a', 'b', 'x', 'x2', 'x3'], length=2000000)

I have some remarks on that proposal (summarized here from #22511 (review)):

- It looks like valid code but it is actually not:
  - The main reason is that the default constructor does not accept a list of tuples (MI.from_tuples does that), although there is an issue to discuss changing this: Let MultiIndex accept a list-like as input #23887. So depending on that, this might be a moot argument.
  - Even regardless of the above, due to truncation symbols and the length indication, it is often still not valid code. But this is of course exactly the same situation as with other Index reprs.
- When you have more levels, the tuples in the repr get truncated (as in the example above). As a consequence, in such cases you don't see anything in the default repr about the hidden levels except the name (not even the type, for example). Of course, this will typically not be an issue for MIs with 2-3 levels.
  For a DataFrame the situation can be similar (columns not visible in the default repr), but there you have e.g. the info() method to get an overview of the columns and their types. So this could be an argument to add a similar method to MI, but it could maybe also be included in the main repr? (see below)

The MultiIndex is a more complex object than a normal Index (multiple levels -> multiple dtypes and names, typically repeated values in a level), so therefore we could consider also a more advanced repr for it.

Just as an example of what another repr for the MultiIndex could be, here is a small mock-up (inspired by info and by the repr of xarray objects, where the different dimensions are listed; initially posted here: #22511 (comment)):

<pandas.MultiIndex>
Levels (5):
    * a: category (2) ['a', 'abc']
    * b: int64 (2) [9, 10]
    * dti_1: datetime64[ns] (2000) ['2000-01-01 00:00:00', ... ] 
    * dti_2: datetime64[ns] (2000) ['2000-01-01 00:00:00', ... ] 
    * dti_3: datetime64[ns] (2000) ['2000-01-01 00:00:00', ... ] 
[('a',    9, '2000-01-01 00:00:00', '2000-01-01 00:00:00', '2000-01-01 00:00:00'),
 ('a',    9, '2000-01-01 00:00:01', '2000-01-01 00:00:01', '2000-01-01 00:00:01'),
 ('a',    9, '2000-01-01 00:00:02', '2000-01-01 00:00:02', '2000-01-01 00:00:02'),
 ...,
 ('abc', 10, '2000-01-01 00:33:17', '2000-01-01 00:33:17', '2000-01-01 00:33:17'),
 ('abc', 10, '2000-01-01 00:33:18', '2000-01-01 00:33:18', '2000-01-01 00:33:18'),
 ('abc', 10, '2000-01-01 00:33:19', '2000-01-01 00:33:19', '2000-01-01 00:33:19')],
Length: 2000

The formatting is certainly not yet perfect, but it's to give an idea. I personally think the above is more informative than the other proposed repr (it includes an easier overview of the different levels, it includes information about the dtypes of the levels, ..).

My main issue is that I would just like to have at least some discussion on this. We actually never discussed the initial proposal of @topper-123, and also never really discussed my other idea.
What do we think is the best / most informative repr for the MultiIndex?

And to be clear: I am not saying that my idea is necessarily better, I mainly want us to have at least a bit of discussion about it, as we should not change a repr lightly (and if we change it now, we should not change it again for some time). I certainly find what is in the PR a clear improvement over master (it is also based on what I initially proposed myself when originally opening this issue).

TomAugspurger · 2018-12-06T12:03:54Z

I didn't realize that @topper-123's proposed repr would truncate past certain levels, but that is of course unavoidable. Does #22511 include a system like DataFrame, where large reprs are automatically truncated? IMO, the most important part of a MultiIndex repr is seeing the name and type of each level, and a few values if possible. If you need to see more values, then it can be used as the index for a Series, and you'll get the (maybe sparsified) version of what's above.

jreback · 2018-12-06T12:10:36Z

@jorisvandenbossche I actually like the bottom repr (xarray like), but I agree that is maybe more food for a .info() method; also it gives the impression to me that MI is accessed via columns, rather than a row-accessed based tuple.

I would go with @topper-123 repr, with 1 change. Let's drop dtype, and replace with dtypes=list_of_dtypes 1 for each level, similar to how names is 1 for each level.

I also don't mind/care that @topper-123 repr is not actually executable, this principle is just true for short Index reprs, and is not true for EA at all. So I don't think this is a consideration.

jorisvandenbossche · 2018-12-06T12:55:26Z

Does #22511 include a system like DataFrame, where large reprs are automatically truncated?

@TomAugspurger What do you mean exactly?
It has two ways of truncation (see also the bullet points in #13480 (comment)): 1) it only shows up to pd.options.display.max_seq_items items of the index (here the number of tuples), and otherwise the first 10 / last 10, similar to all the other Index reprs (the main difference here is that each tuple is always put on a new line).
And 2) it has truncation for the width, showing only those items of a single tuple that fit in pd.options.display.width (and this is the one where certain levels might be "hidden").

IMO, the most important part of a MultiIndex repr is seeing the name and type of each level, and a few values if possible

The current repr in #22511 only shows the names (and the values if fitting in the width of the console)

I actually like the bottom repr (xarray like), but I agree that is maybe more food for a .info() method;

If we go with #22511, I think adding an info method that returns something like that would be a good idea. However, I still like to argue that it might be more informative to simply have this as the default repr, without the need of an additional info method.

also it gives the impression to me that MI is accessed via columns, rather than a row-accessed based tuple.

If it is combined with some values as tuples, I personally don't think it would be too much of a problem. We could maybe also put the preview of values as tuples on top, and the overview of levels below it.

TomAugspurger · 2018-12-06T13:02:16Z

@TomAugspurger What do you mean exactly?

Oh, I was thinking of the past, when we had pd.options.display.large_repr = 'info', and we would automatically switch over to the info repr for large dataframes. I'd forgotten we no longer do that (by default) now that we truncate.

However, I still like to argue that it might be more informative to simply have this as the default repr, without the need of an additional info method.

I think I agree with this... Personally, when I need to see the values of a MultiIndex I stick it in a Series pd.Series(1, index=mi), and you get the nice sparsified view of the values.
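
To show what "sparsified" means here without relying on any particular pandas version, this is a minimal pure-Python sketch of the idea. It is not pandas' implementation; `sparsify` is a hypothetical helper, and the choice to never blank the last position is an assumption made for this example so that every row still shows something.

```python
def sparsify(rows):
    """Blank out leading values that repeat the previous row's values.

    The last position of each row is always kept, so no row ends up empty.
    """
    result = []
    previous = None
    for row in rows:
        shown = []
        still_repeating = previous is not None
        for position, value in enumerate(row):
            is_last = position == len(row) - 1
            if still_repeating and not is_last and previous[position] == value:
                shown.append("")
            else:
                # Once one value differs, everything to its right is shown.
                still_repeating = False
                shown.append(value)
        result.append(tuple(shown))
        previous = row
    return result


for row in sparsify([("A", 1), ("A", 2), ("B", 3), ("B", 4)]):
    print(row)
```

Only the first row of each group keeps its outer value, which is the same shape as the old `print mi` output quoted in the top post.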

topper-123 · 2018-12-07T16:09:49Z

I'm in agreement with @jreback. I also like having a dtypes attribute. The repr above would become:

MultiIndex([(  'a',  9, '2000-01-01 00:00:00', '2000-01-01 00:00:00', ...),
            (  'a',  9, '2000-01-01 00:00:01', '2000-01-01 00:00:01', ...),
            (  'a',  9, '2000-01-01 00:00:02', '2000-01-01 00:00:02', ...),
            (  'a',  9, '2000-01-01 00:00:03', '2000-01-01 00:00:03', ...),
            (  'a',  9, '2000-01-01 00:00:04', '2000-01-01 00:00:04', ...),
            (  'a',  9, '2000-01-01 00:00:05', '2000-01-01 00:00:05', ...),
            (  'a',  9, '2000-01-01 00:00:06', '2000-01-01 00:00:06', ...),
            (  'a',  9, '2000-01-01 00:00:07', '2000-01-01 00:00:07', ...),
            (  'a',  9, '2000-01-01 00:00:08', '2000-01-01 00:00:08', ...),
            (  'a',  9, '2000-01-01 00:00:09', '2000-01-01 00:00:09', ...),
            ...
            ('bcd', 10, '2000-01-24 03:33:10', '2000-01-24 03:33:10', ...),
            ('bcd', 10, '2000-01-24 03:33:11', '2000-01-24 03:33:11', ...),
            ('bcd', 10, '2000-01-24 03:33:12', '2000-01-24 03:33:12', ...),
            ('bcd', 10, '2000-01-24 03:33:13', '2000-01-24 03:33:13', ...),
            ('bcd', 10, '2000-01-24 03:33:14', '2000-01-24 03:33:14', ...),
            ('bcd', 10, '2000-01-24 03:33:15', '2000-01-24 03:33:15', ...),
            ('bcd', 10, '2000-01-24 03:33:16', '2000-01-24 03:33:16', ...),
            ('bcd', 10, '2000-01-24 03:33:17', '2000-01-24 03:33:17', ...),
            ('bcd', 10, '2000-01-24 03:33:18', '2000-01-24 03:33:18', ...),
            ('bcd', 10, '2000-01-24 03:33:19', '2000-01-24 03:33:19', ...)],
           dtypes=['category', 'int64', 'datetime64[ns]', 'datetime64[ns]',
                   'datetime64[ns]'],
           names=['a', 'b', 'x', 'x2', 'x3'], length=2000000)

That would mean adding a MultiIndex.dtypes property. If there's agreement/acceptance on this, I could look into it.

EDIT: other indexes use Index.dtype.name and not just index.dtype in their repr, changed it to be similar.

jreback · 2018-12-07T16:27:25Z

maybe could have a way to display the names & dtypes in a nicer way here as well (in a combined way)

topper-123 · 2018-12-07T16:32:39Z

hmm, some levels may not have a name. What did you have in mind?

jreback · 2018-12-07T16:42:36Z

hmm yeah

maybe just zip them? into a tuple list

topper-123 · 2018-12-07T17:15:00Z

I think that would confuse. If we simply use dtypes and names, people can look up individually using MultiIndex.dtypes and MultiIndex.names. I think that would be nice and simple.

BTW, I did the repr above slightly different, if you didn’t notice.

topper-123 · 2018-12-09T19:11:39Z

I've been thinking a bit more about dtypes in the repr.

Having dtypes in the repr would make it not possible to recreate multiindexes from their repr (after implementing #23887 and possibly setting options.display.max_seq_items). So there's a trade-off involved.

I agree with @jorisvandenbossche that the current proposal is real-looking and therefore should be creatable from the repr, if possible. This is also a good/common convention in Python.

So I'm leaning towards not having dtypes in the repr after all; I think that would be the best trade-off.

Thoughts?

jorisvandenbossche · 2018-12-09T22:44:10Z

(forgot to post, comment of a day ago; not yet an answer on the last comment of @topper-123 )

Let's drop dtype, and replace with dtypes=list_of_dtypes 1 for each level, similar to how names is 1 for each level.

I have the feeling that adding this additional information for a MultiIndex somewhat bumps into the limits of what the current constructor-like Index repr can handle. Why do we necessarily want to keep to the python-code-like repr if we are adding a lot of "keywords" that are not actually keywords? (although the same is already true for "length" for normal Index)
Going away from "looking like python code" would give a lot more flexibility to make a clear overview of the level and its dtypes.

maybe could have a way to display the names & dtypes in a nicer way here as well (in a combined way)

Well, that's basically what I tried with my proposal (not saying it was the best attempt, but that was at least the idea of it).
Something like

Levels (3):
    * a: category (2) ['a', 'abc']
    * b: int64 (2) [9, 10]
    * dti_1: datetime64[ns] (2000) ['2000-01-01 00:00:00', ... ]

although I don't know how to make it clear that the numbers between brackets are the number of unique values in the level.
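
Purely as a hypothetical variation on the mock-up above, one option would be to spell the count out as "N unique" instead of a bare number in parentheses. The pure-Python sketch below only illustrates that wording; `summarize_levels` and `max_preview` are made-up names, dtypes are omitted because plain tuples carry no dtype information, and none of this is taken from the PR.

```python
def summarize_levels(names, rows, max_preview=2):
    """Build a 'Levels' overview with an explicit unique-value count per level."""
    columns = list(zip(*rows))
    lines = [f"Levels ({len(names)}):"]
    for name, column in zip(names, columns):
        # dict.fromkeys keeps the unique values in first-seen order.
        unique = list(dict.fromkeys(column))
        preview = ", ".join(repr(value) for value in unique[:max_preview])
        if len(unique) > max_preview:
            preview += ", ..."
        lines.append(f"    * {name}: {len(unique)} unique [{preview}]")
    return lines


rows = [("a", 9), ("a", 9), ("abc", 10)]
print("\n".join(summarize_levels(["a", "b"], rows)))
```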

I am not sure you can have such a combination with Python-like code. @jreback What would you propose it look like? (a dict?)

jreback · 2018-12-14T12:16:49Z

I am -1 on the xarray like constructor. I think it’s completely misleading as this is a row oriented object. I would be +1 on the proposed repr as is (w/o the dtype=‘object’) and +0 on adding a .info()

I completely discount the point of non-executable nature of this code as this is irrelevant; any index is non repr after truncation; furthermore there is a deliberate effort to make non repr Arrays as well.

TomAugspurger · 2018-12-14T12:27:17Z

Personally, I don't have a strong preference. I find the "info" style one a bit more informative at a glance, but if that's available as a .info method then that's fine too. We'll need to see what Joris thinks. And just to be clear, thanks for your work on this @topper-123. Either way, this is a big improvement over the current repr.

jorisvandenbossche · 2018-12-16T15:51:48Z

[@topper-123] I agree with @jorisvandenbossche that the current proposal is real-looking and therefore should be creatable from the repr, if possible. This is also a good/common convention in Python.

To be clear: I am not saying that the repr should be executable (as my proposal actually isn't). I am mainly trying to argue that we shouldn't jump through hoops and end up with a less informative repr only to keep it "python code-like" (e.g. adding the multiple dtypes in a keyword), while it is not executable anyway.
So if we want to add information about dtypes (which I think would be informative), why not step away from the restriction of having it python-code-like?

[@jreback] I completely discount the point of non-executable nature of this code as this is irrelevant; any index is non repr after truncation;

Yes, I understand that. But that is not the only argument. The main reason for having the overview of the levels is to be more informative + counteract that the repr in the PR can hide levels.

furthermore there is a deliberate effort to make non repr Arrays as well.

@jreback What do you mean here?

I am -1 on the xarray like constructor. I think it’s completely misleading as this is a row oriented object

Above you said "actually like the bottom repr (xarray like)", or would you still like it as info object?
To be clear, I don't propose showing only the xarray-like levels overview as in #13480 (comment). My original proposal was to combine that with the snapshot of values as @topper-123 developed in the PR. As I already said above (#13480 (comment)), my feeling is that by combining those two, it should make it clear that this is also a row oriented object (and in the end, we still store a MultiIndex level by level, so it also has a column-oriented aspect).

And just to be clear, thanks for your work on this @topper-123. Either way, this is a big improvement over the current repr.

Big +1 !

jreback · 2018-12-16T19:55:09Z

Above you said "actually like the bottom repr (xarray like)", or would you still like it as info object?

.info() could be added but that is not the subject of this issue / PR.

I have already stated that I am -1 on the xarray like repr as the default as it's amazingly confusing for a row-oriented object.

We have spent an enormous amount of time on this. @jorisvandenbossche Either propose a new PR, or accept the existing.

TomAugspurger · 2018-12-16T20:04:18Z

Joris did propose an alternative... Regardless, this seems worth taking the time to get it right.

jreback · 2018-12-16T20:56:37Z

Joris did propose an alternative... Regardless, this seems worth taking the time to get it right.

And I have numerous times been -1 on that.

jorisvandenbossche added the Output-Formatting and Needs Discussion labels Jun 18, 2016

jorisvandenbossche mentioned this issue Jun 18, 2016

API: MultiIndex.labels -> codes #13443

Closed

jorisvandenbossche added the MultiIndex label Jun 18, 2016

jorisvandenbossche mentioned this issue Aug 27, 2016

ENH: multiindex formatting #12929

Closed

jorisvandenbossche added this to the 0.21.0 milestone Aug 18, 2017

jreback modified the milestones: 0.21.0, Next Major Release Sep 23, 2017

jreback mentioned this issue Jan 11, 2018

reduce output when printing indices and columns in pandas data frame and series #19183

Closed

jreback added Difficulty Intermediate labels Jan 11, 2018

topper-123 mentioned this issue Aug 26, 2018

ENH: better MultiIndex.__repr__ #22511

Merged

3 tasks

jreback modified the milestones: Contributions Welcome, 0.24.0 Sep 18, 2018

jreback modified the milestones: 0.24.0, 0.25.0 Jan 4, 2019

jreback closed this as completed in #22511 Jun 19, 2019
