API: Remove 'codes' parameter from MultiIndex signature, and add 'data' #24323

Closed
topper-123 opened this issue Dec 17, 2018 · 2 comments

@topper-123
Contributor

topper-123 commented Dec 17, 2018

In #23752, the MultiIndex signature was changed. Compared to 0.23.4, the only change is that labels has been changed to codes.

Now that the signature is being changed anyway, I've started to wonder whether this is even the right signature.

I think codes should actually be an implementation detail, and an improved signature would take data as the first parameter, similar to how data is the first parameter in the signature for CategoricalIndex. So a signature like this:

>>> inspect.signature(pd.MultiIndex)  # proposed signature
<Signature (data=None, levels=None, sortorder=None, names=None, dtype=None, copy=False, name=None, verify_integrity=True, _set_identity=True)>

I think would be better.

data could then accept codes, but could also accept other types of data that could be used to construct a MultiIndex. For example:

>>> pd.MultiIndex(data=[[1, 0, 1, 0], [0, 1, 0, 1]], levels=[['a', 'b'], ['x', 'y']])
MultiIndex([('b', 'x'),  # repr after #22511
            ('a', 'y'),
            ('b', 'x'),
            ('a', 'y')],
           )
>>> pd.MultiIndex({'a': [1, 2, 3], 'v': ['a', 'd', 'q']})
MultiIndex([(1, 'a'),
            (2, 'd'),
            (3, 'q')],
           names=['a', 'v'])

In the first example, I use the current initialisation method, and in the second I show an initialisation with a dict, similar to how a DataFrame is initialised with a dict.
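To make the first example concrete, here is a minimal pure-Python sketch of how per-level codes and levels combine into the tuples shown in the repr. The helper name decode_codes is hypothetical and not part of pandas; the sketch also ignores the -1 code that pandas uses for missing values.

```python
def decode_codes(codes, levels):
    """Map per-level integer codes to tuples of labels.

    Hypothetical helper for illustration only; it does not handle
    the -1 "missing value" code that pandas supports.
    """
    if len(codes) != len(levels):
        raise ValueError("need exactly one code array per level")
    # Position i of the result takes levels[k][codes[k][i]] for each level k.
    return [
        tuple(level[code] for level, code in zip(levels, row))
        for row in zip(*codes)
    ]


codes = [[1, 0, 1, 0], [0, 1, 0, 1]]
levels = [["a", "b"], ["x", "y"]]
tuples = decode_codes(codes, levels)
print(tuples)
```

This is the sense in which codes read as an implementation detail: they are just integer positions into levels, whereas users typically think in terms of the resulting tuples.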

I think this could make the initialisation of MultiIndex more similar to the ones for the other pandas objects, and make MultiIndexes more friendly to use for users.

@TomAugspurger
Contributor

> and make MultiIndexes more friendly to use for users.

We already provide the alternative constructors. What's wrong with using those?

I'd caution against putting too much magic in the MultiIndex constructor. That kind of thing has made the Series and DataFrame constructors difficult to maintain.
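For reference, a short sketch of the alternative constructors, assuming these are MultiIndex.from_arrays, MultiIndex.from_tuples and MultiIndex.from_product. The last lines show one way the dict case from the proposal could be expressed with from_arrays; the variable names are illustrative only.

```python
import pandas as pd

# from_arrays: one array per level, aligned by position.
mi_arrays = pd.MultiIndex.from_arrays(
    [[1, 2, 3], ["a", "d", "q"]], names=["a", "v"]
)

# from_tuples: one tuple per row of the index.
mi_tuples = pd.MultiIndex.from_tuples(
    [(1, "a"), (2, "d"), (3, "q")], names=["a", "v"]
)

# from_product: the Cartesian product of the given iterables.
mi_product = pd.MultiIndex.from_product(
    [["a", "b"], ["x", "y"]], names=["first", "second"]
)

# The dict example from the proposal, written with from_arrays:
# keys become the level names, values become the level arrays.
data = {"a": [1, 2, 3], "v": ["a", "d", "q"]}
mi_dict = pd.MultiIndex.from_arrays(list(data.values()), names=list(data.keys()))

print(mi_dict)
```

Each constructor has a single, narrow input contract, which is the maintainability argument for keeping them separate rather than dispatching on input type inside one constructor.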

@jreback
Contributor

jreback commented Dec 18, 2018

We already have 3 different ways of constructing a MultiIndex, which is enough; they are different enough that it would be very awkward to handle them all in a single constructor. The main constructor () should rarely be used directly.
