BUG: series.reindex(mi) behaves different for series with Index and MultiIndex #60923
Comments
Thanks for the report! Is it possible to shorten your example? Making it as short as possible to demonstrate the behavior would certainly be appreciated.
Thanks for the feedback. I managed to shorten the example and provided additional comments.
Adding to this because I've seen this situation before. This is about the fact that calling reindex with a MultiIndex gives different results in the two cases below:
import pandas as pd

series = pd.Series([1, 2], index=pd.Index(["x", "y"], name="lvl1"))
# the same series as above, with the index converted to multi-index
series_mi = series.copy()
series_mi.index = pd.MultiIndex.from_frame(series.index.to_frame())
# a multi-index with a level not present in the index of the series
idx = pd.MultiIndex.from_product([["x", "y"], ["m", "n"]], names=["lvl1", "lvl2"])
# this returns nans
series.reindex(idx)
# this returns a series with values [1,1,2,2]
series_mi.reindex(idx)
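For anyone who needs the [1, 1, 2, 2] result without depending on which of the two code paths is hit, one option is to look the values up by the shared level and attach the target index afterwards. This is only a sketch, assuming the series index is unique and its name matches a level of the target; the helper name broadcast_to_levels is made up for illustration and is not a pandas API.

```python
import pandas as pd


def broadcast_to_levels(series: pd.Series, target: pd.MultiIndex) -> pd.Series:
    """Hypothetical helper: repeat `series` values along the extra levels of `target`."""
    # Labels of the shared level, one per row of the target index.
    keys = target.get_level_values(series.index.name)
    # Plain Index -> Index reindex; labels missing from `series` become NaN.
    values = series.reindex(keys).to_numpy()
    return pd.Series(values, index=target, name=series.name)


series = pd.Series([1, 2], index=pd.Index(["x", "y"], name="lvl1"))
idx = pd.MultiIndex.from_product([["x", "y"], ["m", "n"]], names=["lvl1", "lvl2"])

result = broadcast_to_levels(series, idx)
print(result.tolist())  # [1, 1, 2, 2]
```

Because the lookup only ever reindexes a flat Index against another flat Index, it should not be affected by whichever way the Index/MultiIndex discrepancy is eventually resolved.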
Thanks @ssche - finally got some time to take a look into this. I believe this can currently be accomplished with the level argument:
import numpy as np
import pandas as pd

series = pd.Series(
    [26.7300, 24.2550],
    index=pd.Index([81, 82], name='a')
)
other_index = pd.MultiIndex(
    levels=[
        pd.Index([81, 82], name='a'),
        pd.Index([np.nan], name='b'),
        pd.Index([
            '2018-06-01', '2018-07-01'
        ], name='c')
    ],
    codes=[
        [0, 0, 1, 1],
        [0, 0, 0, 0],
        [0, 1, 0, 1]
    ],
    names=['a', 'b', 'c']
)
series.reindex(other_index, level="a")
# a   b    c
# 81  NaN  2018-06-01    26.730
#          2018-07-01    26.730
# 82  NaN  2018-06-01    24.255
#          2018-07-01    24.255
# dtype: float64
But it makes sense to me to do this automatically if it's feasible. PRs are welcome!
From
Could we repurpose |
I think so. I do not see the current behavior as being a useful op, so I think we can do this without deprecation (but in a major release only).
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Create a series with an Index and a MultiIndex to use for reindexing later.
reindex to the MultiIndex (other_index), which expands series.index by two more levels.
reindex sets all values of the original series to NaN, which can be fixed by turning series.index into a 1-level MultiIndex first.
to_mi(...) to turn the series.index into a 1-level MultiIndex.
reindex on the new series with MultiIndex, and the values are maintained/filled as expected.
Issue Description
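The steps of the reproducible example can be sketched roughly as follows. This is not the reporter's original snippet: the to_mi helper is reconstructed from the index conversion shown in the comments (MultiIndex.from_frame on index.to_frame()), and the level values used for b and c are placeholders chosen for illustration.

```python
import pandas as pd


def to_mi(series: pd.Series) -> pd.Series:
    """Return a copy of `series` whose index is a 1-level MultiIndex (assumed helper)."""
    out = series.copy()
    out.index = pd.MultiIndex.from_frame(series.index.to_frame())
    return out


series = pd.Series([26.73, 24.255], index=pd.Index([81, 82], name="a"))
# Target index with two levels ("b", "c") that the series index does not have.
other_index = pd.MultiIndex.from_product(
    [[81, 82], ["p"], ["2018-06-01", "2018-07-01"]], names=["a", "b", "c"]
)

series_mi = to_mi(series)

# Per the report: the plain-Index series comes back as NaN, while the
# 1-level MultiIndex series keeps its values, repeated along the new levels.
from_index = series.reindex(other_index)
from_mi = series_mi.reindex(other_index)
```

Comparing from_index and from_mi side by side is the quickest way to check whether a given pandas version still shows the discrepancy.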
In the above case, series.reindex(multi_index) turns the series values to NaN when the series has a single Index. However, when the series index is converted to a 1-level MultiIndex prior to the reindex, the values are maintained and filled as expected.
In my opinion it shouldn't matter whether a 1-level MultiIndex or an Index is used for a reindex; the outcomes should be the same.
As a further discussion point (here or elsewhere), this issue (and others) also raises the question of why a distinction between Index and MultiIndex is necessary (I suspect there are historic reasons). I would imagine that many issues (and much code) would go away if MultiIndex were used exclusively (even for 1-dimensional indices).
Expected Behavior
The missing levels in series_mi (compared to other_index) are added, and the values of the partial index from the original series are used to fill the places of the added indices.
Installed Versions
INSTALLED VERSIONS
commit : 3979e95
python : 3.11.11
python-bits : 64
OS : Linux
OS-release : 6.12.11-200.fc41.x86_64
Version : #1 SMP PREEMPT_DYNAMIC Fri Jan 24 04:59:58 UTC 2025
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_AU.UTF-8
LOCALE : en_AU.UTF-8
pandas : 3.0.0.dev0+1909.g3979e954a3.dirty
numpy : 1.26.4
dateutil : 2.9.0.post0
pip : 24.2
Cython : 3.0.11
sphinx : 8.1.3
IPython : 8.32.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.3
blosc : None
bottleneck : 1.4.2
fastparquet : 2024.11.0
fsspec : 2025.2.0
html5lib : 1.1
hypothesis : 6.125.2
gcsfs : 2025.2.0
jinja2 : 3.1.5
lxml.etree : 5.3.0
matplotlib : 3.10.0
numba : 0.61.0
numexpr : 2.10.2
odfpy : None
openpyxl : 3.1.5
psycopg2 : 2.9.10
pymysql : 1.4.6
pyarrow : 19.0.0
pyreadstat : 1.2.8
pytest : 8.3.4
python-calamine : None
pytz : 2025.1
pyxlsb : 1.0.10
s3fs : 2025.2.0
scipy : 1.15.1
sqlalchemy : 2.0.38
tables : 3.10.2
tabulate : 0.9.0
xarray : 2024.9.0
xlrd : 2.0.1
xlsxwriter : 3.2.2
zstandard : 0.23.0
tzdata : 2025.1
qtpy : None
pyqt5 : None