Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix OverflowError in lib.maybe_indices_to_slice() #61080

Merged

Conversation

swt2c
Copy link
Contributor

@swt2c swt2c commented Mar 7, 2025

This fixes this error when slicing massive dataframes:

Traceback (most recent call last):
File "", line 1, in
File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4093, in getitem
return self._getitem_bool_array(key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4155, in _getitem_bool_array
return self._take_with_is_copy(indexer, axis=0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4153, in _take_with_is_copy
result = self.take(indices=indices, axis=axis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4133, in take
new_data = self._mgr.take(
^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 893, in take
new_labels = self.axes[axis].take(indexer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/indexes/datetimelike.py", line 839, in take
maybe_slice = lib.maybe_indices_to_slice(indices, len(self))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "lib.pyx", line 522, in pandas._libs.lib.maybe_indices_to_slice
OverflowError: value too large to convert to int

This fixes this error when slicing massive dataframes:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4093, in __getitem__
    return self._getitem_bool_array(key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4155, in _getitem_bool_array
    return self._take_with_is_copy(indexer, axis=0)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4153, in _take_with_is_copy
    result = self.take(indices=indices, axis=axis)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4133, in take
    new_data = self._mgr.take(
               ^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 893, in take
    new_labels = self.axes[axis].take(indexer)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/indexes/datetimelike.py", line 839, in take
    maybe_slice = lib.maybe_indices_to_slice(indices, len(self))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib.pyx", line 522, in pandas._libs.lib.maybe_indices_to_slice
OverflowError: value too large to convert to int
@swt2c
Copy link
Contributor Author

swt2c commented Mar 7, 2025

This is a squash and rebase of #59537.

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also update pandas/_libs/lib.pyi like in https://github.com/pandas-dev/pandas/pull/59537/files?

@mroeschke mroeschke added the Internals Related to non-user accessible pandas implementation label Mar 7, 2025
@swt2c
Copy link
Contributor Author

swt2c commented Mar 7, 2025

Can you also update pandas/_libs/lib.pyi like in https://github.com/pandas-dev/pandas/pull/59537/files?

If I do that, the CI fails. See the CI run for the "Sort whatsnew entries" commit for the errors. It seems like the type hint needs to stay as int?

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's OK that the stubs aren't _perfectly aligned with the Cython signature.

@mroeschke mroeschke added this to the 3.0 milestone Mar 7, 2025
@mroeschke mroeschke merged commit 0acb9a0 into pandas-dev:main Mar 7, 2025
46 checks passed
@mroeschke
Copy link
Member

Thanks @swt2c

@swt2c swt2c deleted the fix_overflowerror_slicing_massive_dataframes branch March 7, 2025 23:36
gm-oo9 pushed a commit to gm-oo9/pandas that referenced this pull request Mar 10, 2025
)

* BUG: Fix OverflowError in lib.maybe_indices_to_slice()

This fixes this error when slicing massive dataframes:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4093, in __getitem__
    return self._getitem_bool_array(key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4155, in _getitem_bool_array
    return self._take_with_is_copy(indexer, axis=0)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4153, in _take_with_is_copy
    result = self.take(indices=indices, axis=axis)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4133, in take
    new_data = self._mgr.take(
               ^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 893, in take
    new_labels = self.axes[axis].take(indexer)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/indexes/datetimelike.py", line 839, in take
    maybe_slice = lib.maybe_indices_to_slice(indices, len(self))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib.pyx", line 522, in pandas._libs.lib.maybe_indices_to_slice
OverflowError: value too large to convert to int

* Sort whatsnew entries

* Set type hint back to int

---------

Co-authored-by: benjamindonnachie <83379521+benjamindonnachie@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: OverflowError: value too large to convert to int when manipulating very large dataframes
3 participants