feat: (Series|DataFrame).explode #556
475eeea to 9a56bb5
bigframes/core/compile/compiled.py
Outdated
zip_array = (
    table_w_offset[offset_array_id]
    .zip(*[table_w_offset[column_id] for column_id in column_ids])
    .name(zip_array_id)
)
Have you tried directly indexing using the offsets rather than zipping? I'm worried compiling this creates an extra correlated JOIN.
Thanks! The new logic performs better!
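For readers following this thread, the difference between zipping the arrays and indexing them by offset can be illustrated at the row level. This is only a plain-Python sketch of the semantics, not the compiler code in `bigframes/core/compile/compiled.py`: the helper names `explode_by_zip` and `explode_by_offset` and the sample rows are hypothetical, and the sketch ignores null and empty-array handling.

```python
from itertools import zip_longest


def explode_by_zip(rows, column_ids):
    """Zip the array columns together, then emit one row per zipped tuple."""
    out = []
    for row in rows:
        arrays = [row[c] for c in column_ids]
        for values in zip_longest(*arrays):
            new_row = dict(row)
            new_row.update(zip(column_ids, values))
            out.append(new_row)
    return out


def explode_by_offset(rows, column_ids):
    """Generate offsets 0..n-1 and index each array column by offset."""
    out = []
    for row in rows:
        length = max(len(row[c]) for c in column_ids)
        for offset in range(length):
            new_row = dict(row)
            for c in column_ids:
                arr = row[c]
                # Shorter arrays are padded with None, mirroring zip_longest.
                new_row[c] = arr[offset] if offset < len(arr) else None
            out.append(new_row)
    return out


# Hypothetical sample data with two array columns.
rows = [
    {"id": 1, "a": [1, 2], "b": ["x", "y"]},
    {"id": 2, "a": [3], "b": ["z"]},
]
zipped = explode_by_zip(rows, ["a", "b"])
indexed = explode_by_offset(rows, ["a", "b"])
```

Both helpers produce the same rows here; the review concern is about the SQL each approach compiles to, which this sketch does not model.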
        ),
    ],
)
def test_series_explode(data):
I'm not sure if any of these tests use the unordered path. Maybe use to_pandas(ordered=False) for a test or two? Also, for unordered test cases, make sure not to ignore the index, as resetting the index will invoke the ordered path.
It looks like dtype validation will trigger the unordered path. I also added the aggregation and to_pandas(ordered=False) as you suggested. Thanks!
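One way to compare an unordered result without discarding the index is sketched below. It uses pandas only: the shuffled series stands in for what `to_pandas(ordered=False)` might return, and the `normalize` helper and the `idx`/`value` column names are hypothetical, not taken from the PR's test suite.

```python
import pandas as pd


def normalize(series: pd.Series) -> pd.DataFrame:
    """Keep the index as a column and sort by (index, value) so row order is irrelevant."""
    frame = series.rename("value").rename_axis("idx").reset_index()
    return frame.sort_values(["idx", "value"]).reset_index(drop=True)


# Expected result computed with pandas; exploded rows repeat the original index labels.
expected = pd.Series([[1, 2, 3], [4, 5]], index=[10, 20]).explode()

# Simulate an unordered download by shuffling the rows (assumption for illustration).
unordered = expected.sample(frac=1, random_state=0)

# The comparison still checks the index labels, just not the row order.
pd.testing.assert_frame_equal(normalize(unordered), normalize(expected))
```

Sorting happens on the pandas side after the data is downloaded, so it does not force the ordered path in the query itself.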
b648bdb to 9aef5a3
LGTM
* feat: (Series|DataFrame).explode
* fixing schema and adding tests
* fixing multi-index tests
* add docs and fix tests
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕