-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor groupby helper from tempita to fused types #24954
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,6 +2,7 @@ | |
|
||
import cython | ||
from cython import Py_ssize_t | ||
from cython cimport floating | ||
|
||
from libc.stdlib cimport malloc, free | ||
|
||
|
@@ -382,5 +383,55 @@ def group_any_all(uint8_t[:] out, | |
out[lab] = flag_val | ||
|
||
|
||
@cython.wraparound(False) | ||
@cython.boundscheck(False) | ||
def _group_add(floating[:, :] out, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just call this group_add |
||
int64_t[:] counts, | ||
floating[:, :] values, | ||
const int64_t[:] labels, | ||
Py_ssize_t min_count=0): | ||
""" | ||
Only aggregates on axis=0 | ||
""" | ||
cdef: | ||
Py_ssize_t i, j, N, K, lab, ncounts = len(counts) | ||
floating val, count | ||
ndarray[floating, ndim=2] sumx, nobs | ||
|
||
if not len(values) == len(labels): | ||
raise AssertionError("len(index) != len(labels)") | ||
|
||
nobs = np.zeros_like(out) | ||
sumx = np.zeros_like(out) | ||
|
||
N, K = (<object>values).shape | ||
|
||
with nogil: | ||
|
||
for i in range(N): | ||
lab = labels[i] | ||
if lab < 0: | ||
continue | ||
|
||
counts[lab] += 1 | ||
for j in range(K): | ||
val = values[i, j] | ||
|
||
# not nan | ||
if val == val: | ||
nobs[lab, j] += 1 | ||
sumx[lab, j] += val | ||
|
||
for i in range(ncounts): | ||
for j in range(K): | ||
if nobs[i, j] < min_count: | ||
out[i, j] = NAN | ||
else: | ||
out[i, j] = sumx[i, j] | ||
|
||
|
||
group_add_float32 = _group_add['float'] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we still need these? Thought from a previous comment that they weren't required Side note - I see you've been force pushing changes. If you merge master you can push without forcing, which typically does a better job of maintaining comment history There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, they are still needed (to avoid changing the code calling this function). atleast until some logic to handle int64_t differently is added to the _group_add function. I force push to keep only 1 commit (avoid the bloat). |
||
group_add_float64 = _group_add['double'] | ||
|
||
# generated from template | ||
include "groupby_helper.pxi" |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -380,7 +380,7 @@ def get_func(fname): | |
# otherwise find dtype-specific version, falling back to object | ||
for dt in [dtype_str, 'object']: | ||
f = getattr(libgroupby, "{fname}_{dtype_str}".format( | ||
fname=fname, dtype_str=dtype_str), None) | ||
fname=fname, dtype_str=dt), None) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So do we even need this object fallback loop? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not for group_add and everything worked fine until now. |
||
if f is not None: | ||
return f | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR but I see there are some functions at the end of the template file which have been converted to fused types but still remain in the template file. Could make sense to move them here as well (separate PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look at that (for next PR). thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd which functions did you find that are unnecessary? I couldn't find any.