ENH: Add numba engine to several rolling aggregations #38895

Merged (25 commits) on Jan 4, 2021

Conversation

mroeschke
Member

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Adds the engine and engine_kwargs arguments to the rolling and expanding mean, median, sum, min, and max methods.

@mroeschke mroeschke added this to the 1.3 milestone Jan 2, 2021
@mroeschke mroeschke added Enhancement Window rolling, ewma, expanding labels Jan 2, 2021
Contributor

@jreback jreback left a comment

do we have asv's that exercise both engines? can you show results

@@ -51,6 +51,7 @@ Other enhancements
- :func:`pandas.read_sql_query` now accepts a ``dtype`` argument to cast the columnar data from the SQL database based on user input (:issue:`10285`)
- Improved integer type mapping from pandas to SQLAlchemy when using :meth:`DataFrame.to_sql` (:issue:`35076`)
- :func:`to_numeric` now supports downcasting of nullable ``ExtensionDtype`` objects (:issue:`33013`)
- :meth:`.Rolling.sum`, :meth:`.Expanding.sum`, :meth:`.Rolling.mean`, :meth:`.Expanding.mean`, :meth:`.Rolling.median`, :meth:`.Expanding.median`, :meth:`.Rolling.max`, :meth:`.Expanding.max`, :meth:`.Rolling.min`, and :meth:`.Expanding.min` now support ``Numba`` execution with the ``engine`` keyword (:issue:`38895`)
Contributor

might be worth a note in the user docs as well

@mroeschke
Member Author

Here's a simple benchmark

In [13]: df = pd.DataFrame((100 * np.random.random((N, 20))))

In [14]: roll_df = df.rolling(10)

In [15]: roll_df.mean(engine="numba", engine_kwargs={"parallel": True, "nogil": True})

Out[15]:
                0          1          2          3          4          5          6   ...         13         14         15         16         17         18         19
0              NaN        NaN        NaN        NaN        NaN        NaN        NaN  ...        NaN        NaN        NaN        NaN        NaN        NaN        NaN
1              NaN        NaN        NaN        NaN        NaN        NaN        NaN  ...        NaN        NaN        NaN        NaN        NaN        NaN        NaN
2              NaN        NaN        NaN        NaN        NaN        NaN        NaN  ...        NaN        NaN        NaN        NaN        NaN        NaN        NaN
3              NaN        NaN        NaN        NaN        NaN        NaN        NaN  ...        NaN        NaN        NaN        NaN        NaN        NaN        NaN
4              NaN        NaN        NaN        NaN        NaN        NaN        NaN  ...        NaN        NaN        NaN        NaN        NaN        NaN        NaN
...            ...        ...        ...        ...        ...        ...        ...  ...        ...        ...        ...        ...        ...        ...        ...
9999995  53.109393  44.602854  44.822229  42.651840  52.167235  53.439720  34.879703  ...  49.400177  57.872117  30.206089  59.354336  38.903916  45.369952  36.430867
9999996  50.362627  49.842483  42.688050  36.641871  48.515543  53.232805  33.087431  ...  47.096377  59.415241  27.924127  62.007697  33.320770  41.012531  34.111801
9999997  52.901644  47.866224  42.234959  39.333112  53.802062  55.045552  36.034986  ...  48.406358  57.742796  27.194332  61.915400  33.759142  45.622714  35.858745
9999998  46.061426  53.639151  36.709441  44.440904  51.782057  48.010830  38.226252  ...  48.533150  62.545091  33.294572  65.169238  37.093948  44.830588  37.272684
9999999  40.147146  54.386060  43.861967  53.070330  53.403055  44.737140  42.638428  ...  49.509350  57.838767  33.224957  65.508094  35.911931  50.585150  36.818871

[10000000 rows x 20 columns]

In [16]: %timeit roll_df.mean()
7.69 s ± 19.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [18]: %timeit roll_df.mean(engine="numba", engine_kwargs={"parallel": True, "nogil": True})
5.97 s ± 45.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
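
For anyone who wants to rerun this comparison, here is a minimal sketch rather than the exact session above. It assumes pandas 1.3 or later, uses a much smaller N than the 10,000,000 rows shown in the output so it finishes quickly, and skips the numba timing when numba cannot be imported. The `best_of` helper is hypothetical and not part of pandas.

```python
import time

import numpy as np
import pandas as pd

N = 100_000  # assumption: far smaller than the 10,000,000 rows shown above
df = pd.DataFrame(100 * np.random.random((N, 20)))
roll_df = df.rolling(10)


def best_of(func, repeat=3):
    """Return the fastest wall-clock time (in seconds) over `repeat` calls."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


default_result = roll_df.mean()
print(f"default engine: {best_of(roll_df.mean):.3f} s")

kwargs = {"parallel": True, "nogil": True}
try:
    # The first call pays the JIT compilation cost, so keep it out of the timing.
    numba_result = roll_df.mean(engine="numba", engine_kwargs=kwargs)
except ImportError:
    numba_result = None
    print("numba is not available; skipping the numba engine")

if numba_result is not None:
    numba_time = best_of(
        lambda: roll_df.mean(engine="numba", engine_kwargs=kwargs)
    )
    print(f"numba engine:   {numba_time:.3f} s")
    # Both engines should agree up to floating-point error.
    diff = (default_result - numba_result).abs().max().max()
    print(f"max abs difference: {diff:.2e}")
```

Timings will depend heavily on N, the window size, and the number of cores, so the numbers printed here are not expected to match the ones above.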

 )
-param_names = ["constructor", "dtype", "function", "engine"]
+param_names = ["constructor", "dtype", "function", "engine", "method"]
Contributor

are these benchmarks still reasonable in terms of total time, e.g. < 1s per

Member Author

Each benchmark takes around 1s. Is that too much?

nv.validate_window_func("sum", args, kwargs)
if maybe_use_numba(engine):
    if self.method == "table":
Contributor

might be worth a common function for this?

Member Author

If I inline the args assignment, it would essentially be a one-line function.

Also, I anticipate numba adding support for an axis argument incrementally, so I imagine some methods will need the if self.method == "table": check and others will not.

engine_kwargs=engine_kwargs, engine="numba"
)

# Once method='table' is supported, uncomment test below.
Contributor

could just xfail this

engine_kwargs=engine_kwargs, engine="numba"
)

# Once method='table' is supported, uncomment test below.
Contributor

same

@jreback jreback merged commit df69f2a into pandas-dev:master Jan 4, 2021
@jreback
Contributor

jreback commented Jan 4, 2021

thanks

as discussed, you can write a function that iterates over the first dimension and calls np.nan*
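
One possible reading of this suggestion, sketched in plain NumPy: since the thread indicates the axis argument is not yet available for these reductions under numba, a helper can loop over the first dimension of a 2-D array and call a 1-D np.nan* function on each slice. The name `nan_agg_over_first_dim` and the sample window are hypothetical, and this is not the code that was merged.

```python
import numpy as np


def nan_agg_over_first_dim(arr, nan_func):
    """Apply a 1-D np.nan* reduction to each slice along the first dimension."""
    result = np.empty(arr.shape[0])
    for i in range(arr.shape[0]):
        # arr[i] is 1-D, so no axis argument is needed here.
        result[i] = nan_func(arr[i])
    return result


# Hypothetical "table" window: 3 rows by 2 columns with one missing value.
window = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, 30.0]])

# Transposing puts the columns on the first dimension, giving one value per column.
col_sums = nan_agg_over_first_dim(window.T, np.nansum)
col_maxes = nan_agg_over_first_dim(window.T, np.nanmax)
```

Because the loop body only uses indexing and a 1-D reduction, a function shaped like this should be a reasonable candidate for jitting, though that is untested here.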

@mroeschke mroeschke deleted the enh/rolling_table_aggs branch January 5, 2021 03:47
luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
@max-sixty
Contributor

max-sixty commented Jun 13, 2021

Hi @mroeschke — not sure this is the best forum, but saw this in the release notes and wanted to confirm whether my read of the code was correct.

I'm one of the developers of numbagg, and xarray (in which we use numbagg). (And in the olden days, I had occasional pandas contributions.)

IIUC, a rolling numba mean in pandas will call generate_numba_apply_func with func as np.mean. And then that will calculate a window on each step, and apply the numbagg function over the whole window at each step.

Is that correct? Do you have any thoughts on the relative efficiency of that vs. a "rolling algo"; i.e. one that keeps a running sum & count, adding one new value and subtracting one existing value at each step? I had thought that a rolling algo would be significantly faster — particularly for large windows — but I haven't tested it and perhaps you considered this already?

IIUC, the cython functions in pandas are rolling algos. And here's an example of that implemented with numba in numbagg: https://github.com/numbagg/numbagg/blob/v0.2.1/numbagg/moving.py#L85
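
To make the two approaches concrete, here is a minimal pure-NumPy sketch. It is neither the pandas nor the numbagg implementation; the function names are hypothetical, and NaN handling and min_periods are deliberately left out. The first function recomputes the mean over the whole window at each step, and the second keeps a running sum and count.

```python
import numpy as np


def mean_per_window(values, window):
    """Recompute the mean over the full window at every step: O(n * window)."""
    n = len(values)
    out = np.full(n, np.nan)
    for i in range(window - 1, n):
        out[i] = np.mean(values[i - window + 1 : i + 1])
    return out


def mean_running_sum(values, window):
    """Keep a running sum and count, adding one value and dropping one: O(n)."""
    n = len(values)
    out = np.full(n, np.nan)
    running_sum = 0.0
    count = 0
    for i in range(n):
        running_sum += values[i]
        count += 1
        if count > window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
            count -= 1
        if count == window:
            out[i] = running_sum / count
    return out


values = np.arange(10.0)
per_window = mean_per_window(values, 3)
running = mean_running_sum(values, 3)
```

The per-window version does work proportional to the window size at every step, which is why the gap between the two is expected to grow with larger windows.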

Thanks in advance, and congrats on getting this into pandas!

@mroeschke
Member Author

Your understanding is correct @max-sixty; this implementation just calls np.mean over each window instead of tracking the sum and counts.

One perk of this numba implementation over the sum-and-counts method is that the for loop over the windows can be parallelized with numba, which can yield nice performance gains. I don't think the for loop that tracks sums and counts can be parallelized and still yield correct results.

Without the parallelism, you're right that tracking sums and counts will be faster than the method I implemented here. I do still need to dig in a little more to see if these assumptions are correct.
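
A rough sketch of why the per-window loop parallelizes: each iteration reads only the input and writes only its own output slot, so the iterations are independent, whereas a running sum carries state from one iteration to the next. This is an illustration under assumptions, not the pandas code. The name `parallel_window_means` is hypothetical, and the fallback simply runs the same loop serially when numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs (serially) without numba.
    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap

    prange = range


@njit(parallel=True, nogil=True)
def parallel_window_means(values, window):
    n = values.shape[0]
    out = np.full(n, np.nan)
    # Iterations are independent: iteration i only writes out[i].
    for i in prange(window - 1, n):
        out[i] = np.mean(values[i - window + 1 : i + 1])
    return out


result = parallel_window_means(np.arange(5.0), 2)
```

Whether the parallel speedup outweighs the extra work per window presumably depends on the window size and the number of cores, which is the open question raised above.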

@max-sixty
Contributor

Thanks @mroeschke. And of course another benefit is that the implementation is much more general and can take user-jittable functions.

I'd be interested to know how benchmarks look if you do run them!

@max-sixty
Contributor

One other point — though very low confidence — it looks like you had some issues with supplying an axis parameter to numba functions. In numbagg we use gufuncs, which then operate on any number of dimensions, and the axis is handled externally to the numba routine. In case that's helpful.
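
As a plain-NumPy illustration of handling the axis outside the kernel (not numbagg's gufunc machinery, which is not reproduced here), a 1-D moving-sum kernel can stay unaware of axes while a thin wrapper applies it along whichever axis is requested. Both function names are hypothetical.

```python
import numpy as np


def moving_sum_1d(row, window):
    """1-D kernel: knows nothing about axes."""
    out = np.full(row.shape, np.nan)
    csum = np.cumsum(row)
    out[window - 1 :] = csum[window - 1 :]
    # Subtract the cumulative sum from just before each window starts.
    out[window:] -= csum[:-window]
    return out


def moving_sum(arr, window, axis=-1):
    """Apply the 1-D kernel along `axis`; the kernel itself never sees the axis."""
    arr = np.asarray(arr, dtype=float)
    return np.apply_along_axis(moving_sum_1d, axis, arr, window)


arr = np.arange(8.0).reshape(2, 4)
along_rows = moving_sum(arr, 2, axis=1)
along_cols = moving_sum(arr, 2, axis=0)
```

The kernel only ever receives 1-D input, which is the property that would let it be compiled without needing axis support.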
