PERF: O(n) speedup in any/all by re-enabling short-circuiting for bool case #25070

Merged: 1 commit merged into pandas-dev:master from the any_all_fix branch on Apr 30, 2019

Conversation

@qwhelan (Contributor) commented Feb 1, 2019

At present, the .all() and .any() functions scale as O(n) regardless of whether their exit condition has already been met:

$ asv run -b Any -b All upstream/master
[  0.01%] ··· series_methods.All.time_all                                                                                                                                                                 ok
[  0.01%] ··· ========= ============ ============
              --                   case          
              --------- -------------------------
                  N         fast         slow    
              ========= ============ ============
                 1000     115±3μs      123±5μs   
               1000000   11.1±0.2ms   13.6±0.1ms 
              ========= ============ ============

[  0.01%] ··· series_methods.Any.time_any                                                                                                                                                                 ok
[  0.01%] ··· ========= ============ ============
              --                   case          
              --------- -------------------------
                  N         fast         slow    
              ========= ============ ============
                 1000     118±2μs      117±3μs   
               1000000   14.0±0.2ms   11.6±0.4ms 
              ========= ============ ============

The root cause was identified via asv find to be #8550, which was released as part of v0.15.2. Specifically, we call isna() on the entire input prior to dispatching to np.all/np.any; the latter does short-circuit, but its cost is insignificant compared to the isna() call.

My proposal here is to exempt np.bool from the isna() call on the basis that it is not capable of storing NaN values; I'm hesitant to make this exemption much broader than necessary, so that calls like pd.isnull(s).all() are still handled correctly. The hasattr() check is necessary because a Panel test case directly invokes nanall(Panel).
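
A minimal sketch of the idea (simplified; the helper name below is illustrative rather than the exact code in this diff): the NaN mask is only built when the dtype can actually hold missing values.

import numpy as np
from pandas import isna
from pandas.api.types import is_bool_dtype


def _mask_if_needed(values, skipna, mask=None):
    # Boolean data cannot contain nulls, so signal that via mask being None
    # and let np.any/np.all short-circuit on the raw values.
    if mask is None and hasattr(values, 'dtype') and is_bool_dtype(values.dtype):
        return None
    if mask is None and skipna:
        mask = isna(values)
    return mask


print(_mask_if_needed(np.array([True, False]), skipna=True))  # None: no O(n) isna() pass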

The good news is that this approach is very effective:

$ asv run -b Any -b All HEAD
[  0.01%] ··· series_methods.All.time_all                                                                                                                                                                 ok
[  0.01%] ··· ========= ============ ============
              --                   case          
              --------- -------------------------
                  N         fast         slow    
              ========= ============ ============
                 1000     59.8±1μs    60.9±0.7μs 
               1000000   73.7±0.4μs    138±4μs   
              ========= ============ ============

[  0.01%] ··· series_methods.Any.time_any                                                                                                                                                                 ok
[  0.01%] ··· ========= ========== ============
              --                  case         
              --------- -----------------------
                  N        fast        slow    
              ========= ========== ============
                 1000    59.7±1μs   59.7±0.8μs 
               1000000   75.7±1μs    133±3μs   
              ========= ========== ============

Additionally, these functions are not widely used within the pandas codebase itself, as most internal callers invoke the numpy equivalents directly. That said, we do see some broad speedups when running the entire benchmark suite:

$ asv compare ea013a25 37c2e7a2 -s --sort ratio --only-changed
       before           after         ratio
     [ea013a25]       [37c2e7a2]
     <master~1>       <master>  
-      10.4±0.2ms       9.41±0.1ms     0.90  timeseries.AsOf.time_asof('DataFrame')
-     1.54±0.03ms      1.39±0.01ms     0.90  groupby.GroupByMethods.time_dtype_as_group('float', 'rank', 'transformation')
-      38.3±0.8ms       34.3±0.6ms     0.90  stat_ops.FrameOps.time_op('median', 'float', 1, True)
-     1.18±0.03ms      1.05±0.01ms     0.89  groupby.GroupByMethods.time_dtype_as_group('int', 'cummax', 'direct')
-      23.2±0.3ms       20.8±0.2ms     0.89  groupby.MultiColumn.time_cython_sum
-     1.19±0.05ms      1.06±0.01ms     0.89  groupby.GroupByMethods.time_dtype_as_group('datetime', 'cummin', 'direct')
-         128±1ms          114±2ms     0.89  indexing.CategoricalIndexIndexing.time_get_indexer_list('non_monotonic')
-     4.12±0.08ms      3.64±0.08ms     0.88  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-     1.23±0.02ms      1.09±0.01ms     0.88  groupby.GroupByMethods.time_dtype_as_group('float', 'cummax', 'transformation')
-     1.20±0.04ms      1.06±0.02ms     0.88  groupby.GroupByMethods.time_dtype_as_group('float', 'cummin', 'direct')
-     1.20±0.02ms      1.06±0.01ms     0.88  groupby.GroupByMethods.time_dtype_as_field('int', 'cumsum', 'direct')
-     10.5±0.08ms       9.12±0.3ms     0.87  timeseries.AsOf.time_asof_nan('DataFrame')
-      6.30±0.3ms       5.46±0.2ms     0.87  indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-     1.55±0.04ms      1.34±0.02ms     0.86  groupby.GroupByMethods.time_dtype_as_group('datetime', 'rank', 'transformation')
-         130±1ms        112±0.8ms     0.86  indexing.CategoricalIndexIndexing.time_get_indexer_list('monotonic_decr')
-      28.1±0.7ms       22.9±0.3ms     0.82  reshape.Cut.time_cut_float(10)
-      26.4±0.4ms         21.0±1ms     0.80  reshape.Cut.time_cut_float(4)
-        12.7±2ms       9.06±0.2ms     0.71  inference.DateInferOps.time_subtract_datetimes
-     4.33±0.05ms      3.03±0.04ms     0.70  timeseries.AsOf.time_asof_single('DataFrame')
-      1.04±0.2ms          681±7μs     0.66  groupby.GroupByMethods.time_dtype_as_group('int', 'median', 'transformation')
-     4.51±0.05ms      2.69±0.05ms     0.60  timeseries.AsOf.time_asof_nan_single('DataFrame')
-         112±2μs       59.5±0.8μs     0.53  series_methods.All.time_all(1000, 'fast')
-         115±2μs         60.7±1μs     0.53  series_methods.Any.time_any(1000, 'fast')
-         117±7μs         61.3±2μs     0.53  series_methods.All.time_all(1000, 'slow')
-         115±1μs         59.8±1μs     0.52  series_methods.Any.time_any(1000, 'slow')
-      2.39±0.6ms      1.20±0.01ms     0.50  stat_ops.SeriesOps.time_op('kurt', 'int', False)
-      2.45±0.6ms      1.21±0.01ms     0.49  stat_ops.SeriesOps.time_op('skew', 'int', False)
-      2.43±0.6ms      1.20±0.01ms     0.49  stat_ops.SeriesOps.time_op('kurt', 'int', True)
-      11.1±0.2ms        132±0.5μs     0.01  series_methods.Any.time_any(1000000, 'slow')
-      13.5±0.2ms          133±2μs     0.01  series_methods.All.time_all(1000000, 'slow')
-      10.9±0.3ms         74.9±1μs     0.01  series_methods.All.time_all(1000000, 'fast')
-      13.4±0.3ms         75.3±2μs     0.01  series_methods.Any.time_any(1000000, 'fast')

For the largest n tested, we see a speedup of ~170x in the fast case.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@pep8speaks commented Feb 1, 2019

Hello @qwhelan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-04-29 05:29:16 UTC

@codecov bot commented Feb 1, 2019

Codecov Report

Merging #25070 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25070      +/-   ##
==========================================
- Coverage   92.37%   92.36%   -0.01%     
==========================================
  Files         166      166              
  Lines       52403    52409       +6     
==========================================
+ Hits        48405    48410       +5     
- Misses       3998     3999       +1
Flag Coverage Δ
#multiple 90.79% <100%> (ø) ⬆️
#single 42.88% <100%> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/core/nanops.py 93.9% <100%> (+0.06%) ⬆️
pandas/util/testing.py 88.04% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea013a2...3acc6b6. Read the comment docs.

@codecov bot commented Feb 1, 2019

Codecov Report

Merging #25070 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25070      +/-   ##
==========================================
- Coverage   91.97%   91.96%   -0.01%     
==========================================
  Files         175      175              
  Lines       52368    52378      +10     
==========================================
+ Hits        48164    48171       +7     
- Misses       4204     4207       +3
Flag Coverage Δ
#multiple 90.52% <100%> (ø) ⬆️
#single 40.69% <40.62%> (-0.15%) ⬇️
Impacted Files Coverage Δ
pandas/core/nanops.py 94.11% <100%> (+0.28%) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 96.9% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9feb3ad...0387154. Read the comment docs.

@@ -201,11 +201,13 @@ def _get_fill_value(dtype, fill_value=None, fill_value_typ=None):


 def _get_values(values, skipna, fill_value=None, fill_value_typ=None,
-                isfinite=False, copy=True, mask=None):
+                isfinite=False, copy=True, mask=None, compute_mask=True):
Contributor

so rather than adding another argument (we already have the mask), can you simply have this accept mask=True in this case?

Contributor Author

I ended up taking this approach, but instead made mask a positional argument, with mask=None signifying that a mask has been determined to be unnecessary. If another sentinel value would be preferred, that would be easy to change.

@@ -362,8 +364,12 @@ def nanany(values, axis=None, skipna=True, mask=None):
     >>> nanops.nanany(s)
     False
     """
     values, mask, dtype, _, _ = _get_values(values, skipna, False, copy=skipna,
                                             mask=mask)
     if (hasattr(values, 'dtype') and is_bool_dtype(values.dtype) and
Contributor

can you do this logic inside _get_values (maybe update the return signature to return skipna as well)

Contributor Author

I'll investigate - this might require modifying some of the other utility functions to play nice.

Contributor Author

This ended up being tricky to do inside _get_values(), as mask is sometimes used to compute array dimensions within certain nanops. My solution was to move the logic outside of _get_values(), make mask a positional argument, and remove it from the result tuple. Functions that need the mask for dimension calculations can always compute it, while the others compute it only when necessary.
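
As a rough, hypothetical sketch of the calling pattern described above (simplified names and signatures; the real implementations live in pandas/core/nanops.py): ops that only reduce can accept a missing mask, while ops that need element counts resolve one themselves.

import numpy as np


def _maybe_get_mask(values, skipna, mask):
    # Return None when a mask is provably unnecessary (bool data here).
    if mask is None and values.dtype.kind == 'b':
        return None
    if mask is None and skipna:
        mask = np.isnan(values)
    return mask


def nanany_like(values, skipna=True, mask=None):
    # mask stays None for bool input, so no O(n) pre-pass is paid.
    mask = _maybe_get_mask(values, skipna, mask)
    if mask is not None:
        values = np.where(mask, False, values)
    return bool(np.any(values))


def nanmean_like(values, skipna=True, mask=None):
    # This op needs a count of valid elements, so it resolves the mask when
    # NaNs are possible and falls back to the raw size otherwise.
    mask = _maybe_get_mask(values, skipna, mask)
    count = values.size if mask is None else values.size - mask.sum()
    total = values.sum() if mask is None else np.where(mask, 0, values).sum()
    return total / count


print(nanany_like(np.array([False, True])))        # True
print(nanmean_like(np.array([1.0, np.nan, 3.0])))  # 2.0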

@jreback added the Performance (Memory or execution speed performance) label on Feb 1, 2019
@jbrockmendel (Member) commented:

Does the implementation become any simpler after Panel is removed?

@jreback (Contributor) commented Mar 20, 2019

can you merge master

@qwhelan force-pushed the any_all_fix branch 5 times, most recently from 913dd30 to 94fbe52, on March 22, 2019 00:56
@qwhelan (Contributor Author) commented Mar 22, 2019

Apologies for the delay - interviewing processes have gotten in the way here.

@jbrockmendel Not especially.

@jreback Rebased on master. I've kept my additions in response to your comments as a separate commit for now - will squash if it seems they're going in the right general direction.

My PR to fix the underlying performance bug in numpy is numpy/numpy#12988, which is pending merge; hopefully it will land in numpy 1.17 and largely obviate the need for this change. It may also make sense to expand this PR to cover all integer dtypes, as they likewise cannot store NaN.

Additionally, np.any() and np.all() attempt to short-circuit but still exhibit O(n) behavior because reduction ufuncs process their input in NPY_BUFSIZE (i.e., 8192 byte) chunks. I'm investigating ways to remove or increase the buffer size there, which should make pd.all() and pd.any() considerably faster. It's worth noting that there's no attempt to short-circuit for any non-bool dtype.
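
A rough way to see the buffering effect described above (illustrative only; timings are machine- and numpy-version-dependent, and numpy/numpy#12988 is expected to shrink the gap): the answer is knowable after the first element, yet larger inputs still take longer on affected numpy versions.

import timeit

import numpy as np

# The exit condition is satisfiable immediately.
a = np.zeros(10_000_000, dtype=bool)
a[0] = True

# Compare a short prefix against the full array; without short-circuiting,
# the full-array call scales with its length.
for n in (10_000, 10_000_000):
    print(n, timeit.timeit(lambda: a[:n].any(), number=200))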

                isfinite=False, copy=True, mask=None):
def _maybe_get_mask(values, skipna, mask):
    """ This function will return a mask iff it is necessary. Otherwise, return
    None when a mask is not needed.
Contributor

can you add parameters / returns in doc-string

""" This function will return a mask iff it is necessary. Otherwise, return
None when a mask is not needed.
"""
if (hasattr(values, 'dtype') and is_bool_dtype(values.dtype)):
Contributor

is this first test needed?

Contributor Author

The first test is needed due to a nanall(Panel) test case

Contributor

Panel tests should be gone now



def _get_mask(values, mask, skipna=True):
    if mask is None and skipna:
Contributor

doc-string



def _get_values(values, skipna, mask, fill_value=None, fill_value_typ=None,
                copy=True):
    """ utility to get the values view, mask, dtype
Contributor

can you update the doc-string here

        # Boolean data cannot contain nulls, so signal via mask being None
        return None

    mask = _get_mask(values, mask, skipna=skipna)
Contributor

why can't you do this inside _get_mask itself? (the test)

Contributor Author

The _get_counts() function relies on mask providing basic shape information even when skipna=False. Unfortunately, creating an empty mask (e.g., mask = np.zeros(values.shape)) just to propagate that shape info for nanmean and friends negates the speed benefit to nanall()/nanany(), so two separate approaches are necessary.

Contributor

can we pass mask=None in those cases? Having to call 2 functions feels like much more overhead than before.

""" This function will return a mask iff it is necessary. Otherwise, return
None when a mask is not needed.
"""
if (hasattr(values, 'dtype') and is_bool_dtype(values.dtype)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panel tests should be gone now

# Boolean data cannot contain nulls, so signal via mask being None
return None

mask = _get_mask(values, mask, skipna=skipna)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we pass mask=None in those cases? I find this having to call 2 functions much more overhead than before.

@jreback (Contributor) commented Apr 5, 2019

can you merge master and update per the comments

@jreback (Contributor) left a comment

this is way more complicated and seemingly more error-prone than the original; can you see if you can simplify?

    def setup(self, N, dtype):
        self.s = Series([1] * N, dtype=dtype)

    def time_var(self, N, dtype):
Contributor

could make the function another param

Contributor Author

I'll probably remove these prior to merge as they duplicate existing benchmarks.

Contributor

can you update this

@qwhelan force-pushed the any_all_fix branch 7 times, most recently from 3573c21 to d96b086, on April 17, 2019 07:08
@qwhelan (Contributor Author) commented Apr 17, 2019

I'm going to see if I can simplify further - I was able to remove the "copy=" parameter from _get_values().

That being said, allowing mask=None requires some more involved modifications, which also unlock some decent performance gains. The most recent commit extends the speedups across all nanops and to all integer dtypes. In particular, we're now getting a 7-8x speedup in nansum() and nanmean() for int64 data, while nanprod(), nanvar(), and nanstd() see a ~3x speedup:

$ asv compare upstream/master HEAD -s --sort ratio --only-changed
       before           after         ratio
     [68b1da7f]       [d96b086c]
     <any_all_fix~3>       <any_all_fix>
-     2.06±0.01ms         1.85±0ms     0.89  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      52.3±0.5ms       46.5±0.7ms     0.89  reshape.Cut.time_cut_int(1000)
-         162±3ms        141±0.8ms     0.87  groupby.GroupByMethods.time_dtype_as_field('int', 'skew', 'transformation')
-         238±2ms          208±2ms     0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'skew', 'direct')
-         240±3ms          208±2ms     0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'skew', 'transformation')
-         374±2ms          324±3ms     0.87  groupby.GroupByMethods.time_dtype_as_group('float', 'skew', 'transformation')
-         373±3ms          321±2ms     0.86  groupby.GroupByMethods.time_dtype_as_group('float', 'skew', 'direct')
-      24.6±0.4ms       21.1±0.2ms     0.86  groupby.Nth.time_groupby_nth_all('float64')
-     4.67±0.07ms      3.98±0.08ms     0.85  reshape.SimpleReshape.time_stack
-      25.2±0.4ms       21.4±0.4ms     0.85  groupby.Nth.time_groupby_nth_all('datetime')
-        971±10μs         826±10μs     0.85  stat_ops.SeriesOps.time_op('median', 'int', False)
-         511±7μs         433±10μs     0.85  groupby.GroupByMethods.time_dtype_as_field('object', 'any', 'transformation')
-         166±4ms          140±2ms     0.85  groupby.GroupByMethods.time_dtype_as_field('int', 'skew', 'direct')
-        983±10μs         821±30μs     0.84  stat_ops.SeriesOps.time_op('median', 'int', True)
-     6.33±0.02ms      5.27±0.04ms     0.83  timeseries.AsOf.time_asof('DataFrame')
-     16.9±0.07ms       14.1±0.1ms     0.83  indexing.NumericSeriesIndexing.time_getitem_scalar(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
-         115±7μs         91.2±6μs     0.80  series_methods.NanOps.time_std(1000, 'int64')
-      4.23±0.1ms       3.36±0.1ms     0.79  indexing.NumericSeriesIndexing.time_ix_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-     6.42±0.06ms      5.03±0.03ms     0.78  timeseries.AsOf.time_asof_nan('DataFrame')
-        218±10μs          170±4μs     0.78  series_methods.NanOps.time_sem(1000, 'int64')
-      18.5±0.4ms       14.0±0.3ms     0.76  algorithms.Hashing.time_frame
-        70.4±2μs         53.2±4μs     0.76  series_methods.NanOps.time_sum(1000, 'int64')
-       39.3±10ms       29.7±0.5ms     0.75  series_methods.NanOps.time_sem(1000000, 'int64')
-     2.79±0.05ms      2.05±0.05ms     0.73  timeseries.AsOf.time_asof_single('DataFrame')
-      26.1±0.2ms         19.1±2ms     0.73  stat_ops.FrameOps.time_op('sem', 'int', 1, False)
-      26.2±0.2ms         19.2±2ms     0.73  stat_ops.FrameOps.time_op('sem', 'int', 1, True)
-        21.7±2ms         15.6±2ms     0.72  stat_ops.FrameOps.time_op('sem', 'int', 0, False)
-        21.8±2ms         15.6±2ms     0.72  stat_ops.FrameOps.time_op('sem', 'int', 0, True)
-         258±4μs          184±4μs     0.71  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int16Engine'>, <class 'numpy.int16'>), 'monotonic_incr')
-         248±3μs          176±2μs     0.71  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int8Engine'>, <class 'numpy.int8'>), 'monotonic_incr')
-        91.8±3μs         64.7±6μs     0.71  series_methods.NanOps.time_var(1000, 'int64')
-         129±3ms       90.5±0.4ms     0.70  frame_methods.Describe.time_dataframe_describe
-         312±3μs          217±1μs     0.69  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.UInt32Engine'>, <class 'numpy.uint32'>), 'monotonic_incr')
-        78.1±4μs         53.8±3μs     0.69  series_methods.NanOps.time_mean(1000, 'int64')
-      41.0±0.2ms       28.2±0.2ms     0.69  frame_methods.Describe.time_series_describe
-        89.0±6μs         61.0±4μs     0.69  series_methods.NanOps.time_skew(1000, 'int64')
-        92.7±4μs         62.0±2μs     0.67  series_methods.NanOps.time_kurt(1000, 'int64')
-        35.9±4ms         23.6±4ms     0.66  series_methods.NanOps.time_skew(1000000, 'int64')
-     2.92±0.02ms      1.91±0.03ms     0.66  timeseries.AsOf.time_asof_nan_single('DataFrame')
-      27.3±0.8ms       17.8±0.3ms     0.65  stat_ops.FrameOps.time_op('skew', 'int', 1, True)
-        59.2±2μs         38.6±3μs     0.65  series_methods.NanOps.time_argmax(1000, 'int64')
-     1.55±0.01ms      1.01±0.06ms     0.65  stat_ops.SeriesOps.time_op('sem', 'int', False)
-      9.85±0.3ms       6.36±0.1ms     0.65  groupby.TransformBools.time_transform_mean
-        36.1±4ms         23.2±4ms     0.64  series_methods.NanOps.time_kurt(1000000, 'int64')
-        51.2±1μs         32.7±3μs     0.64  series_methods.NanOps.time_prod(1000, 'int64')
-     1.58±0.04ms         999±60μs     0.63  stat_ops.SeriesOps.time_op('sem', 'int', True)
-         208±1ms        128±0.6ms     0.62  frame_methods.Dropna.time_dropna('all', 1)
-         207±1ms        125±0.6ms     0.60  frame_methods.Dropna.time_dropna('all', 0)
-        6.96±4ms      4.18±0.02ms     0.60  algorithms.Duplicated.time_duplicated(False, 'int')
-      24.9±0.1ms       14.9±0.2ms     0.60  stat_ops.FrameOps.time_op('kurt', 'int', 1, False)
-         253±2ms          151±4ms     0.60  frame_methods.Dropna.time_dropna_axis_mixed_dtypes('all', 1)
-        53.3±2μs         31.6±3μs     0.59  series_methods.All.time_all(1000, 'slow')
-      25.2±0.9ms       14.9±0.2ms     0.59  stat_ops.FrameOps.time_op('kurt', 'int', 1, True)
-        62.9±3μs         36.4±3μs     0.58  series_methods.NanOps.time_max(1000, 'int64')
-     5.05±0.04ms      2.93±0.01ms     0.58  stat_ops.FrameOps.time_op('median', 'int', 0, False)
-     5.07±0.03ms      2.93±0.03ms     0.58  stat_ops.FrameOps.time_op('median', 'int', 0, True)
-      16.2±0.1ms         9.29±2ms     0.57  stat_ops.FrameOps.time_op('skew', 'int', 0, False)
-      16.1±0.6ms         9.22±1ms     0.57  stat_ops.FrameOps.time_op('kurt', 'int', 0, True)
-      16.1±0.2ms         9.21±2ms     0.57  stat_ops.FrameOps.time_op('kurt', 'int', 0, False)
-        16.3±1ms         9.28±2ms     0.57  stat_ops.FrameOps.time_op('skew', 'int', 0, True)
-        63.9±4μs         35.4±1μs     0.55  series_methods.NanOps.time_min(1000, 'int64')
-        54.6±3μs         30.0±2μs     0.55  series_methods.All.time_all(1000, 'fast')
-        54.7±3μs         28.7±2μs     0.53  series_methods.Any.time_any(1000, 'fast')
-         264±1ms        136±0.3ms     0.51  frame_methods.Dropna.time_dropna_axis_mixed_dtypes('all', 0)
-        56.8±4μs         28.3±1μs     0.50  series_methods.Any.time_any(1000, 'slow')
-        852±20μs         388±10μs     0.46  stat_ops.SeriesOps.time_op('skew', 'int', False)
-        852±10μs          376±6μs     0.44  stat_ops.SeriesOps.time_op('skew', 'int', True)
-         852±9μs          376±6μs     0.44  stat_ops.SeriesOps.time_op('kurt', 'int', True)
-         292±3μs        123±0.9μs     0.42  stat_ops.SeriesOps.time_op('prod', 'int', True)
-         288±5μs          120±3μs     0.42  stat_ops.SeriesOps.time_op('prod', 'int', False)
-     1.23±0.01ms          490±4μs     0.40  stat_ops.FrameOps.time_op('prod', 'int', 0, True)
-     1.24±0.02ms          490±5μs     0.40  stat_ops.FrameOps.time_op('prod', 'int', 0, False)
-       931±100μs          369±6μs     0.40  stat_ops.SeriesOps.time_op('kurt', 'int', False)
-       158±0.6ms       62.5±0.4ms     0.40  frame_methods.Dropna.time_dropna_axis_mixed_dtypes('any', 1)
-         322±7μs          123±4μs     0.38  stat_ops.SeriesOps.time_op('mean', 'int', True)
-        708±10μs          261±8μs     0.37  stat_ops.SeriesOps.time_op('std', 'int', False)
-         328±3μs          120±2μs     0.37  stat_ops.SeriesOps.time_op('mean', 'int', False)
-         681±9μs         249±10μs     0.37  stat_ops.SeriesOps.time_op('var', 'int', False)
-      3.03±0.9ms      1.10±0.01ms     0.36  series_methods.NanOps.time_argmax(1000000, 'int64')
-        701±10μs          251±4μs     0.36  stat_ops.SeriesOps.time_op('std', 'int', True)
-     2.70±0.01ms          966±9μs     0.36  series_methods.NanOps.time_prod(1000000, 'int64')
-         690±6μs          243±3μs     0.35  stat_ops.SeriesOps.time_op('var', 'int', True)
-         305±7μs          103±3μs     0.34  stat_ops.SeriesOps.time_op('sum', 'int', False)
-        311±10μs         99.1±3μs     0.32  stat_ops.SeriesOps.time_op('sum', 'int', True)
-         121±1ms         38.4±1ms     0.32  frame_methods.Dropna.time_dropna('any', 0)
-       114±0.3ms       35.5±0.5ms     0.31  frame_methods.Dropna.time_dropna('any', 1)
-      8.80±0.2ms      2.64±0.07ms     0.30  stat_ops.FrameOps.time_op('std', 'int', 1, True)
-        17.5±4ms         5.20±4ms     0.30  series_methods.NanOps.time_std(1000000, 'int64')
-      8.83±0.2ms       2.61±0.2ms     0.30  stat_ops.FrameOps.time_op('std', 'int', 1, False)
-        8.61±1ms       2.41±0.3ms     0.28  stat_ops.FrameOps.time_op('var', 'int', 1, False)
-     1.37±0.01ms         382±10μs     0.28  stat_ops.FrameOps.time_op('prod', 'int', 1, True)
-     8.59±0.06ms      2.38±0.08ms     0.28  stat_ops.FrameOps.time_op('var', 'int', 1, True)
-     1.40±0.03ms         385±10μs     0.27  stat_ops.FrameOps.time_op('prod', 'int', 1, False)
-        17.8±4ms         4.75±4ms     0.27  series_methods.NanOps.time_var(1000000, 'int64')
-        3.70±1ms         963±10μs     0.26  series_methods.NanOps.time_mean(1000000, 'int64')
-      7.49±0.1ms      1.92±0.07ms     0.26  stat_ops.FrameOps.time_op('var', 'int', 0, True)
-      7.48±0.1ms      1.89±0.04ms     0.25  stat_ops.FrameOps.time_op('std', 'int', 0, False)
-      7.42±0.1ms      1.86±0.06ms     0.25  stat_ops.FrameOps.time_op('var', 'int', 0, False)
-      7.47±0.1ms      1.85±0.09ms     0.25  stat_ops.FrameOps.time_op('std', 'int', 0, True)
-         169±1ms       41.0±0.4ms     0.24  frame_methods.Dropna.time_dropna_axis_mixed_dtypes('any', 0)
-        3.39±1ms         723±10μs     0.21  series_methods.NanOps.time_min(1000000, 'int64')
-        3.67±1ms         722±10μs     0.20  series_methods.NanOps.time_max(1000000, 'int64')
-        3.40±1ms          649±9μs     0.19  series_methods.NanOps.time_sum(1000000, 'int64')
-     3.93±0.04ms          646±9μs     0.16  stat_ops.FrameOps.time_op('mean', 'int', 1, True)
-     3.91±0.03ms         641±10μs     0.16  stat_ops.FrameOps.time_op('mean', 'int', 1, False)
-     3.28±0.03ms         493±10μs     0.15  stat_ops.FrameOps.time_op('mean', 'int', 0, False)
-     3.27±0.03ms          488±8μs     0.15  stat_ops.FrameOps.time_op('mean', 'int', 0, True)
-     2.69±0.04ms         381±10μs     0.14  stat_ops.FrameOps.time_op('sum', 'int', 0, False)
-     2.70±0.02ms          375±5μs     0.14  stat_ops.FrameOps.time_op('sum', 'int', 0, True)
-     2.95±0.01ms         400±10μs     0.14  stat_ops.FrameOps.time_op('sum', 'int', 1, True)
-     2.94±0.02ms          383±8μs     0.13  stat_ops.FrameOps.time_op('sum', 'int', 1, False)
-      6.29±0.1ms       70.3±0.8μs     0.01  series_methods.Any.time_any(1000000, 'slow')
-      6.48±0.2ms         68.0±4μs     0.01  series_methods.All.time_all(1000000, 'slow')
-      6.50±0.2ms       37.2±0.9μs     0.01  series_methods.Any.time_any(1000000, 'fast')
-     6.37±0.07ms         35.5±1μs     0.01  series_methods.All.time_all(1000000, 'fast')

----------
values : ndarray
skipna : bool
mask : Optional[ndarray[bool]]
Contributor

so rather than typing here, can you type the args themselves

Contributor Author

Done, and also typed the return value. It appears the convention is to duplicate types in the return signature and docstring block; let me know if you'd prefer to keep only one.

Contributor

great. cc @WillAyd @jorisvandenbossche need to think about this
but prob ok for now

@@ -200,12 +200,84 @@ def _get_fill_value(dtype, fill_value=None, fill_value_typ=None):
    return tslibs.iNaT


def _maybe_get_mask(values, skipna, mask, invert=False):
Contributor

do we really need invert? this is just confusing

Contributor

pls type annotate the parameters

Contributor Author

invert was necessary in nanmedian(), but the median calculation is so slow that there's minimal impact, so I'm reverting that bit and removing invert.

skipna : bool
fill_value : Any, optional
    value to fill NaNs with
fill_value_typ : str, optional
Contributor

same, can you type above

Contributor Author

Done

-    if skipna and not is_any_int_dtype(values):
+    mask = _maybe_get_mask(values, skipna, mask)
+
+    if skipna and not is_any_int_dtype(values) and mask is not None:
Contributor

do you need the int check here? (as the mask will by definition be None if we have ints)

Contributor Author

No. The odd case of Series(dtype='bool').prod() will be slightly faster as a result of it being removed.

    if mask is not None:
        null_mask = mask.size - mask.sum()
    else:
        null_mask = functools.reduce(operator.mul, shape)
Contributor

is this line covered in tests? isn't this just np.prod(shape)

Contributor Author

Yeah, this should be getting hit by Series(dtype=int).sum() and the like. I'll confirm once codecov finishes.
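
For reference, a quick sanity check of the equivalence raised above (the two expressions agree numerically; they differ only in returning a plain Python int versus a NumPy integer scalar):

import functools
import operator

import numpy as np

shape = (1000, 3)
print(functools.reduce(operator.mul, shape))  # 3000, Python int
print(np.prod(shape))                         # 3000, NumPy integer scalar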


@qwhelan force-pushed the any_all_fix branch 6 times, most recently from 4f9c8eb to 6ac741b, on April 22, 2019 02:27
@qwhelan (Contributor Author) commented Apr 22, 2019

Updated to reflect comments - the single failing test appears to be unrelated.

@jreback (Contributor) left a comment

  • can you add a whatsnew note in performance improvements
  • can you add some signature typing where indicated (as much as you can)
  • merge master; ping on green


@jreback added this to the 0.25.0 milestone on Apr 28, 2019
@@ -569,7 +664,7 @@ def _get_counts_nanvar(mask, axis, ddof, dtype=float):
         count = np.nan
         d = np.nan
     else:
-        mask2 = count <= ddof
+        mask2 = count <= ddof  # type: np.ndarray
Contributor Author

mypy needed an extra hint for the following line: it deduced mask2 as Union[bool, np.ndarray] and raised an error because bool has no any() method.

@qwhelan (Contributor Author) commented Apr 29, 2019

ping

@jreback merged commit b6324be into pandas-dev:master on Apr 30, 2019
@jreback (Contributor) commented Apr 30, 2019

thanks @qwhelan very nice!

@qwhelan deleted the any_all_fix branch on August 14, 2019 07:01