PERF: O(n) speedup in any/all by re-enabling short-circuiting for bool case #25070

Merged: 1 commit merged into pandas-dev:master from the any_all_fix branch on Apr 30, 2019

Conversation

@qwhelan (Contributor) commented Feb 1, 2019

At present, the .all() and .any() functions scale as O(n) regardless of whether their exit condition has already been met:

$ asv run -b Any -b All upstream/master
[  0.01%] ··· series_methods.All.time_all                                                                                                                                                                 ok
[  0.01%] ··· ========= ============ ============
              --                   case          
              --------- -------------------------
                  N         fast         slow    
              ========= ============ ============
                 1000     115±3μs      123±5μs   
               1000000   11.1±0.2ms   13.6±0.1ms 
              ========= ============ ============

[  0.01%] ··· series_methods.Any.time_any                                                                                                                                                                 ok
[  0.01%] ··· ========= ============ ============
              --                   case          
              --------- -------------------------
                  N         fast         slow    
              ========= ============ ============
                 1000     118±2μs      117±3μs   
               1000000   14.0±0.2ms   11.6±0.4ms 
              ========= ============ ============

The root cause was identified via asv find to be #8550, which was released as part of v0.15.2. Specifically, we call isna() on the entire input prior to dispatching to np.all/np.any; the latter does short-circuit, but its cost is insignificant compared to the isna() call.

My proposal here is to exempt np.bool from the isna() call on the basis that it is not capable of storing NaN values; I'm hesitant to make this exemption much broader than necessary, so that calls like pd.isnull(s).all() are still handled correctly. The hasattr() check is necessary because a Panel test case directly invokes nanall(Panel).
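
A minimal sketch of the idea (simplified; the helper name below is illustrative rather than the exact code in this diff): the NaN mask is only built when the dtype can actually hold missing values.

import numpy as np
from pandas import isna
from pandas.api.types import is_bool_dtype


def _mask_if_needed(values, skipna, mask=None):
    # Boolean data cannot contain nulls, so signal that via mask being None
    # and let np.any/np.all short-circuit on the raw values.
    if mask is None and hasattr(values, 'dtype') and is_bool_dtype(values.dtype):
        return None
    if mask is None and skipna:
        mask = isna(values)
    return mask


print(_mask_if_needed(np.array([True, False]), skipna=True))  # None: no O(n) isna() pass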

The good news is that this approach is very effective:

$ asv run -b Any -b All HEAD
[  0.01%] ··· series_methods.All.time_all                                                                                                                                                                 ok
[  0.01%] ··· ========= ============ ============
              --                   case          
              --------- -------------------------
                  N         fast         slow    
              ========= ============ ============
                 1000     59.8±1μs    60.9±0.7μs 
               1000000   73.7±0.4μs    138±4μs   
              ========= ============ ============

[  0.01%] ··· series_methods.Any.time_any                                                                                                                                                                 ok
[  0.01%] ··· ========= ========== ============
              --                  case         
              --------- -----------------------
                  N        fast        slow    
              ========= ========== ============
                 1000    59.7±1μs   59.7±0.8μs 
               1000000   75.7±1μs    133±3μs   
              ========= ========== ============

Additionally, these functions are not widely used within the pandas codebase itself, as most internal callers invoke the numpy equivalents directly. That said, we do see some broad speedups when running the entire benchmark suite:

$ asv compare ea013a25 37c2e7a2 -s --sort ratio --only-changed
       before           after         ratio
     [ea013a25]       [37c2e7a2]
     <master~1>       <master>  
-      10.4±0.2ms       9.41±0.1ms     0.90  timeseries.AsOf.time_asof('DataFrame')
-     1.54±0.03ms      1.39±0.01ms     0.90  groupby.GroupByMethods.time_dtype_as_group('float', 'rank', 'transformation')
-      38.3±0.8ms       34.3±0.6ms     0.90  stat_ops.FrameOps.time_op('median', 'float', 1, True)
-     1.18±0.03ms      1.05±0.01ms     0.89  groupby.GroupByMethods.time_dtype_as_group('int', 'cummax', 'direct')
-      23.2±0.3ms       20.8±0.2ms     0.89  groupby.MultiColumn.time_cython_sum
-     1.19±0.05ms      1.06±0.01ms     0.89  groupby.GroupByMethods.time_dtype_as_group('datetime', 'cummin', 'direct')
-         128±1ms          114±2ms     0.89  indexing.CategoricalIndexIndexing.time_get_indexer_list('non_monotonic')
-     4.12±0.08ms      3.64±0.08ms     0.88  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-     1.23±0.02ms      1.09±0.01ms     0.88  groupby.GroupByMethods.time_dtype_as_group('float', 'cummax', 'transformation')
-     1.20±0.04ms      1.06±0.02ms     0.88  groupby.GroupByMethods.time_dtype_as_group('float', 'cummin', 'direct')
-     1.20±0.02ms      1.06±0.01ms     0.88  groupby.GroupByMethods.time_dtype_as_field('int', 'cumsum', 'direct')
-     10.5±0.08ms       9.12±0.3ms     0.87  timeseries.AsOf.time_asof_nan('DataFrame')
-      6.30±0.3ms       5.46±0.2ms     0.87  indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-     1.55±0.04ms      1.34±0.02ms     0.86  groupby.GroupByMethods.time_dtype_as_group('datetime', 'rank', 'transformation')
-         130±1ms        112±0.8ms     0.86  indexing.CategoricalIndexIndexing.time_get_indexer_list('monotonic_decr')
-      28.1±0.7ms       22.9±0.3ms     0.82  reshape.Cut.time_cut_float(10)
-      26.4±0.4ms         21.0±1ms     0.80  reshape.Cut.time_cut_float(4)
-        12.7±2ms       9.06±0.2ms     0.71  inference.DateInferOps.time_subtract_datetimes
-     4.33±0.05ms      3.03±0.04ms     0.70  timeseries.AsOf.time_asof_single('DataFrame')
-      1.04±0.2ms          681±7μs     0.66  groupby.GroupByMethods.time_dtype_as_group('int', 'median', 'transformation')
-     4.51±0.05ms      2.69±0.05ms     0.60  timeseries.AsOf.time_asof_nan_single('DataFrame')
-         112±2μs       59.5±0.8μs     0.53  series_methods.All.time_all(1000, 'fast')
-         115±2μs         60.7±1μs     0.53  series_methods.Any.time_any(1000, 'fast')
-         117±7μs         61.3±2μs     0.53  series_methods.All.time_all(1000, 'slow')
-         115±1μs         59.8±1μs     0.52  series_methods.Any.time_any(1000, 'slow')
-      2.39±0.6ms      1.20±0.01ms     0.50  stat_ops.SeriesOps.time_op('kurt', 'int', False)
-      2.45±0.6ms      1.21±0.01ms     0.49  stat_ops.SeriesOps.time_op('skew', 'int', False)
-      2.43±0.6ms      1.20±0.01ms     0.49  stat_ops.SeriesOps.time_op('kurt', 'int', True)
-      11.1±0.2ms        132±0.5μs     0.01  series_methods.Any.time_any(1000000, 'slow')
-      13.5±0.2ms          133±2μs     0.01  series_methods.All.time_all(1000000, 'slow')
-      10.9±0.3ms         74.9±1μs     0.01  series_methods.All.time_all(1000000, 'fast')
-      13.4±0.3ms         75.3±2μs     0.01  series_methods.Any.time_any(1000000, 'fast')

For the largest n tested, we see a speedup of ~170x in the fast case.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@pep8speaks commented Feb 1, 2019

Hello @qwhelan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-04-29 05:29:16 UTC

@codecov bot commented Feb 1, 2019

Codecov Report

Merging #25070 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25070      +/-   ##
==========================================
- Coverage   92.37%   92.36%   -0.01%     
==========================================
  Files         166      166              
  Lines       52403    52409       +6     
==========================================
+ Hits        48405    48410       +5     
- Misses       3998     3999       +1
Flag Coverage Δ
#multiple 90.79% <100%> (ø) ⬆️
#single 42.88% <100%> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/core/nanops.py 93.9% <100%> (+0.06%) ⬆️
pandas/util/testing.py 88.04% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea013a2...3acc6b6. Read the comment docs.

@codecov bot commented Feb 1, 2019

Codecov Report

Merging #25070 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25070      +/-   ##
==========================================
- Coverage   91.97%   91.96%   -0.01%     
==========================================
  Files         175      175              
  Lines       52368    52378      +10     
==========================================
+ Hits        48164    48171       +7     
- Misses       4204     4207       +3
Flag Coverage Δ
#multiple 90.52% <100%> (ø) ⬆️
#single 40.69% <40.62%> (-0.15%) ⬇️
Impacted Files Coverage Δ
pandas/core/nanops.py 94.11% <100%> (+0.28%) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 96.9% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9feb3ad...0387154. Read the comment docs.

@@ -201,11 +201,13 @@ def _get_fill_value(dtype, fill_value=None, fill_value_typ=None):


 def _get_values(values, skipna, fill_value=None, fill_value_typ=None,
-                isfinite=False, copy=True, mask=None):
+                isfinite=False, copy=True, mask=None, compute_mask=True):
Contributor

so rather than adding another argument (we already have the mask), can you simply have this accept mask=True in this case?

Contributor Author

I ended up taking this approach, but instead made mask a positional argument, with mask=None signifying that a mask has been determined to be unnecessary. If another sentinel value would be preferred, that would be easy to change.

@@ -362,8 +364,12 @@ def nanany(values, axis=None, skipna=True, mask=None):
     >>> nanops.nanany(s)
     False
     """
     values, mask, dtype, _, _ = _get_values(values, skipna, False, copy=skipna,
                                             mask=mask)
     if (hasattr(values, 'dtype') and is_bool_dtype(values.dtype) and
Contributor

can you do this logic inside _get_values (maybe update the return signature to return skipna as well)

Contributor Author

I'll investigate - this might require modifying some of the other utility functions to play nice.

Contributor Author

This ended up being tricky to do inside _get_values(), as mask is sometimes used to compute array dimensions within certain nanops. My solution was to move the logic outside of _get_values(), make mask a positional argument, and remove it from the result tuple. Functions that need the mask for dimension calculations can always compute it, while the others compute it only when necessary.
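
As a rough, hypothetical sketch of the calling pattern described above (simplified names and signatures; the real implementations live in pandas/core/nanops.py): ops that only reduce can accept a missing mask, while ops that need element counts resolve one themselves.

import numpy as np


def _maybe_get_mask(values, skipna, mask):
    # Return None when a mask is provably unnecessary (bool data here).
    if mask is None and values.dtype.kind == 'b':
        return None
    if mask is None and skipna:
        mask = np.isnan(values)
    return mask


def nanany_like(values, skipna=True, mask=None):
    # mask stays None for bool input, so no O(n) pre-pass is paid.
    mask = _maybe_get_mask(values, skipna, mask)
    if mask is not None:
        values = np.where(mask, False, values)
    return bool(np.any(values))


def nanmean_like(values, skipna=True, mask=None):
    # This op needs a count of valid elements, so it resolves the mask when
    # NaNs are possible and falls back to the raw size otherwise.
    mask = _maybe_get_mask(values, skipna, mask)
    count = values.size if mask is None else values.size - mask.sum()
    total = values.sum() if mask is None else np.where(mask, 0, values).sum()
    return total / count


print(nanany_like(np.array([False, True])))        # True
print(nanmean_like(np.array([1.0, np.nan, 3.0])))  # 2.0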

@jreback added the Performance (Memory or execution speed performance) label on Feb 1, 2019
@jbrockmendel (Member) commented:

Does the implementation become any simpler after Panel is removed?

@jreback (Contributor) commented Mar 20, 2019

can you merge master

@qwhelan force-pushed the any_all_fix branch 5 times, most recently from 913dd30 to 94fbe52, on March 22, 2019 00:56
@qwhelan (Contributor Author) commented Mar 22, 2019

Apologies for the delay - interviewing processes have gotten in the way here.

@jbrockmendel Not especially.

@jreback Rebased on master. I've kept my additions in response to your comments as a separate commit for now - will squash if it seems they're going in the right general direction.

My PR to fix the underlying performance bug in numpy is numpy/numpy#12988, which is pending merge; hopefully it will land in numpy 1.17 and largely obviate the need for this change. It may also make sense to expand this PR to cover all integer dtypes, as they likewise cannot store NaN.

Additionally, np.any() and np.all() attempt to short-circuit but still exhibit O(n) behavior because reduction ufuncs process their input in NPY_BUFSIZE (i.e., 8192 byte) chunks. I'm investigating ways to remove or increase the buffer size there, which should make pd.all() and pd.any() considerably faster. It's worth noting that there's no attempt to short-circuit for any non-bool dtype.
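
A rough way to see the buffering effect described above (illustrative only; timings are machine- and numpy-version-dependent, and numpy/numpy#12988 is expected to shrink the gap): the answer is knowable after the first element, yet larger inputs still take longer on affected numpy versions.

import timeit

import numpy as np

# The exit condition is satisfiable immediately.
a = np.zeros(10_000_000, dtype=bool)
a[0] = True

# Compare a short prefix against the full array; without short-circuiting,
# the full-array call scales with its length.
for n in (10_000, 10_000_000):
    print(n, timeit.timeit(lambda: a[:n].any(), number=200))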

                isfinite=False, copy=True, mask=None):
def _maybe_get_mask(values, skipna, mask):
    """ This function will return a mask iff it is necessary. Otherwise, return
    None when a mask is not needed.
Contributor

can you add parameters / returns in doc-string

""" This function will return a mask iff it is necessary. Otherwise, return
None when a mask is not needed.
"""
if (hasattr(values, 'dtype') and is_bool_dtype(values.dtype)):
Contributor

is this first test needed?

Contributor Author

The first test is needed due to a nanall(Panel) test case

Contributor

Panel tests should be gone now



def _get_mask(values, mask, skipna=True):
    if mask is None and skipna:
Contributor

doc-string



def _get_values(values, skipna, mask, fill_value=None, fill_value_typ=None,
                copy=True):
    """ utility to get the values view, mask, dtype
Contributor

can you update the doc-string here

        # Boolean data cannot contain nulls, so signal via mask being None
        return None

    mask = _get_mask(values, mask, skipna=skipna)
Contributor

why can't you do this inside _get_mask itself? (the test)

Contributor Author

The _get_counts() function relies on mask providing basic shape information even when skipna=False. Unfortunately, creating an empty mask (e.g., mask = np.zeros(values.shape)) just to propagate that shape info for nanmean and friends negates the speed benefit to nanall()/nanany(), so two separate approaches are necessary.

Contributor

can we pass mask=None in those cases? Having to call 2 functions feels like much more overhead than before.

""" This function will return a mask iff it is necessary. Otherwise, return
None when a mask is not needed.
"""
if (hasattr(values, 'dtype') and is_bool_dtype(values.dtype)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panel tests should be gone now

# Boolean data cannot contain nulls, so signal via mask being None
return None

mask = _get_mask(values, mask, skipna=skipna)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we pass mask=None in those cases? I find this having to call 2 functions much more overhead than before.

@jreback (Contributor) commented Apr 5, 2019

can you merge master and update per the comments

@jreback (Contributor) left a comment

this is way more complicated and seemingly more error-prone than the original; can you see if you can simplify?

    def setup(self, N, dtype):
        self.s = Series([1] * N, dtype=dtype)

    def time_var(self, N, dtype):
Contributor

could make the function another param

Contributor Author

I'll probably remove these prior to merge as they duplicate existing benchmarks.

Contributor

can you update this

@qwhelan force-pushed the any_all_fix branch 7 times, most recently from 3573c21 to d96b086, on April 17, 2019 07:08
@qwhelan (Contributor Author) commented Apr 17, 2019

I'm going to see if I can simplify further - I was able to remove the "copy=" parameter from _get_values().

That being said, allowing mask=None requires some more involved modifications, which also unlock some decent performance gains. The most recent commit extends the speedups across all nanops and to all integer dtypes. In particular, we're now getting a 7-8x speedup in nansum() and nanmean() for int64 data, while nanprod(), nanvar(), and nanstd() see a ~3x speedup:

$ asv compare upstream/master HEAD -s --sort ratio --only-changed
       before           after         ratio
     [68b1da7f]       [d96b086c]
     <any_all_fix~3>       <any_all_fix>
-     2.06±0.01ms         1.85±0ms     0.89  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      52.3±0.5ms       46.5±0.7ms     0.89  reshape.Cut.time_cut_int(1000)
-         162±3ms        141±0.8ms     0.87  groupby.GroupByMethods.time_dtype_as_field('int', 'skew', 'transformation')
-         238±2ms          208±2ms     0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'skew', 'direct')
-         240±3ms          208±2ms     0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'skew', 'transformation')
-         374±2ms          324±3ms     0.87  groupby.GroupByMethods.time_dtype_as_group('float', 'skew', 'transformation')
-         373±3ms          321±2ms     0.86  groupby.GroupByMethods.time_dtype_as_group('float', 'skew', 'direct')
-      24.6±0.4ms       21.1±0.2ms     0.86  groupby.Nth.time_groupby_nth_all('float64')
-     4.67±0.07ms      3.98±0.08ms     0.85  reshape.SimpleReshape.time_stack
-      25.2±0.4ms       21.4±0.4ms     0.85  groupby.Nth.time_groupby_nth_all('datetime')
-        971±10μs         826±10μs     0.85  stat_ops.SeriesOps.time_op('median', 'int', False)
-         511±7μs         433±10μs     0.85  groupby.GroupByMethods.time_dtype_as_field('object', 'any', 'transformation')
-         166±4ms          140±2ms     0.85  groupby.GroupByMethods.time_dtype_as_field('int', 'skew', 'direct')
-        983±10μs         821±30μs     0.84  stat_ops.SeriesOps.time_op('median', 'int', True)
-     6.33±0.02ms      5.27±0.04ms     0.83  timeseries.AsOf.time_asof('DataFrame')
-     16.9±0.07ms       14.1±0.1ms     0.83  indexing.NumericSeriesIndexing.time_getitem_scalar(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
-         115±7μs         91.2±6μs     0.80  series_methods.NanOps.time_std(1000, 'int64')
-      4.23±0.1ms       3.36±0.1ms     0.79  indexing.NumericSeriesIndexing.time_ix_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-     6.42±0.06ms      5.03±0.03ms     0.78  timeseries.AsOf.time_asof_nan('DataFrame')
-        218±10μs          170±4μs     0.78  series_methods.NanOps.time_sem(1000, 'int64')
-      18.5±0.4ms       14.0±0.3ms     0.76  algorithms.Hashing.time_frame
-        70.4±2μs         53.2±4μs     0.76  series_methods.NanOps.time_sum(1000, 'int64')
-       39.3±10ms       29.7±0.5ms     0.75  series_methods.NanOps.time_sem(1000000, 'int64')
-     2.79±0.05ms      2.05±0.05ms     0.73  timeseries.AsOf.time_asof_single('DataFrame')
-      26.1±0.2ms         19.1±2ms     0.73  stat_ops.FrameOps.time_op('sem', 'int', 1, False)
-      26.2±0.2ms         19.2±2ms     0.73  stat_ops.FrameOps.time_op('sem', 'int', 1, True)
-        21.7±2ms         15.6±2ms     0.72  stat_ops.FrameOps.time_op('sem', 'int', 0, False)
-        21.8±2ms         15.6±2ms     0.72  stat_ops.FrameOps.time_op('sem', 'int', 0, True)
-         258±4μs          184±4μs     0.71  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int16Engine'>, <class 'numpy.int16'>), 'monotonic_incr')
-         248±3μs          176±2μs     0.71  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int8Engine'>, <class 'numpy.int8'>), 'monotonic_incr')
-        91.8±3μs         64.7±6μs     0.71  series_methods.NanOps.time_var(1000, 'int64')
-         129±3ms       90.5±0.4ms     0.70  frame_methods.Describe.time_dataframe_describe
-         312±3μs          217±1μs     0.69  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.UInt32Engine'>, <class 'numpy.uint32'>), 'monotonic_incr')
-        78.1±4μs         53.8±3μs     0.69  series_methods.NanOps.time_mean(1000, 'int64')
-      41.0±0.2ms       28.2±0.2ms     0.69  frame_methods.Describe.time_series_describe
-        89.0±6μs         61.0±4μs     0.69  series_methods.NanOps.time_skew(1000, 'int64')
-        92.7±4μs         62.0±2μs     0.67  series_methods.NanOps.time_kurt(1000, 'int64')
-        35.9±4ms         23.6±4ms     0.66  series_methods.NanOps.time_skew(1000000, 'int64')
-     2.92±0.02ms      1.91±0.03ms     0.66  timeseries.AsOf.time_asof_nan_single('DataFrame')
-      27.3±0.8ms       17.8±0.3ms     0.65  stat_ops.FrameOps.time_op('skew', 'int', 1, True)
-        59.2±2μs         38.6±3μs     0.65  series_methods.NanOps.time_argmax(1000, 'int64')
-     1.55±0.01ms      1.01±0.06ms     0.65  stat_ops.SeriesOps.time_op('sem', 'int', False)
-      9.85±0.3ms       6.36±0.1ms     0.65  groupby.TransformBools.time_transform_mean
-        36.1±4ms         23.2±4ms     0.64  series_methods.NanOps.time_kurt(1000000, 'int64')
-        51.2±1μs         32.7±3μs     0.64  series_methods.NanOps.time_prod(1000, 'int64')
-     1.58±0.04ms         999±60μs     0.63  stat_ops.SeriesOps.time_op('sem', 'int', True)
-         208±1ms        128±0.6ms     0.62  frame_methods.Dropna.time_dropna('all', 1)
-         207±1ms        125±0.6ms     0.60  frame_methods.Dropna.time_dropna('all', 0)
-        6.96±4ms      4.18±0.02ms     0.60  algorithms.Duplicated.time_duplicated(False, 'int')
-      24.9±0.1ms       14.9±0.2ms     0.60  stat_ops.FrameOps.time_op('kurt', 'int', 1, False)
-         253±2ms          151±4ms     0.60  frame_methods.Dropna.time_dropna_axis_mixed_dtypes('all', 1)
-        53.3±2μs         31.6±3μs     0.59  series_methods.All.time_all(1000, 'slow')
-      25.2±0.9ms       14.9±0.2ms     0.59  stat_ops.FrameOps.time_op('kurt', 'int', 1, True)
-        62.9±3μs         36.4±3μs     0.58  series_methods.NanOps.time_max(1000, 'int64')
-     5.05±0.04ms      2.93±0.01ms     0.58  stat_ops.FrameOps.time_op('median', 'int', 0, False)
-     5.07±0.03ms      2.93±0.03ms     0.58  stat_ops.FrameOps.time_op('median', 'int', 0, True)
-      16.2±0.1ms         9.29±2ms     0.57  stat_ops.FrameOps.time_op('skew', 'int', 0, False)
-      16.1±0.6ms         9.22±1ms     0.57  stat_ops.FrameOps.time_op('kurt', 'int', 0, True)
-      16.1±0.2ms         9.21±2ms     0.57  stat_ops.FrameOps.time_op('kurt', 'int', 0, False)
-        16.3±1ms         9.28±2ms     0.57  stat_ops.FrameOps.time_op('skew', 'int', 0, True)
-        63.9±4μs         35.4±1μs     0.55  series_methods.NanOps.time_min(1000, 'int64')
-        54.6±3μs         30.0±2μs     0.55  series_methods.All.time_all(1000, 'fast')
-        54.7±3μs         28.7±2μs     0.53  series_methods.Any.time_any(1000, 'fast')
-         264±1ms        136±0.3ms     0.51  frame_methods.Dropna.time_dropna_axis_mixed_dtypes('all', 0)
-        56.8±4μs         28.3±1μs     0.50  series_methods.Any.time_any(1000, 'slow')
-        852±20μs         388±10μs     0.46  stat_ops.SeriesOps.time_op('skew', 'int', False)
-        852±10μs          376±6μs     0.44  stat_ops.SeriesOps.time_op('skew', 'int', True)
-         852±9μs          376±6μs     0.44  stat_ops.SeriesOps.time_op('kurt', 'int', True)
-         292±3μs        123±0.9μs     0.42  stat_ops.SeriesOps.time_op('prod', 'int', True)
-         288±5μs          120±3μs     0.42  stat_ops.SeriesOps.time_op('prod', 'int', False)
-     1.23±0.01ms          490±4μs     0.40  stat_ops.FrameOps.time_op('prod', 'int', 0, True)
-     1.24±0.02ms          490±5μs     0.40  stat_ops.FrameOps.time_op('prod', 'int', 0, False)
-       931±100μs          369±6μs     0.40  stat_ops.SeriesOps.time_op('kurt', 'int', False)
-       158±0.6ms       62.5±0.4ms     0.40  frame_methods.Dropna.time_dropna_axis_mixed_dtypes('any', 1)
-         322±7μs          123±4μs     0.38  stat_ops.SeriesOps.time_op('mean', 'int', True)
-        708±10μs          261±8μs     0.37  stat_ops.SeriesOps.time_op('std', 'int', False)
-         328±3μs          120±2μs     0.37  stat_ops.SeriesOps.time_op('mean', 'int', False)
-         681±9μs         249±10μs     0.37  stat_ops.SeriesOps.time_op('var', 'int', False)
-      3.03±0.9ms      1.10±0.01ms     0.36  series_methods.NanOps.time_argmax(1000000, 'int64')
-        701±10μs          251±4μs     0.36  stat_ops.SeriesOps.time_op('std', 'int', True)
-     2.70±0.01ms          966±9μs     0.36  series_methods.NanOps.time_prod(1000000, 'int64')
-         690±6μs          243±3μs     0.35  stat_ops.SeriesOps.time_op('var', 'int', True)
-         305±7μs          103±3μs     0.34  stat_ops.SeriesOps.time_op('sum', 'int', False)
-        311±10μs         99.1±3μs     0.32  stat_ops.SeriesOps.time_op('sum', 'int', True)
-         121±1ms         38.4±1ms     0.32  frame_methods.Dropna.time_dropna('any', 0)
-       114±0.3ms       35.5±0.5ms     0.31  frame_methods.Dropna.time_dropna('any', 1)
-      8.80±0.2ms      2.64±0.07ms     0.30  stat_ops.FrameOps.time_op('std', 'int', 1, True)
-        17.5±4ms         5.20±4ms     0.30  series_methods.NanOps.time_std(1000000, 'int64')
-      8.83±0.2ms       2.61±0.2ms     0.30  stat_ops.FrameOps.time_op('std', 'int', 1, False)
-        8.61±1ms       2.41±0.3ms     0.28  stat_ops.FrameOps.time_op('var', 'int', 1, False)
-     1.37±0.01ms         382±10μs     0.28  stat_ops.FrameOps.time_op('prod', 'int', 1, True)
-     8.59±0.06ms      2.38±0.08ms     0.28  stat_ops.FrameOps.time_op('var', 'int', 1, True)
-     1.40±0.03ms         385±10μs     0.27  stat_ops.FrameOps.time_op('prod', 'int', 1, False)
-        17.8±4ms         4.75±4ms     0.27  series_methods.NanOps.time_var(1000000, 'int64')
-        3.70±1ms         963±10μs     0.26  series_methods.NanOps.time_mean(1000000, 'int64')
-      7.49±0.1ms      1.92±0.07ms     0.26  stat_ops.FrameOps.time_op('var', 'int', 0, True)
-      7.48±0.1ms      1.89±0.04ms     0.25  stat_ops.FrameOps.time_op('std', 'int', 0, False)
-      7.42±0.1ms      1.86±0.06ms     0.25  stat_ops.FrameOps.time_op('var', 'int', 0, False)
-      7.47±0.1ms      1.85±0.09ms     0.25  stat_ops.FrameOps.time_op('std', 'int', 0, True)
-         169±1ms       41.0±0.4ms     0.24  frame_methods.Dropna.time_dropna_axis_mixed_dtypes('any', 0)
-        3.39±1ms         723±10μs     0.21  series_methods.NanOps.time_min(1000000, 'int64')
-        3.67±1ms         722±10μs     0.20  series_methods.NanOps.time_max(1000000, 'int64')
-        3.40±1ms          649±9μs     0.19  series_methods.NanOps.time_sum(1000000, 'int64')
-     3.93±0.04ms          646±9μs     0.16  stat_ops.FrameOps.time_op('mean', 'int', 1, True)
-     3.91±0.03ms         641±10μs     0.16  stat_ops.FrameOps.time_op('mean', 'int', 1, False)
-     3.28±0.03ms         493±10μs     0.15  stat_ops.FrameOps.time_op('mean', 'int', 0, False)
-     3.27±0.03ms          488±8μs     0.15  stat_ops.FrameOps.time_op('mean', 'int', 0, True)
-     2.69±0.04ms         381±10μs     0.14  stat_ops.FrameOps.time_op('sum', 'int', 0, False)
-     2.70±0.02ms          375±5μs     0.14  stat_ops.FrameOps.time_op('sum', 'int', 0, True)
-     2.95±0.01ms         400±10μs     0.14  stat_ops.FrameOps.time_op('sum', 'int', 1, True)
-     2.94±0.02ms          383±8μs     0.13  stat_ops.FrameOps.time_op('sum', 'int', 1, False)
-      6.29±0.1ms       70.3±0.8μs     0.01  series_methods.Any.time_any(1000000, 'slow')
-      6.48±0.2ms         68.0±4μs     0.01  series_methods.All.time_all(1000000, 'slow')
-      6.50±0.2ms       37.2±0.9μs     0.01  series_methods.Any.time_any(1000000, 'fast')
-     6.37±0.07ms         35.5±1μs     0.01  series_methods.All.time_all(1000000, 'fast')

----------
values : ndarray
skipna : bool
mask : Optional[ndarray[bool]]
Contributor

so rather than typing here, can you type the args themselves

Contributor Author

Done, and also typed the return value. It appears the convention is to duplicate types in the return signature and docstring block; let me know if you'd prefer to keep only one.

Contributor

great. cc @WillAyd @jorisvandenbossche need to think about this
but prob ok for now

@@ -200,12 +200,84 @@ def _get_fill_value(dtype, fill_value=None, fill_value_typ=None):
    return tslibs.iNaT


def _maybe_get_mask(values, skipna, mask, invert=False):
Contributor

do we really need invert? this is just confusing

Contributor

pls type annotate the parameters

Contributor Author

invert was necessary in nanmedian(), but the median calculation is so slow that there's minimal impact, so I'm reverting that bit and removing invert.

skipna : bool
fill_value : Any, optional
    value to fill NaNs with
fill_value_typ : str, optional
Contributor

same, can you type above

Contributor Author

Done

-    if skipna and not is_any_int_dtype(values):
+    mask = _maybe_get_mask(values, skipna, mask)
+
+    if skipna and not is_any_int_dtype(values) and mask is not None:
Contributor

do you need the int check here? (as the mask will by definition be None if we have ints)

Contributor Author

No. The odd case of Series(dtype='bool').prod() will be slightly faster as a result of it being removed.

    if mask is not None:
        null_mask = mask.size - mask.sum()
    else:
        null_mask = functools.reduce(operator.mul, shape)
Contributor

is this line covered in tests? isn't this just np.prod(shape)

Contributor Author

Yeah, this should be getting hit by Series(dtype=int).sum() and the like. I'll confirm once codecov finishes.
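
For reference, a quick sanity check of the equivalence raised above (the two expressions agree numerically; they differ only in returning a plain Python int versus a NumPy integer scalar):

import functools
import operator

import numpy as np

shape = (1000, 3)
print(functools.reduce(operator.mul, shape))  # 3000, Python int
print(np.prod(shape))                         # 3000, NumPy integer scalar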


@qwhelan force-pushed the any_all_fix branch 6 times, most recently from 4f9c8eb to 6ac741b, on April 22, 2019 02:27
@qwhelan (Contributor Author) commented Apr 22, 2019

Updated to reflect comments - the single failing test appears to be unrelated.

@jreback (Contributor) left a comment

  • can you add a whatsnew note in performance improvements
  • can you add some signature typing where indicated (as much as you can)
  • merge master; ping on green


@jreback added this to the 0.25.0 milestone on Apr 28, 2019
@@ -569,7 +664,7 @@ def _get_counts_nanvar(mask, axis, ddof, dtype=float):
         count = np.nan
         d = np.nan
     else:
-        mask2 = count <= ddof
+        mask2 = count <= ddof  # type: np.ndarray
Contributor Author

mypy needed an extra hint for the following line: it deduced mask2 as Union[bool, np.ndarray] and raised an error because bool has no any() method.

@qwhelan (Contributor Author) commented Apr 29, 2019

ping

@jreback merged commit b6324be into pandas-dev:master on Apr 30, 2019
@jreback (Contributor) commented Apr 30, 2019

thanks @qwhelan very nice!

@qwhelan deleted the any_all_fix branch on August 14, 2019 07:01