ENH: min_weight in addition to min_periods for ewma #11167
Comments
what would you have the result look like given an some |
Here's a bit of test code that does what I think this should do:

import pandas as pd

def filter_min_weights(series, min_weight):
    # Exponentially weighted mean of the data
    ewma_series = pd.ewma(series, span=10, min_periods=0)
    # 1 where the input has a value, 0 where it is missing
    has_weight = pd.Series(0, index=series.index)
    has_weight[series.notnull()] = 1
    # Share of the total weight that falls on observed points
    weights = pd.ewma(has_weight, span=10, min_periods=0)
    # Boolean mask: True where enough weight is present
    has_sufficient_weight = weights > min_weight
    return ewma_series.where(has_sufficient_weight)

Then:

In [128]: partial_series = pd.Series(range(200))
In [129]: partial_series[20:185] = pd.np.nan
In [130]: filter_min_weights(partial_series, min_weight=0.5)
Out[130]:
0 0.000000
1 0.550000
2 1.132890
3 1.748020
4 2.394502
5 3.071240
6 3.776953
7 4.510212
8 5.269468
9 6.053089
10 6.859394
11 7.686679
12 8.533251
13 9.397448
14 10.277660
15 11.172348
16 12.080052
17 12.999407
18 13.929141
19 14.868084
20 14.868084
21 14.868084
22 14.868084
23 NaN
24 NaN
25 NaN
26 NaN
27 NaN
28 NaN
29 NaN
...
170 NaN
171 NaN
172 NaN
173 NaN
174 NaN
175 NaN
176 NaN
177 NaN
178 NaN
179 NaN
180 NaN
181 NaN
182 NaN
183 NaN
184 NaN
185 NaN
186 NaN
187 NaN
188 186.748020
189 187.394502
190 188.071240
191 188.776953
192 189.510212
193 190.269468
194 191.053089
195 191.859394
196 192.686679
197 193.533251
198 194.397448
199 195.277660
dtype: float64

For clarity, the weights at each point (as integer percentages) are:

Out[134]:
0 100
1 100
2 100
3 100
4 100
5 100
6 100
7 100
8 100
9 100
10 100
11 100
12 100
13 100
14 100
15 100
16 100
17 100
18 100
19 100
20 81
21 66
22 54
23 44
24 36
25 29
26 24
27 19
28 16
29 13
...
170 0
171 0
172 0
173 0
174 0
175 0
176 0
177 0
178 0
179 0
180 0
181 0
182 0
183 0
184 0
185 18
186 33
187 45
188 55
189 63
190 70
191 75
192 79
193 83
194 86
195 89
196 91
197 92
198 93
199 95
dtype: int64

(Apologies for the slow reply.)
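The weight column above can be reproduced with the newer `Series.ewm` accessor. This is a minimal sketch, assuming a pandas version where `Series.ewm` is available (it later replaced `pd.ewma`) and assuming the percentages were truncated rather than rounded. The variable names `has_weight`, `weights`, and `weight_pct` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Same shape as the example: 200 points with a long gap in the middle.
partial_series = pd.Series(np.arange(200, dtype=float))
partial_series.iloc[20:185] = np.nan

# 1.0 where a value is observed, 0.0 where it is missing.
has_weight = partial_series.notna().astype(float)

# With adjust=True (the default), the EW mean of this indicator is the
# share of the total exponential weight that sits on observed points.
weights = has_weight.ewm(span=10, min_periods=0).mean()

# Assumption: truncate to whole percentages, as the table appears to do.
weight_pct = np.floor(weights * 100).astype(int)

print(weight_pct.iloc[18:26])
print(weight_pct.iloc[183:190])
```

With `min_weight=0.5`, the cutoff falls between index 22 and 23 on the way into the gap, and between 187 and 188 on the way out, which lines up with where the NaNs start and stop in the filtered output.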
Anyone have any thoughts here? (Do you know who the pandas experts on this stuff are, @jreback?) I think this is a better way, with some confidence, and for once this is my area of expertise outside of pandas. But I think it only makes sense to build this if there's some consensus (and after #11603). Or let me know if my example is unclear or there are any questions.
cc @seth-p
I think this is a good idea, though probably makes sense only when Obviously for backwards compatibility I would keep |
Currently the exponential functions, such as pd.ewma, use a min_periods argument to ensure there's enough data to generate a valid value. While this works well for the rolling functions, it's not effective for exponential functions because points have weight forever, albeit ever decreasing.

I think what we want is to have a min_weight argument, so if you specify 0.5, it needs 50% of the weight in order to calculate a value. For rolling functions, this would be equivalent to min_periods being half of window.

What are people's thoughts?
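To make the difference concrete, here is a rough sketch comparing min_periods with a min_weight-style filter. It assumes the newer `Series.ewm` API rather than `pd.ewma`, and `ewm_mean_min_weight` is a hypothetical helper written for this illustration, not an existing pandas function or argument.

```python
import numpy as np
import pandas as pd


def ewm_mean_min_weight(series, span, min_weight):
    """Hypothetical helper: EW mean, masked where too little weight is observed."""
    values = series.ewm(span=span, min_periods=0).mean()
    # Indicator of observed points; its EW mean is the observed share of weight.
    observed = series.notna().astype(float)
    weight_share = observed.ewm(span=span, min_periods=0).mean()
    # Keep a value only where the observed weight exceeds the threshold.
    return values.where(weight_share > min_weight)


# 200 points with a long run of missing data in the middle.
partial_series = pd.Series(np.arange(200, dtype=float))
partial_series.iloc[20:185] = np.nan

# min_periods only counts observations seen so far, so once it is satisfied
# the stale value keeps being reported deep into the gap.
with_min_periods = partial_series.ewm(span=10, min_periods=5).mean()

# The weight-based filter drops values once the observed weight decays.
with_min_weight = ewm_mean_min_weight(partial_series, span=10, min_weight=0.5)

print(with_min_periods.iloc[100])  # still a number, carried over from index 19
print(with_min_weight.iloc[100])   # NaN
```

The weight-based version also stays NaN for the first few points after the gap ends, until the new observations account for more than half of the total weight.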