Methods min and max give NaN in time-aware rolling window even if min_periods=1 #15901
Comments
You are specifying a window by an offset. So what exactly would min_periods mean? It is essentially not implemented. I guess the docs could be better. cc @chrisaycock |
I think you actually want something like or |
The meaning of Note that
In [403]: df.rolling('3d', min_periods=1)['a'].sum()
Out[403]:
2017-04-03 NaN
2017-04-04 2.0
2017-04-05 5.0
Name: a, dtype: float64
In [404]: df.rolling('3d', min_periods=2)['a'].sum()
Out[404]:
2017-04-03 NaN
2017-04-04 NaN
2017-04-05 5.0
Name: a, dtype: float64
In [405]: df.rolling('3d', min_periods=3)['a'].sum()
Out[405]:
2017-04-03 NaN
2017-04-04 NaN
2017-04-05 NaN
Name: a, dtype: float64 |
I have some questions of my own. pandas by default excludes
But then for some reason, calling numpy as a stand-alone function excludes the
And if I try their version that explicitly excludes
So it seems there are lots of unexpected results here. |
@chrisaycock Concerning your first question,
for a fixed-width rolling window (specified by an integer), the default value for the parameter min_periods is the window size. These are equivalent:
In [406]: df.rolling(3)['a'].min()
Out[406]:
2017-04-03 NaN
2017-04-04 NaN
2017-04-05 NaN
Name: a, dtype: float64
In [407]: df.rolling(3, min_periods=3)['a'].min()
Out[407]:
2017-04-03 NaN
2017-04-04 NaN
2017-04-05 NaN
Name: a, dtype: float64 |
@chrisaycock For the other questions, you are passing a Pandas Series. If you use the Pandas Series attribute values instead:
In [23]: np.min(df.a.values)
Out[23]: nan
In [24]: np.nanmin(df.a.values)
Out[24]: 2.0
Nevertheless, I think this is a digression with respect to the original issue: Pandas |
Oh, so this works for the numeric ones, just not min/max with an offset? If that is the case, it is a bug. |
@jreback This is the output for other functions:
In [30]: df.rolling('3d')['a'].sum()
Out[30]:
2017-04-03 NaN
2017-04-04 2.0
2017-04-05 5.0
Name: a, dtype: float64
In [31]: df.rolling('3d')['a'].count()
Out[31]:
2017-04-03 NaN
2017-04-04 1.0
2017-04-05 2.0
Name: a, dtype: float64
In [32]: df.rolling('3d')['a'].mean()
Out[32]:
2017-04-03 NaN
2017-04-04 2.0
2017-04-05 2.5
Name: a, dtype: float64
In [33]: df.rolling('3d')['a'].median()
Out[33]:
2017-04-03 NaN
2017-04-04 2.0
2017-04-05 2.5
Name: a, dtype: float64
whereas this is the output for the functions min and max:
In [34]: df.rolling('3d')['a'].min()
Out[34]:
2017-04-03 NaN
2017-04-04 NaN
2017-04-05 NaN
Name: a, dtype: float64
In [35]: df.rolling('3d')['a'].max()
Out[35]:
2017-04-03 NaN
2017-04-04 NaN
2017-04-05 NaN
Name: a, dtype: float64 |
It may not be due to min_periods, but rather to the min/max function implementation: doing it with an apply gives the expected result:
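The snippet that accompanied this comment is missing from this copy of the thread. As a rough sketch of the idea only, assuming the frame implied by the quoted outputs (a daily index starting 2017-04-03 with a = [NaN, 2.0, 3.0]) and assuming np.nanmin as the applied function (the commenter's actual function is not shown), an apply-based version might look like this:

```python
import numpy as np
import pandas as pd

# Assumed data, inferred from the outputs quoted in this thread.
df = pd.DataFrame(
    {"a": [np.nan, 2.0, 3.0]},
    index=pd.date_range("2017-04-03", periods=3, freq="D"),
)

# Apply a NaN-skipping reduction to each 3-day window instead of calling .min().
# np.nanmin is an assumption; raw=True hands each window to it as a NumPy array.
# The thread writes the offset as '3d'; '3D' is the same window.
via_apply = df.rolling("3D", min_periods=1)["a"].apply(np.nanmin, raw=True)
print(via_apply)
```

The first row stays NaN because its window holds no valid observation; the later rows pick up the smallest non-NaN value in the window.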
But I agree with @albertvillanova, this is certainly a bug.
That is because if you do |
My point was the inconsistent Regarding this particular issue, yes, |
ok if someone wants to take a crack at this, have at it. |
Was it fixed? |
@jreback this is working fine in 0.24.x |
@ihsansecer how is this on master? If it is working, do we have a test for this? |
thanks for checking @ihsansecer |
Code Sample, a copy-pastable example if possible
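The reporter's snippet is missing from this copy of the issue. What follows is a reconstruction, not the original code: the data (a daily index starting 2017-04-03 with a = [NaN, 2.0, 3.0]) is inferred from the outputs quoted in the comments, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Assumed data, inferred from the outputs quoted in the comments.
df = pd.DataFrame(
    {"a": [np.nan, 2.0, 3.0]},
    index=pd.date_range("2017-04-03", periods=3, freq="D"),
)

# Time-aware window of three days; one valid observation is enough.
# The thread writes the offset as '3d'; '3D' is the same window.
window = df.rolling("3D", min_periods=1)["a"]

summed = window.sum()    # the reference behaviour described in the issue
lowest = window.min()    # reported as all-NaN on pandas 0.19.2
highest = window.max()   # reported as all-NaN on pandas 0.19.2

print(summed, lowest, highest, sep="\n\n")
```

According to the thread, pandas 0.19.2 returned NaN on every row for min and max, while 0.24.x was later reported to work.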
Problem description
Even if we set min_periods=1, the functions min and max give NaN if there is one NaN value inside the time-aware rolling window.
However, there is no bug when the window width is fixed (not a time period):
Expected Output
The expected output, analogously to the one given by the function sum, should be a non-NaN value if there is at least one non-NaN value inside the rolling window.
Output of pd.show_versions()
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.8.1
boto: 2.45.0
pandas_datareader: None