.. warning::

   Starting with the 0.25.x series of releases, pandas only supports Python 3.5.3 and higher.
   See Dropping Python 2.7 for more details.

.. warning::

   The minimum supported Python version will be bumped to 3.6 in a future release.

.. warning::

   ``Panel`` has been fully removed. For N-D labeled data structures, please
   use xarray.

.. warning::

   :func:`read_pickle` and :func:`read_msgpack` are only guaranteed backwards compatible back to
   pandas version 0.20.3 (:issue:`27082`)
{{ header }}
These are the changes in pandas 0.25.0. See :ref:`release` for a full changelog including other versions of pandas.
pandas has added special groupby behavior, known as "named aggregation", for naming the output columns when applying multiple aggregation functions to specific columns (:issue:`18366`, :issue:`26512`).
.. ipython:: python

   animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                           'height': [9.1, 6.0, 9.5, 34.0],
                           'weight': [7.9, 7.5, 9.9, 198.0]})
   animals
   animals.groupby("kind").agg(
       min_height=pd.NamedAgg(column='height', aggfunc='min'),
       max_height=pd.NamedAgg(column='height', aggfunc='max'),
       average_weight=pd.NamedAgg(column='weight', aggfunc="mean"),
   )
Pass the desired column names as the ``**kwargs`` to ``.agg``. The values of ``**kwargs``
should be tuples where the first element is the column selection, and the second element is the
aggregation function to apply. pandas provides the ``pandas.NamedAgg`` namedtuple to make it clearer
what the arguments to the function are, but plain tuples are accepted as well.
.. ipython:: python

   animals.groupby("kind").agg(
       min_height=('height', 'min'),
       max_height=('height', 'max'),
       average_weight=('weight', 'mean'),
   )
Named aggregation is the recommended replacement for the deprecated "dict-of-dicts" approach to naming the output of column-specific aggregations (:ref:`whatsnew_0200.api_breaking.deprecate_group_agg_dict`).
A similar approach is now available for Series groupby objects as well. Because there's no need for column selection, the values can just be the functions to apply:

.. ipython:: python

   animals.groupby("kind").height.agg(
       min_height="min",
       max_height="max",
   )
This type of aggregation is the recommended alternative to the deprecated behavior when passing a dict to a Series groupby aggregation (:ref:`whatsnew_0200.api_breaking.deprecate_group_agg_dict`).
See :ref:`groupby.aggregate.named` for more.
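The following minimal pure-Python sketch makes the semantics of named aggregation concrete. It assumes the rows are plain dicts. The ``named_agg`` helper is hypothetical: it illustrates the idea and is not how pandas implements ``.agg``. Each keyword becomes an output column, and its ``(column, func)`` pair names the input column to reduce and the function to reduce it with.

```python
from statistics import mean


def named_agg(rows, by, **specs):
    """Hypothetical helper: group dict rows by ``by``, then apply each
    output_name=(column, func) spec to every group (sketch only)."""
    groups = {}
    for row in rows:
        # Collect the rows belonging to each group key.
        groups.setdefault(row[by], []).append(row)
    result = {}
    for key in sorted(groups):  # groupby sorts the group keys by default
        members = groups[key]
        result[key] = {
            out_name: func([r[column] for r in members])
            for out_name, (column, func) in specs.items()
        }
    return result


animals = [
    {"kind": "cat", "height": 9.1, "weight": 7.9},
    {"kind": "dog", "height": 6.0, "weight": 7.5},
    {"kind": "cat", "height": 9.5, "weight": 9.9},
    {"kind": "dog", "height": 34.0, "weight": 198.0},
]

out = named_agg(
    animals,
    "kind",
    min_height=("height", min),
    max_height=("height", max),
    average_weight=("weight", mean),
)
print(out)
```

The output names come only from the keywords, which is why no separate renaming step is needed afterwards.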
You can now provide multiple lambda functions to a list-like aggregation in :class:`.GroupBy.agg` (:issue:`26430`).
.. ipython:: python

   animals.groupby('kind').height.agg([
       lambda x: x.iloc[0], lambda x: x.iloc[-1]
   ])

   animals.groupby('kind').agg([
       lambda x: x.iloc[0] - x.iloc[1],
       lambda x: x.iloc[0] + x.iloc[1]
   ])
Previously, these raised a ``SpecificationError``.
Printing of :class:`MultiIndex` instances now shows tuples of each row and ensures
that the tuple items are vertically aligned, so it's now easier to understand
the structure of the ``MultiIndex``. (:issue:`13480`):

The repr now looks like this:

.. ipython:: python

   pd.MultiIndex.from_product([['a', 'abc'], range(500)])
Previously, outputting a :class:`MultiIndex` printed all the ``levels`` and
``codes`` of the ``MultiIndex``, which was visually unappealing and made
the output more difficult to navigate. For example (limiting the range to 5):

.. code-block:: ipython

   In [1]: pd.MultiIndex.from_product([['a', 'abc'], range(5)])
   Out[1]: MultiIndex(levels=[['a', 'abc'], [0, 1, 2, 3, 4]],
      ...:            codes=[[0, 0, 0, 0, 0, 1, 1, 1, 1, 1], [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]])

In the new repr, all values will be shown if the number of rows is smaller than :attr:`options.display.max_seq_items` (default: 100 items). Horizontally, the output will truncate if it's wider than :attr:`options.display.width` (default: 80 characters).
Currently, the default display options of pandas ensure that when a Series
or DataFrame has more than 60 rows, its repr gets truncated to this maximum
of 60 rows (the ``display.max_rows`` option). However, this still gives
a repr that takes up a large part of the vertical screen real estate. Therefore,
a new option ``display.min_rows`` is introduced with a default of 10 which
determines the number of rows shown in the truncated repr:

- For small Series or DataFrames, up to ``max_rows`` number of rows is shown
  (default: 60).
- For larger Series or DataFrames with a length above ``max_rows``, only
  ``min_rows`` number of rows is shown (default: 10, i.e. the first and last
  5 rows).

This dual option still lets you see the full content of relatively small
objects (e.g. ``df.head(20)`` shows all 20 rows), while giving a brief repr
for large objects.

To restore the previous behaviour of a single threshold, set
``pd.options.display.min_rows = None``.
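The interaction of the two options can be summarized as a small decision rule. The function below is a hypothetical sketch of that rule as described above, not pandas' formatting code. Capping ``min_rows`` at ``max_rows`` is an assumption about how the two thresholds combine.

```python
def rows_shown(n_rows, max_rows=60, min_rows=10):
    """Hypothetical sketch: how many rows a repr displays for ``n_rows`` rows."""
    if max_rows is None or n_rows <= max_rows:
        # Small objects (or no limit at all): every row is shown.
        return n_rows
    if min_rows is None:
        # min_rows=None restores the old single-threshold behaviour.
        return max_rows
    # Large objects: only min_rows rows (split between head and tail),
    # assumed never to exceed max_rows.
    return min(min_rows, max_rows)


print(rows_shown(20), rows_shown(61), rows_shown(1000, min_rows=None))
```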
:func:`json_normalize` normalizes the provided input dict to all nested levels. The new max_level parameter provides more control over which level to end normalization (:issue:`23843`):
.. ipython:: python

   from pandas.io.json import json_normalize
   data = [{
       'CreatedBy': {'Name': 'User001'},
       'Lookup': {'TextField': 'Some text',
                  'UserField': {'Id': 'ID001', 'Name': 'Name001'}},
       'Image': {'a': 'b'}
   }]
   json_normalize(data, max_level=1)
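The effect of ``max_level`` can be sketched in plain Python. The recursive ``flatten`` helper below is hypothetical and handles a single record only. It assumes the default ``'.'`` separator and that nested dicts below ``max_level`` are kept as-is, as the description above implies.

```python
def flatten(record, max_level=None, sep=".", _prefix="", _level=0):
    """Hypothetical sketch: flatten one nested dict, stopping at max_level."""
    flat = {}
    for key, value in record.items():
        name = f"{_prefix}{sep}{key}" if _prefix else key
        # Descend only into dicts, and only while the depth limit allows it.
        descend = isinstance(value, dict) and (max_level is None or _level < max_level)
        if descend:
            flat.update(flatten(value, max_level, sep, name, _level + 1))
        else:
            flat[name] = value
    return flat


data = [{
    'CreatedBy': {'Name': 'User001'},
    'Lookup': {'TextField': 'Some text',
               'UserField': {'Id': 'ID001', 'Name': 'Name001'}},
    'Image': {'a': 'b'}
}]

print(flatten(data[0], max_level=1))
# 'Lookup.UserField' stays a dict; without max_level it would be split further.
```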
:class:`Series` and :class:`DataFrame` have gained the :meth:`Series.explode` and :meth:`DataFrame.explode` methods to transform list-likes to individual rows. See :ref:`section on Exploding list-like column <reshaping.explode>` in docs for more information (:issue:`16538`, :issue:`10511`)

Here is a typical use case. You have comma-separated strings in a column.

.. ipython:: python

   df = pd.DataFrame([{'var1': 'a,b,c', 'var2': 1},
                      {'var1': 'd,e,f', 'var2': 2}])
   df

Creating a long form ``DataFrame`` is now straightforward using chained operations:

.. ipython:: python

   df.assign(var1=df.var1.str.split(',')).explode('var1')
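Conceptually, ``explode`` repeats the index label once per element of each list-like. The pure-Python sketch below shows that reshaping. Its ``explode`` function is a hypothetical stand-in that works on parallel lists of labels and values; it is not the pandas method. An empty list-like becomes a single missing value, with ``None`` standing in for ``NaN``.

```python
def explode(index, values):
    """Hypothetical sketch: one output row per element of each list-like,
    repeating the original index label."""
    out_index, out_values = [], []
    for label, value in zip(index, values):
        if isinstance(value, (list, tuple)):
            # Empty list-likes become one missing value (None stands in for NaN).
            items = list(value) or [None]
        else:
            items = [value]  # scalars pass through unchanged
        for item in items:
            out_index.append(label)
            out_values.append(item)
    return out_index, out_values


var1 = [s.split(",") for s in ["a,b,c", "d,e,f"]]
idx, vals = explode([0, 1], var1)
print(idx, vals)  # [0, 0, 0, 1, 1, 1] ['a', 'b', 'c', 'd', 'e', 'f']
```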
- :func:`DataFrame.plot` keywords ``logy``, ``logx`` and ``loglog`` can now accept the value ``'sym'`` for symlog scaling. (:issue:`24867`)
- Added support for ISO week year format (``'%G-%V-%u'``) when parsing datetimes using :meth:`to_datetime` (:issue:`16607`)
- Indexing of ``DataFrame`` and ``Series`` now accepts zero-dimensional ``np.ndarray`` (:issue:`24919`)
- :meth:`Timestamp.replace` now supports the ``fold`` argument to disambiguate DST transition times (:issue:`25017`)
- :meth:`DataFrame.at_time` and :meth:`Series.at_time` now support :class:`datetime.time` objects with timezones (:issue:`24043`)
- :meth:`DataFrame.pivot_table` now accepts an ``observed`` parameter which is passed to underlying calls to :meth:`DataFrame.groupby` to speed up grouping categorical data. (:issue:`24923`)
- ``Series.str`` has gained the :meth:`Series.str.casefold` method to remove all case distinctions present in a string (:issue:`25405`)
- :meth:`DataFrame.set_index` now works for instances of ``abc.Iterator``, provided their output is of the same length as the calling frame (:issue:`22484`, :issue:`24984`)
- :meth:`DatetimeIndex.union` now supports the ``sort`` argument. The behavior of the sort parameter matches that of :meth:`Index.union` (:issue:`24994`)
- :meth:`RangeIndex.union` now supports the ``sort`` argument. If ``sort=False`` an unsorted ``Int64Index`` is always returned. ``sort=None`` is the default and returns a monotonically increasing ``RangeIndex`` if possible or a sorted ``Int64Index`` if not (:issue:`24471`)
- :meth:`TimedeltaIndex.intersection` now also supports the ``sort`` keyword (:issue:`24471`)
- :meth:`DataFrame.rename` now supports the ``errors`` argument to raise errors when attempting to rename nonexistent keys (:issue:`13473`)
- Added :ref:`api.frame.sparse` for working with a ``DataFrame`` whose values are sparse (:issue:`25681`)
- :class:`RangeIndex` has gained :attr:`~RangeIndex.start`, :attr:`~RangeIndex.stop`, and :attr:`~RangeIndex.step` attributes (:issue:`25710`)
- :class:`datetime.timezone` objects are now supported as arguments to timezone methods and constructors (:issue:`25065`)
- :meth:`DataFrame.query` and :meth:`DataFrame.eval` now support quoting column names with backticks to refer to names with spaces (:issue:`6508`)
- :func:`merge_asof` now gives a clearer error message when merge keys are categoricals that are not equal (:issue:`26136`)
- :meth:`.Rolling` supports exponential (or Poisson) window type (:issue:`21303`)
- Error message for missing required imports now includes the original import error's text (:issue:`23868`)
- :class:`DatetimeIndex` and :class:`TimedeltaIndex` now have a ``mean`` method (:issue:`24757`)
- :meth:`DataFrame.describe` now formats integer percentiles without decimal point (:issue:`26660`)
- Added support for reading SPSS .sav files using :func:`read_spss` (:issue:`26537`)
- Added new option ``plotting.backend`` to be able to select a plotting backend different from the existing ``matplotlib`` one. Use ``pandas.set_option('plotting.backend', '<backend-module>')`` where ``<backend-module>`` is a library implementing the pandas plotting API (:issue:`14130`)
- :class:`pandas.offsets.BusinessHour` supports multiple opening hours intervals (:issue:`15481`)
- :func:`read_excel` can now use ``openpyxl`` to read Excel files via the ``engine='openpyxl'`` argument. This will become the default in a future release (:issue:`11499`)
- :func:`pandas.io.excel.read_excel` supports reading OpenDocument tables. Specify ``engine='odf'`` to enable. Consult the :ref:`IO User Guide <io.ods>` for more details (:issue:`9070`)
- :class:`Interval`, :class:`IntervalIndex`, and :class:`~arrays.IntervalArray` have gained an :attr:`~Interval.is_empty` attribute denoting if the given interval(s) are empty (:issue:`27219`)
Indexing a :class:`DataFrame` or :class:`Series` with a :class:`DatetimeIndex` with a date string with a UTC offset would previously ignore the UTC offset. Now, the UTC offset is respected in indexing. (:issue:`24076`, :issue:`16785`)
.. ipython:: python

   df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))
   df

Previous behavior:

.. code-block:: ipython

   In [3]: df['2019-01-01 00:00:00+04:00':'2019-01-01 01:00:00+04:00']
   Out[3]:
                              0
   2019-01-01 00:00:00-08:00  0

New behavior:

.. ipython:: python

   df['2019-01-01 12:00:00+04:00':'2019-01-01 13:00:00+04:00']
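The second slice now selects the row because timezone-aware timestamps compare by the instant they represent. The standard-library sketch below checks the arithmetic. It assumes a fixed UTC-08:00 offset as a stand-in for ``US/Pacific`` on that date, and it requires Python 3.7+ for ``datetime.fromisoformat``.

```python
from datetime import datetime, timedelta, timezone

# The row label: midnight on 2019-01-01 in US/Pacific, modelled as a fixed
# UTC-08:00 offset (an assumption that holds in winter).
row = datetime(2019, 1, 1, 0, 0, tzinfo=timezone(timedelta(hours=-8)))

# The slice bounds from the example above, written with a +04:00 offset.
start = datetime.fromisoformat("2019-01-01 12:00:00+04:00")
stop = datetime.fromisoformat("2019-01-01 13:00:00+04:00")

# Aware datetimes compare by absolute time, so the row lies inside the slice.
print(start <= row <= stop)  # True
print(row.astimezone(timezone.utc))  # 2019-01-01 08:00:00+00:00
```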
Constructing a :class:`MultiIndex` with ``NaN`` levels or codes value < -1 was allowed previously.
Now, construction with codes value < -1 is not allowed and ``NaN`` levels' corresponding codes
will be reassigned as -1. (:issue:`19387`)

Previous behavior:

.. code-block:: ipython

   In [1]: pd.MultiIndex(levels=[[np.nan, None, pd.NaT, 128, 2]],
      ...:               codes=[[0, -1, 1, 2, 3, 4]])
      ...:
   Out[1]: MultiIndex(levels=[[nan, None, NaT, 128, 2]],
                      codes=[[0, -1, 1, 2, 3, 4]])

   In [2]: pd.MultiIndex(levels=[[1, 2]], codes=[[0, -2]])
   Out[2]: MultiIndex(levels=[[1, 2]],
                      codes=[[0, -2]])

New behavior:

.. ipython:: python
   :okexcept:

   pd.MultiIndex(levels=[[np.nan, None, pd.NaT, 128, 2]],
                 codes=[[0, -1, 1, 2, 3, 4]])
   pd.MultiIndex(levels=[[1, 2]], codes=[[0, -2]])
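The new validation rule fits in a few lines of plain Python. ``normalize_codes`` is a hypothetical helper, not pandas' internal code, and its error message is illustrative only. ``None`` and ``float('nan')`` stand in for pandas' missing values; ``pd.NaT`` is omitted to keep the sketch dependency-free.

```python
import math


def normalize_codes(levels, codes):
    """Hypothetical sketch of the rule described above (not pandas' code).

    Codes below -1 are rejected; codes pointing at a missing level value
    are rewritten to -1, the sentinel for "missing".
    """
    def is_missing(value):
        # None and float NaN stand in for pandas' missing values here.
        return value is None or (isinstance(value, float) and math.isnan(value))

    result = []
    for code in codes:
        if code < -1:
            raise ValueError("code value %d < -1 is not allowed" % code)
        if code != -1 and is_missing(levels[code]):
            code = -1
        result.append(code)
    return result


levels = [float("nan"), None, 128, 2]
print(normalize_codes(levels, [0, -1, 1, 2, 3]))  # [-1, -1, -1, 2, 3]
```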
The implementation of :meth:`.DataFrameGroupBy.apply` previously evaluated the supplied function consistently twice on the first group to infer if it is safe to use a fast code path. Particularly for functions with side effects, this was an undesired behavior and may have led to surprises. (:issue:`2936`, :issue:`2656`, :issue:`7739`, :issue:`10519`, :issue:`12155`, :issue:`20084`, :issue:`21417`)
Now every group is evaluated only a single time.
.. ipython:: python

   df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})
   df

   def func(group):
       print(group.name)
       return group

Previous behavior:

.. code-block:: ipython

   In [3]: df.groupby('a').apply(func)
   x
   x
   y
   Out[3]:
      a  b
   0  x  1
   1  y  2

New behavior:

.. code-block:: ipython

   In [3]: df.groupby('a').apply(func)
   x
   y
   Out[3]:
      a  b
   0  x  1
   1  y  2
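The old double call was most visible with side-effecting functions. The pure-Python sketch below uses a hypothetical ``apply_per_group`` helper and records every invocation, so a plain assertion can check the once-per-group contract. Under the previous behavior, the recorded list for this data would have been ``['x', 'x', 'y']``.

```python
def apply_per_group(rows, key, func):
    """Hypothetical sketch: call ``func`` exactly once per group, in key order."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return {name: func(name, members) for name, members in sorted(groups.items())}


calls = []


def func(name, group):
    calls.append(name)  # side effect: record every invocation
    return group


rows = [{"a": "x", "b": 1}, {"a": "y", "b": 2}]
result = apply_per_group(rows, "a", func)
print(calls)  # ['x', 'y']
```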
When passed DataFrames whose values are sparse, :func:`concat` will now return a :class:`Series` or :class:`DataFrame` with sparse values, rather than a :class:`SparseDataFrame` (:issue:`25702`).
.. ipython:: python

   df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 1])})

Previous behavior:

.. code-block:: ipython

   In [2]: type(pd.concat([df, df]))
   pandas.core.sparse.frame.SparseDataFrame

New behavior:

.. ipython:: python

   type(pd.concat([df, df]))

This now matches the existing behavior of :func:`concat` on ``Series`` with sparse values.
:func:`concat` will continue to return a ``SparseDataFrame`` when all the values
are instances of ``SparseDataFrame``.

This change also affects routines using :func:`concat` internally, like :func:`get_dummies`,
which now returns a :class:`DataFrame` in all cases (previously a ``SparseDataFrame`` was
returned if all the columns were dummy encoded, and a :class:`DataFrame` otherwise).

Providing any ``SparseSeries`` or ``SparseDataFrame`` to :func:`concat` will
cause a ``SparseSeries`` or ``SparseDataFrame`` to be returned, as before.
Due to the lack of more fine-grained dtypes, :attr:`Series.str` so far only checked whether the data was
of ``object`` dtype. :attr:`Series.str` will now infer the dtype of the data within the Series; in particular,
``'bytes'``-only data will raise an exception (except for :meth:`Series.str.decode`, :meth:`Series.str.get`,
:meth:`Series.str.len`, :meth:`Series.str.slice`), see :issue:`23163`, :issue:`23011`, :issue:`23551`.

Previous behavior:

.. code-block:: ipython

   In [1]: s = pd.Series(np.array(['a', 'ba', 'cba'], 'S'), dtype=object)

   In [2]: s
   Out[2]:
   0      b'a'
   1     b'ba'
   2    b'cba'
   dtype: object

   In [3]: s.str.startswith(b'a')
   Out[3]:
   0     True
   1    False
   2    False
   dtype: bool

New behavior:

.. ipython:: python
   :okexcept:

   s = pd.Series(np.array(['a', 'ba', 'cba'], 'S'), dtype=object)
   s
   s.str.startswith(b'a')
Previously, columns that were categorical, but not the groupby key(s), would be converted to ``object``
dtype during groupby operations. pandas will now preserve these dtypes. (:issue:`18502`)

.. ipython:: python

   cat = pd.Categorical(["foo", "bar", "bar", "qux"], ordered=True)
   df = pd.DataFrame({'payload': [-1, -2, -1, -2], 'col': cat})
   df
   df.dtypes

Previous behavior:

.. code-block:: ipython

   In [5]: df.groupby('payload').first().col.dtype
   Out[5]: dtype('O')

New behavior:

.. ipython:: python

   df.groupby('payload').first().col.dtype
When performing :func:`Index.union` operations between objects of incompatible dtypes,
the result will be a base :class:`Index` of dtype ``object``. This behavior holds true for
unions between :class:`Index` objects that previously would have been prohibited. The dtype
of empty :class:`Index` objects will now be evaluated before performing union operations
rather than simply returning the other :class:`Index` object. :func:`Index.union` can now be
considered commutative, such that ``A.union(B) == B.union(A)`` (:issue:`23525`).

Previous behavior:

.. code-block:: ipython

   In [1]: pd.period_range('19910905', periods=2).union(pd.Int64Index([1, 2, 3]))
   ...
   ValueError: can only call with other PeriodIndex-ed objects

   In [2]: pd.Index([], dtype=object).union(pd.Index([1, 2, 3]))
   Out[2]: Int64Index([1, 2, 3], dtype='int64')

New behavior:

.. code-block:: ipython

   In [3]: pd.period_range('19910905', periods=2).union(pd.Int64Index([1, 2, 3]))
   Out[3]: Index([1991-09-05, 1991-09-06, 1, 2, 3], dtype='object')

   In [4]: pd.Index([], dtype=object).union(pd.Index([1, 2, 3]))
   Out[4]: Index([1, 2, 3], dtype='object')
Note that integer- and floating-dtype indexes are considered "compatible". The integer values are coerced to floating point, which may result in loss of precision. See :ref:`indexing.set_ops` for more.
The methods ``ffill``, ``bfill``, ``pad`` and ``backfill`` of :class:`.DataFrameGroupBy`
previously included the group labels in the return value, which was
inconsistent with other groupby transforms. Now only the filled values
are returned. (:issue:`21521`)

.. ipython:: python

   df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})
   df

Previous behavior:

.. code-block:: ipython

   In [3]: df.groupby("a").ffill()
   Out[3]:
      a  b
   0  x  1
   1  y  2

New behavior:

.. ipython:: python

   df.groupby("a").ffill()
When calling :meth:`DataFrame.describe` with an empty categorical / object column, the 'top' and 'freq' columns were previously omitted, which was inconsistent with the output for non-empty columns. Now the 'top' and 'freq' columns will always be included, with :attr:`numpy.nan` in the case of an empty :class:`DataFrame` (:issue:`26397`)
.. ipython:: python

   df = pd.DataFrame({"empty_col": pd.Categorical([])})
   df

Previous behavior:

.. code-block:: ipython

   In [3]: df.describe()
   Out[3]:
           empty_col
   count           0
   unique          0

New behavior:

.. ipython:: python

   df.describe()
pandas has until now mostly defined string representations in a pandas object's
``__str__``/``__unicode__``/``__bytes__`` methods, and called ``__str__`` from the ``__repr__``
method if a specific ``__repr__`` method is not found. This is not needed for Python 3.
In pandas 0.25, the string representations of pandas objects are now generally
defined in ``__repr__``, and calls to ``__str__`` in general now pass the call on to
the ``__repr__`` if a specific ``__str__`` method doesn't exist, as is standard for Python.
This change is backward compatible for direct usage of pandas, but if you subclass
pandas objects and give your subclasses specific ``__str__``/``__repr__`` methods,
you may have to adjust your ``__str__``/``__repr__`` methods (:issue:`26495`).
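The fallback that pandas now relies on is ordinary Python behavior: ``object.__str__`` delegates to ``__repr__`` when a class defines no ``__str__`` of its own. A toy class (hypothetical, unrelated to pandas) shows it:

```python
class Labeled:
    """Toy class (hypothetical) that defines only __repr__."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Labeled(name={self.name!r})"


obj = Labeled("a")
# With no __str__ defined, str() falls back to __repr__.
print(str(obj))  # Labeled(name='a')
```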
Indexing methods for :class:`IntervalIndex` have been modified to require exact matches only for :class:`Interval` queries.
``IntervalIndex`` methods previously matched on any overlapping ``Interval``. Behavior with scalar points, e.g. querying
with an integer, is unchanged (:issue:`16316`).

.. ipython:: python

   ii = pd.IntervalIndex.from_tuples([(0, 4), (1, 5), (5, 8)])
   ii

The ``in`` operator (``__contains__``) now only returns ``True`` for exact matches to ``Intervals`` in the ``IntervalIndex``, whereas
this would previously return ``True`` for any ``Interval`` overlapping an ``Interval`` in the ``IntervalIndex``.

Previous behavior:

.. code-block:: ipython

   In [4]: pd.Interval(1, 2, closed='neither') in ii
   Out[4]: True

   In [5]: pd.Interval(-10, 10, closed='both') in ii
   Out[5]: True

New behavior:

.. ipython:: python

   pd.Interval(1, 2, closed='neither') in ii
   pd.Interval(-10, 10, closed='both') in ii
The :meth:`~IntervalIndex.get_loc` method now only returns locations for exact matches to ``Interval`` queries, as opposed to the previous behavior of
returning locations for overlapping matches. A ``KeyError`` will be raised if an exact match is not found.

Previous behavior:

.. code-block:: ipython

   In [6]: ii.get_loc(pd.Interval(1, 5))
   Out[6]: array([0, 1])

   In [7]: ii.get_loc(pd.Interval(2, 6))
   Out[7]: array([0, 1, 2])

New behavior:

.. code-block:: ipython

   In [6]: ii.get_loc(pd.Interval(1, 5))
   Out[6]: 1

   In [7]: ii.get_loc(pd.Interval(2, 6))
   ---------------------------------------------------------------------------
   KeyError: Interval(2, 6, closed='right')
Likewise, :meth:`~IntervalIndex.get_indexer` and :meth:`~IntervalIndex.get_indexer_non_unique` will also only return locations for exact matches
to ``Interval`` queries, with ``-1`` denoting that an exact match was not found.

These indexing changes extend to querying a :class:`Series` or :class:`DataFrame` with an ``IntervalIndex`` index.

.. ipython:: python

   s = pd.Series(list('abc'), index=ii)
   s
Selecting from a ``Series`` or ``DataFrame`` using ``[]`` (``__getitem__``) or ``loc`` now only returns exact matches for ``Interval`` queries.

Previous behavior:

.. code-block:: ipython

   In [8]: s[pd.Interval(1, 5)]
   Out[8]:
   (0, 4]    a
   (1, 5]    b
   dtype: object

   In [9]: s.loc[pd.Interval(1, 5)]
   Out[9]:
   (0, 4]    a
   (1, 5]    b
   dtype: object

New behavior:

.. ipython:: python

   s[pd.Interval(1, 5)]
   s.loc[pd.Interval(1, 5)]
Similarly, a ``KeyError`` will be raised for non-exact matches instead of returning overlapping matches.

Previous behavior:

.. code-block:: ipython

   In [9]: s[pd.Interval(2, 3)]
   Out[9]:
   (0, 4]    a
   (1, 5]    b
   dtype: object

   In [10]: s.loc[pd.Interval(2, 3)]
   Out[10]:
   (0, 4]    a
   (1, 5]    b
   dtype: object

New behavior:

.. code-block:: ipython

   In [6]: s[pd.Interval(2, 3)]
   ---------------------------------------------------------------------------
   KeyError: Interval(2, 3, closed='right')

   In [7]: s.loc[pd.Interval(2, 3)]
   ---------------------------------------------------------------------------
   KeyError: Interval(2, 3, closed='right')

The :meth:`~IntervalIndex.overlaps` method can be used to create a boolean indexer that replicates the previous behavior of returning overlapping matches.

New behavior:

.. ipython:: python

   idxr = s.index.overlaps(pd.Interval(2, 3))
   idxr
   s[idxr]
   s.loc[idxr]
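Exact matching and overlap matching can be contrasted with intervals modelled as plain ``(left, right)`` tuples. The helpers below are hypothetical and illustrate the two lookup rules; they are not pandas' implementation. They assume right-closed, non-empty intervals like the ``ii`` index above.

```python
def overlaps(a, b):
    """True if right-closed intervals a=(l1, r1] and b=(l2, r2] share a point.

    Assumes both intervals are non-empty (left < right).
    """
    (l1, r1), (l2, r2) = a, b
    return l1 < r2 and l2 < r1


def get_loc_exact(intervals, query):
    """Sketch of the new lookup rule: only an identical interval matches."""
    for pos, iv in enumerate(intervals):
        if iv == query:
            return pos
    raise KeyError(query)


ii = [(0, 4), (1, 5), (5, 8)]  # right-closed, mirroring the IntervalIndex above
print(get_loc_exact(ii, (1, 5)))  # 1
print([overlaps(iv, (2, 3)) for iv in ii])  # [True, True, False]
```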
Applying a binary ufunc like :func:`numpy.power` now aligns the inputs when both are :class:`Series` (:issue:`23293`).

.. ipython:: python

   s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
   s2 = pd.Series([3, 4, 5], index=['d', 'c', 'b'])
   s1
   s2

Previous behavior:

.. code-block:: ipython

   In [5]: np.power(s1, s2)
   Out[5]:
   a      1
   b     16
   c    243
   dtype: int64

New behavior:

.. ipython:: python

   np.power(s1, s2)

This matches the behavior of other binary operations in pandas, like :meth:`Series.add`.
To retain the previous behavior, convert the other ``Series`` to an array before
applying the ufunc.

.. ipython:: python

   np.power(s1, s2.array)
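Alignment here means pairing values by label over the union of both indexes, with ``NaN`` where a label is missing on one side. The sketch below models each ``Series`` as a dict, using a hypothetical ``aligned_power`` helper. It reproduces both the aligned result and the old positional result for the ``s1``/``s2`` example.

```python
import math


def aligned_power(left, right):
    """Hypothetical sketch: label-aligned ``left ** right``.

    The result covers the sorted union of labels, with NaN wherever
    one side lacks the label.
    """
    labels = sorted(set(left) | set(right))
    return {
        label: (left[label] ** right[label]
                if label in left and label in right
                else math.nan)
        for label in labels
    }


s1 = {"a": 1, "b": 2, "c": 3}
s2 = {"d": 3, "c": 4, "b": 5}
print(aligned_power(s1, s2))  # {'a': nan, 'b': 32, 'c': 81, 'd': nan}

# The old behaviour paired values by position instead of by label:
print([x ** y for x, y in zip(s1.values(), s2.values())])  # [1, 16, 243]
```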
:meth:`Categorical.argsort` now places missing values at the end of the array, making it consistent with NumPy and the rest of pandas (:issue:`21801`).

.. ipython:: python

   cat = pd.Categorical(['b', None, 'a'], categories=['a', 'b'], ordered=True)

Previous behavior:

.. code-block:: ipython

   In [2]: cat = pd.Categorical(['b', None, 'a'], categories=['a', 'b'], ordered=True)

   In [3]: cat.argsort()
   Out[3]: array([1, 2, 0])

   In [4]: cat[cat.argsort()]
   Out[4]:
   [NaN, a, b]
   categories (2, object): [a < b]

New behavior:

.. ipython:: python

   cat.argsort()
   cat[cat.argsort()]
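A ``Categorical`` stores integer codes, with ``-1`` marking a missing value, so the change amounts to sorting on a key that pushes ``-1`` to the end. The pure-Python sketch below illustrates that idea; the helper name is hypothetical.

```python
def argsort_missing_last(codes):
    """Hypothetical sketch: order positions by category code, -1 (missing) last."""
    # Sorting on (is_missing, code) pushes missing values to the end; sorted()
    # is stable, so ties keep their original relative order.
    return sorted(range(len(codes)), key=lambda i: (codes[i] == -1, codes[i]))


# Codes for Categorical(['b', None, 'a'], categories=['a', 'b']):
# 'b' -> 1, None -> -1, 'a' -> 0
codes = [1, -1, 0]
print(argsort_missing_last(codes))  # [2, 0, 1]
print(sorted(range(len(codes)), key=codes.__getitem__))  # [1, 2, 0], the old order
```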
Starting with Python 3.7 the key-order of ``dict`` is guaranteed. In practice, this has been true since
Python 3.6. The :class:`DataFrame` constructor now treats a list of dicts in the same way as
it does a list of ``OrderedDict``, i.e. preserving the order of the dicts.
This change applies only when pandas is running on Python>=3.6 (:issue:`27309`).

.. ipython:: python

   data = [
       {'name': 'Joe', 'state': 'NY', 'age': 18},
       {'name': 'Jane', 'state': 'KY', 'age': 19, 'hobby': 'Minecraft'},
       {'name': 'Jean', 'state': 'OK', 'age': 20, 'finances': 'good'}
   ]

Previous behavior:

Previously, the columns were sorted lexicographically:

.. code-block:: ipython

   In [1]: pd.DataFrame(data)
   Out[1]:
      age finances      hobby  name state
   0   18      NaN        NaN   Joe    NY
   1   19      NaN  Minecraft  Jane    KY
   2   20     good        NaN  Jean    OK

New behavior:

The column order now matches the insertion-order of the keys in the ``dict``,
considering all the records from top to bottom. As a consequence, the column
order of the resulting DataFrame has changed compared to previous pandas versions.

.. ipython:: python

   pd.DataFrame(data)
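The new column order is the union of keys in first-seen order. The pure-Python sketch below applies that rule to the ``data`` records above, using a hypothetical ``column_order`` helper. Sorting its result recovers the previous lexicographic order.

```python
def column_order(records):
    """Hypothetical sketch: union of keys in first-seen order, top to bottom."""
    columns = {}
    for record in records:
        for key in record:
            columns.setdefault(key, None)  # dicts keep insertion order (3.7+)
    return list(columns)


data = [
    {'name': 'Joe', 'state': 'NY', 'age': 18},
    {'name': 'Jane', 'state': 'KY', 'age': 19, 'hobby': 'Minecraft'},
    {'name': 'Jean', 'state': 'OK', 'age': 20, 'finances': 'good'}
]

print(column_order(data))          # ['name', 'state', 'age', 'hobby', 'finances']
print(sorted(column_order(data)))  # ['age', 'finances', 'hobby', 'name', 'state']
```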
Due to dropping support for Python 2.7, a number of optional dependencies have updated minimum versions (:issue:`25725`, :issue:`24942`, :issue:`25752`). Independently, some minimum supported versions of dependencies were updated (:issue:`23519`, :issue:`25554`). If installed, we now require:
=================  ===============  ========
Package            Minimum Version  Required
=================  ===============  ========
numpy              1.13.3           X
pytz               2015.4           X
python-dateutil    2.6.1            X
bottleneck         1.2.1
numexpr            2.6.2
pytest (dev)       4.0.2
=================  ===============  ========
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
=================  ===============
Package            Minimum Version
=================  ===============
beautifulsoup4     4.6.0
fastparquet        0.2.1
gcsfs              0.2.2
lxml               3.8.0
matplotlib         2.2.2
openpyxl           2.4.8
pyarrow            0.9.0
pymysql            0.7.1
pytables           3.4.2
scipy              0.19.0
sqlalchemy         1.1.4
xarray             0.8.2
xlrd               1.1.0
xlsxwriter         0.9.8
xlwt               1.2.0
=================  ===============
See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.
- :class:`DatetimeTZDtype` will now standardize pytz timezones to a common timezone instance (:issue:`24713`)
- :class:`Timestamp` and :class:`Timedelta` scalars now implement the :meth:`to_numpy` method as aliases to :meth:`Timestamp.to_datetime64` and :meth:`Timedelta.to_timedelta64`, respectively. (:issue:`24653`)
- :meth:`Timestamp.strptime` will now raise a ``NotImplementedError`` (:issue:`25016`)
- Comparing :class:`Timestamp` with unsupported objects now returns :py:obj:`NotImplemented` instead of raising ``TypeError``. This implies that unsupported rich comparisons are delegated to the other object, and are now consistent with Python 3 behavior for ``datetime`` objects (:issue:`24011`)
- Bug in :meth:`DatetimeIndex.snap` which didn't preserve the ``name`` of the input :class:`Index` (:issue:`25575`)
- The ``arg`` argument in :meth:`.DataFrameGroupBy.agg` has been renamed to ``func`` (:issue:`26089`)
- The ``arg`` argument in :meth:`.Window.aggregate` has been renamed to ``func`` (:issue:`26372`)
- Most pandas classes had a ``__bytes__`` method, which was used for getting a python2-style bytestring representation of the object. This method has been removed as a part of dropping Python2 (:issue:`26447`)
- The ``.str``-accessor has been disabled for 1-level :class:`MultiIndex`, use :meth:`MultiIndex.to_flat_index` if necessary (:issue:`23679`)
- Removed support for the gtk package for clipboards (:issue:`26563`)
- Using an unsupported version of Beautiful Soup 4 will now raise an ``ImportError`` instead of a ``ValueError`` (:issue:`27063`)
- :meth:`Series.to_excel` and :meth:`DataFrame.to_excel` will now raise a ``ValueError`` when saving timezone aware data. (:issue:`27008`, :issue:`7056`)
- :meth:`ExtensionArray.argsort` places NA values at the end of the sorted array. (:issue:`21801`)
- :meth:`DataFrame.to_hdf` and :meth:`Series.to_hdf` will now raise a ``NotImplementedError`` when saving a :class:`MultiIndex` with extension data types for a ``fixed`` format. (:issue:`7775`)
- Passing duplicate ``names`` in :meth:`read_csv` will now raise a ``ValueError`` (:issue:`17346`)
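The :py:obj:`NotImplemented` change follows Python's standard rich-comparison protocol. When one operand returns ``NotImplemented``, Python tries the reflected method on the other operand. The toy example below uses two hypothetical classes and no pandas to show the delegation:

```python
class Meters:
    """Toy value type (hypothetical) that only compares with itself."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Meters):
            # Returning NotImplemented lets Python try other.__eq__(self).
            return NotImplemented
        return self.value == other.value


class Anything:
    """Toy type (hypothetical) that claims equality with everything."""

    def __eq__(self, other):
        return True


print(Meters(1) == Anything())  # True: Anything.__eq__ answers instead
print(Meters(1) == "1")  # False: both sides decline, so Python falls back to identity
```

If both sides return ``NotImplemented`` for ``==``, Python falls back to an identity check. For ordering operators such as ``<``, Python raises ``TypeError`` instead.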
The ``SparseSeries`` and ``SparseDataFrame`` subclasses are deprecated. Their functionality is better provided
by a ``Series`` or ``DataFrame`` with sparse values.

Previous way:

.. code-block:: python

   df = pd.SparseDataFrame({"A": [0, 0, 1, 2]})
   df.dtypes

New way:

.. ipython:: python

   df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 0, 1, 2])})
   df.dtypes

The memory usage of the two approaches is identical (:issue:`19239`).
The msgpack format is deprecated as of 0.25 and will be removed in a future version. It is recommended to use pyarrow for on-the-wire transmission of pandas objects. (:issue:`27084`)
- The deprecated ``.ix[]`` indexer now raises a more visible ``FutureWarning`` instead of ``DeprecationWarning`` (:issue:`26438`).
- Deprecated the ``units=M`` (months) and ``units=Y`` (year) parameters for ``units`` of :func:`pandas.to_timedelta`, :func:`pandas.Timedelta` and :func:`pandas.TimedeltaIndex` (:issue:`16344`)
- :meth:`pandas.concat` has deprecated the ``join_axes``-keyword. Instead, use :meth:`DataFrame.reindex` or :meth:`DataFrame.reindex_like` on the result or on the inputs (:issue:`21951`)
- The :attr:`SparseArray.values` attribute is deprecated. You can use ``np.asarray(...)`` or the :meth:`SparseArray.to_dense` method instead (:issue:`26421`).
- The functions :func:`pandas.to_datetime` and :func:`pandas.to_timedelta` have deprecated the ``box`` keyword. Instead, use :meth:`to_numpy` or :meth:`Timestamp.to_datetime64` or :meth:`Timedelta.to_timedelta64`. (:issue:`24416`)
- The :meth:`DataFrame.compound` and :meth:`Series.compound` methods are deprecated and will be removed in a future version (:issue:`26405`).
- The internal attributes ``_start``, ``_stop`` and ``_step`` of :class:`RangeIndex` have been deprecated. Use the public attributes :attr:`~RangeIndex.start`, :attr:`~RangeIndex.stop` and :attr:`~RangeIndex.step` instead (:issue:`26581`).
- The :meth:`Series.ftype`, :meth:`Series.ftypes` and :meth:`DataFrame.ftypes` methods are deprecated and will be removed in a future version. Instead, use :meth:`Series.dtype` and :meth:`DataFrame.dtypes` (:issue:`26705`).
- The :meth:`Series.get_values`, :meth:`DataFrame.get_values`, :meth:`Index.get_values`,
  :meth:`SparseArray.get_values` and :meth:`Categorical.get_values` methods are deprecated.
  One of ``np.asarray(..)`` or :meth:`~Series.to_numpy` can be used instead (:issue:`19617`).
- The 'outer' method on NumPy ufuncs, e.g. ``np.subtract.outer``, has been deprecated on :class:`Series` objects. Convert the input to an array with :attr:`Series.array` first (:issue:`27186`)
- :meth:`Timedelta.resolution` is deprecated and replaced with :meth:`Timedelta.resolution_string`. In a future version, :meth:`Timedelta.resolution` will be changed to behave like the standard library :attr:`datetime.timedelta.resolution` (:issue:`21344`)
- :func:`read_table` has been undeprecated. (:issue:`25220`)
- :attr:`Index.dtype_str` is deprecated. (:issue:`18262`)
- :attr:`Series.imag` and :attr:`Series.real` are deprecated. (:issue:`18262`)
- :meth:`Series.put` is deprecated. (:issue:`18262`)
- :meth:`Index.item` and :meth:`Series.item` are deprecated. (:issue:`18262`)
- The default value ``ordered=None`` in :class:`~pandas.api.types.CategoricalDtype` has been deprecated in favor of ``ordered=False``. When converting between categorical types ``ordered=True`` must be explicitly passed in order to be preserved. (:issue:`26336`)
- :meth:`Index.contains` is deprecated. Use ``key in index`` (``__contains__``) instead (:issue:`17753`).
- :meth:`DataFrame.get_dtype_counts` is deprecated. (:issue:`18262`)
- :meth:`Categorical.ravel` will return a :class:`Categorical` instead of a ``np.ndarray`` (:issue:`27199`)
- Removed ``Panel`` (:issue:`25047`, :issue:`25191`, :issue:`25231`)
- Removed the previously deprecated ``sheetname`` keyword in :func:`read_excel` (:issue:`16442`, :issue:`20938`)
- Removed the previously deprecated ``TimeGrouper`` (:issue:`16942`)
- Removed the previously deprecated ``parse_cols`` keyword in :func:`read_excel` (:issue:`16488`)
- Removed the previously deprecated ``pd.options.html.border`` (:issue:`16970`)
- Removed the previously deprecated ``convert_objects`` (:issue:`11221`)
- Removed the previously deprecated ``select`` method of ``DataFrame`` and ``Series`` (:issue:`17633`)
- Removed the previously deprecated behavior of :class:`Series` treated as list-like in :meth:`~Series.cat.rename_categories` (:issue:`17982`)
- Removed the previously deprecated ``DataFrame.reindex_axis`` and ``Series.reindex_axis`` (:issue:`17842`)
- Removed the previously deprecated behavior of altering column or index labels with :meth:`Series.rename_axis` or :meth:`DataFrame.rename_axis` (:issue:`17842`)
- Removed the previously deprecated ``tupleize_cols`` keyword argument in :meth:`read_html`, :meth:`read_csv`, and :meth:`DataFrame.to_csv` (:issue:`17877`, :issue:`17820`)
- Removed the previously deprecated ``DataFrame.from_csv`` and ``Series.from_csv`` (:issue:`17812`)
- Removed the previously deprecated ``raise_on_error`` keyword argument in :meth:`DataFrame.where` and :meth:`DataFrame.mask` (:issue:`17744`)
- Removed the previously deprecated ``ordered`` and ``categories`` keyword arguments in ``astype`` (:issue:`17742`)
- Removed the previously deprecated ``cdate_range`` (:issue:`17691`)
- Removed the previously deprecated ``True`` option for the ``dropna`` keyword argument in :func:`SeriesGroupBy.nth` (:issue:`17493`)
- Removed the previously deprecated ``convert`` keyword argument in :meth:`Series.take` and :meth:`DataFrame.take` (:issue:`17352`)
- Removed the previously deprecated behavior of arithmetic operations with ``datetime.date`` objects (:issue:`21152`)
- Significant speedup in :class:`SparseArray` initialization that benefits most operations, fixing performance regression introduced in v0.20.0 (:issue:`24985`)
- :meth:`DataFrame.to_stata` is now faster when outputting data with any string or non-native endian columns (:issue:`25045`)
- Improved performance of :meth:`Series.searchsorted`. The speedup is especially large when the dtype is int8/int16/int32 and the searched key is within the integer bounds for the dtype (:issue:`22034`)
- Improved performance of :meth:`.GroupBy.quantile` (:issue:`20405`)
- Improved performance of slicing and other selected operations on a :class:`RangeIndex` (:issue:`26565`, :issue:`26617`, :issue:`26722`)
- :class:`RangeIndex` now performs standard lookup without instantiating an actual hashtable, hence saving memory (:issue:`16685`)
- Improved performance of :meth:`read_csv` by faster tokenizing and faster parsing of small float numbers (:issue:`25784`)
- Improved performance of :meth:`read_csv` by faster parsing of N/A and boolean values (:issue:`25804`)
- Improved performance of :attr:`IntervalIndex.is_monotonic`, :attr:`IntervalIndex.is_monotonic_increasing` and :attr:`IntervalIndex.is_monotonic_decreasing` by removing conversion to :class:`MultiIndex` (:issue:`24813`)
- Improved performance of :meth:`DataFrame.to_csv` when writing datetime dtypes (:issue:`25708`)
- Improved performance of :meth:`read_csv` by much faster parsing of ``MM/YYYY`` and ``DD/MM/YYYY`` datetime formats (:issue:`25922`)
- Improved performance of nanops for dtypes that cannot store NaNs. Speedup is particularly prominent for :meth:`Series.all` and :meth:`Series.any` (:issue:`25070`)
- Improved performance of :meth:`Series.map` for dictionary mappers on categorical series by mapping the categories instead of mapping all values (:issue:`23785`)
- Improved performance of :meth:`IntervalIndex.intersection` (:issue:`24813`)
- Improved performance of :meth:`read_csv` by concatenating date columns faster, without an extra conversion to string for integer/float zero and float ``NaN``, and by checking faster whether a string could be a date (:issue:`25754`)
- Improved performance of :attr:`IntervalIndex.is_unique` by removing conversion to ``MultiIndex`` (:issue:`24813`)
- Restored performance of :meth:`DatetimeIndex.__iter__` by re-enabling specialized code path (:issue:`26702`)
- Improved performance when building :class:`MultiIndex` with at least one :class:`CategoricalIndex` level (:issue:`22044`)
- Improved performance by removing the need for a garbage collection when checking for ``SettingWithCopyWarning`` (:issue:`27031`)
- Changed the default value of the ``cache`` parameter of :meth:`to_datetime` to ``True`` (:issue:`26043`)
- Improved performance of :class:`DatetimeIndex` and :class:`PeriodIndex` slicing given non-unique, monotonic data (:issue:`27136`).
- Improved performance of :func:`read_json` for index-oriented data (:issue:`26773`)
- Improved performance of :meth:`MultiIndex.shape` (:issue:`27384`).
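
With the new ``cache=True`` default, :meth:`to_datetime` parses each unique date string once and reuses the result for repeats. A minimal sketch, assuming a recent pandas where ``cache`` is still a keyword of ``to_datetime``; the sample dates are made up for illustration:

.. code-block:: python

    import inspect

    import pandas as pd

    # Many repeated date strings: the case where caching unique values pays off.
    dates = ["2019-07-01", "2019-07-02"] * 1000

    # cache=True is the default, so these two calls are equivalent.
    default_result = pd.to_datetime(dates)
    cached_result = pd.to_datetime(dates, cache=True)

    # Read the default from the signature instead of relying on memory.
    cache_default = inspect.signature(pd.to_datetime).parameters["cache"].default
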
- Bug in :func:`DataFrame.at` and :func:`Series.at` that would raise an exception if the index was a :class:`CategoricalIndex` (:issue:`20629`)
- Fixed bug in comparison of ordered :class:`Categorical` that contained missing values with a scalar which sometimes incorrectly resulted in ``True`` (:issue:`26504`)
- Bug in :meth:`DataFrame.dropna` where a :class:`DataFrame` with a :class:`CategoricalIndex` containing :class:`Interval` objects incorrectly raised a ``TypeError`` (:issue:`25087`)
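
The following sketch illustrates the intended comparison semantics for an ordered :class:`Categorical` with a missing value: comparisons follow the category order, and missing entries compare as ``False``. The data are hypothetical, chosen only to show the behavior:

.. code-block:: python

    import pandas as pd

    # Ordered categorical with a missing value in the middle.
    cat = pd.Categorical(["a", None, "b"], categories=["a", "b"], ordered=True)

    # Comparisons against a scalar use the category order; the missing
    # entry is False regardless of the operator.
    less_than_b = cat < "b"
    greater_than_a = cat > "a"
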
- Bug in :func:`to_datetime` which would raise an (incorrect) ``ValueError`` when called with a date far into the future and the ``format`` argument specified instead of raising ``OutOfBoundsDatetime`` (:issue:`23830`)
- Bug in :func:`to_datetime` which would raise ``InvalidIndexError: Reindexing only valid with uniquely valued Index objects`` when called with ``cache=True``, with ``arg`` including at least two different elements from the set ``{None, numpy.nan, pandas.NaT}`` (:issue:`22305`)
- Bug in :class:`DataFrame` and :class:`Series` where timezone aware data with ``dtype='datetime64[ns]'`` was not cast to naive (:issue:`25843`)
- Improved :class:`Timestamp` type checking in various datetime functions to prevent exceptions when using a subclassed ``datetime`` (:issue:`25851`)
- Bug in :class:`Series` and :class:`DataFrame` repr where ``np.datetime64('NaT')`` and ``np.timedelta64('NaT')`` with ``dtype=object`` would be represented as ``NaN`` (:issue:`25445`)
- Bug in :func:`to_datetime` which did not replace an invalid argument with ``NaT`` when ``errors`` was set to ``'coerce'`` (:issue:`26122`)
- Bug where adding a :class:`DateOffset` with a nonzero month to a :class:`DatetimeIndex` would raise ``ValueError`` (:issue:`26258`)
- Bug in :func:`to_datetime` which raised an unhandled ``OverflowError`` when called with a mix of invalid dates and ``NaN`` values with ``format='%Y%m%d'`` and ``errors='coerce'`` (:issue:`25512`)
- Bug in :meth:`isin` for datetimelike indexes (:class:`DatetimeIndex`, :class:`TimedeltaIndex` and :class:`PeriodIndex`) where the ``level`` parameter was ignored (:issue:`26675`)
- Bug in :func:`to_datetime` which raised ``TypeError`` for ``format='%Y%m%d'`` when called for invalid integer dates with length >= 6 digits with ``errors='ignore'``
- Bug when comparing a :class:`PeriodIndex` against a zero-dimensional numpy array (:issue:`26689`)
- Bug in constructing a ``Series`` or ``DataFrame`` from a numpy ``datetime64`` array with a non-ns unit and out-of-bound timestamps generating garbage data, which will now correctly raise an ``OutOfBoundsDatetime`` error (:issue:`26206`).
- Bug in :func:`date_range` with unnecessary ``OverflowError`` being raised for very large or very small dates (:issue:`26651`)
- Bug where adding :class:`Timestamp` to a ``np.timedelta64`` object would raise instead of returning a :class:`Timestamp` (:issue:`24775`)
- Bug where comparing a zero-dimensional numpy array containing a ``np.datetime64`` object to a :class:`Timestamp` would incorrectly raise ``TypeError`` (:issue:`26916`)
- Bug in :func:`to_datetime` which would raise ``ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True`` when called with ``cache=True``, with ``arg`` including datetime strings with different offsets (:issue:`26097`)
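
As an illustration of the intended ``errors='coerce'`` behavior of :func:`to_datetime`, unparseable entries become ``NaT`` instead of raising. A minimal sketch with made-up input strings:

.. code-block:: python

    import pandas as pd

    # The second entry cannot be parsed as a date.
    raw = ["2019-01-01", "not a date"]

    # With errors="coerce", unparseable values become NaT instead of raising.
    parsed = pd.to_datetime(raw, errors="coerce")
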
- Bug in :func:`TimedeltaIndex.intersection` where for non-monotonic indices in some cases an empty ``Index`` was returned when in fact an intersection existed (:issue:`25913`)
- Bug with comparisons between :class:`Timedelta` and ``NaT`` raising ``TypeError`` (:issue:`26039`)
- Bug when adding or subtracting a :class:`BusinessHour` to a :class:`Timestamp` with the resulting time landing in a following or prior day respectively (:issue:`26381`)
- Bug when comparing a :class:`TimedeltaIndex` against a zero-dimensional numpy array (:issue:`26689`)
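
A short sketch of comparing a :class:`Timedelta` with ``NaT`` now that it no longer raises: like ``NaN``, ``NaT`` is unequal to everything, so ordering comparisons and ``==`` give ``False`` while ``!=`` gives ``True``. The one-day value is an arbitrary example:

.. code-block:: python

    import pandas as pd

    one_day = pd.Timedelta("1 day")

    # Comparisons with NaT return booleans instead of raising TypeError.
    results = {
        "lt": one_day < pd.NaT,
        "eq": one_day == pd.NaT,
        "ne": one_day != pd.NaT,
    }
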
- Bug in :func:`DatetimeIndex.to_frame` where timezone aware data would be converted to timezone naive data (:issue:`25809`)
- Bug in :func:`to_datetime` with ``utc=True`` and datetime strings that would apply previously parsed UTC offsets to subsequent arguments (:issue:`24992`)
- Bug in :func:`Timestamp.tz_localize` and :func:`Timestamp.tz_convert` not propagating ``freq`` (:issue:`25241`)
- Bug in :func:`Series.at` where setting a timezone-aware :class:`Timestamp` raised ``TypeError`` (:issue:`25506`)
- Bug in :func:`DataFrame.update` when updating with timezone aware data would return timezone naive data (:issue:`25807`)
- Bug in :func:`to_datetime` where an uninformative ``RuntimeError`` was raised when passing a naive :class:`Timestamp` with datetime strings with mixed UTC offsets (:issue:`25978`)
- Bug in :func:`to_datetime` where ``unit='ns'`` would drop timezone information from the parsed argument (:issue:`26168`)
- Bug in :func:`DataFrame.join` where joining a timezone aware index with a timezone aware column would result in a column of ``NaN`` (:issue:`26335`)
- Bug in :func:`date_range` where ambiguous or nonexistent start or end times were not handled by the ``ambiguous`` or ``nonexistent`` keywords respectively (:issue:`27088`)
- Bug in :meth:`DatetimeIndex.union` when combining a timezone aware and timezone unaware :class:`DatetimeIndex` (:issue:`21671`)
- Bug when applying a numpy reduction function (e.g. :meth:`numpy.minimum`) to a timezone aware :class:`Series` (:issue:`15552`)
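
To illustrate the timezone fixes above, for example that :func:`DatetimeIndex.to_frame` keeps timezone aware data, here is a minimal sketch; the index values are made up:

.. code-block:: python

    import pandas as pd

    # A timezone-aware index.
    idx = pd.date_range("2019-01-01", periods=3, freq="D", tz="UTC")

    # to_frame keeps the timezone-aware dtype instead of converting to naive values.
    frame = idx.to_frame()
    col = frame.iloc[:, 0]
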
- Bug in :meth:`to_numeric` in which large negative numbers were being improperly handled (:issue:`24910`)
- Bug in :meth:`to_numeric` in which numbers were being coerced to float, even though ``errors`` was not ``coerce`` (:issue:`24910`)
- Bug in :meth:`to_numeric` in which invalid values for ``errors`` were being allowed (:issue:`26466`)
- Bug in :class:`format` in which floating point complex numbers were not being formatted to proper display precision and trimming (:issue:`25514`)
- Bug in error messages in :meth:`DataFrame.corr` and :meth:`Series.corr`. Added the possibility of using a callable. (:issue:`25729`)
- Bug in :meth:`Series.divmod` and :meth:`Series.rdivmod` which would raise an (incorrect) ``ValueError`` rather than return a pair of :class:`Series` objects as result (:issue:`25557`)
- Raises a helpful exception when a non-numeric index is sent to :meth:`interpolate` with methods which require numeric index. (:issue:`21662`)
- Bug in :meth:`~pandas.eval` when comparing floats with scalar operators, for example: ``x < -0.1`` (:issue:`25928`)
- Fixed bug where casting all-boolean array to integer extension array failed (:issue:`25211`)
- Bug in ``divmod`` with a :class:`Series` object containing zeros incorrectly raising ``AttributeError`` (:issue:`26987`)
- Inconsistency in :class:`Series` floor-division (``//``) and ``divmod`` filling positive//zero with ``NaN`` instead of ``Inf`` (:issue:`27321`)
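
The floor-division and ``divmod`` entries can be illustrated with a small sketch, assuming integer input and a recent pandas. Division by zero follows the same convention as true division (positive // 0 gives ``inf``, negative // 0 gives ``-inf``, 0 // 0 gives ``NaN``), and ``divmod`` returns a pair of :class:`Series`:

.. code-block:: python

    import numpy as np
    import pandas as pd

    s = pd.Series([1, -1, 0])

    # Floor division by zero: +inf, -inf and NaN, returned as float64.
    floored = s // 0

    # divmod returns a (quotient, remainder) pair of Series.
    quotient, remainder = divmod(pd.Series([7, 8, 9]), 4)
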
- Bug in :func:`DataFrame.astype` where the ``errors`` parameter was ignored when passing a dict of columns and types (:issue:`25905`)
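
A sketch of the fixed :func:`DataFrame.astype` behavior with a dict of column types and ``errors='ignore'``: a column that cannot be cast is returned unchanged while the others are converted. The column names and values are hypothetical:

.. code-block:: python

    import pandas as pd

    df = pd.DataFrame({"a": ["1", "x"], "b": ["3", "4"]})

    # Column "a" cannot be cast to int64. With errors="ignore" it is returned
    # unchanged instead of raising, while column "b" is converted.
    converted = df.astype({"a": "int64", "b": "int64"}, errors="ignore")
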
- Bug in the ``__name__`` attribute of several methods of :class:`Series.str`, which were set incorrectly (:issue:`23551`)
- Improved error message when passing :class:`Series` of wrong dtype to :meth:`Series.str.cat` (:issue:`22722`)
- Construction of :class:`Interval` is restricted to numeric, :class:`Timestamp` and :class:`Timedelta` endpoints (:issue:`23013`)
- Fixed bug in :class:`Series`/:class:`DataFrame` not displaying ``NaN`` in :class:`IntervalIndex` with missing values (:issue:`25984`)
- Bug in :meth:`IntervalIndex.get_loc` where a ``KeyError`` would be incorrectly raised for a decreasing :class:`IntervalIndex` (:issue:`25860`)
- Bug in :class:`Index` constructor where passing mixed closed :class:`Interval` objects would result in a ``ValueError`` instead of an ``object`` dtype ``Index`` (:issue:`27172`)
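
The restriction on :class:`Interval` endpoints can be seen in a minimal sketch (endpoint values are arbitrary): numeric, :class:`Timestamp` and :class:`Timedelta` endpoints are accepted, while other types such as strings raise ``ValueError``:

.. code-block:: python

    import pandas as pd

    # Numeric and Timestamp endpoints are accepted.
    numeric = pd.Interval(0, 5)
    timed = pd.Interval(pd.Timestamp("2019-01-01"), pd.Timestamp("2019-01-31"))

    # Other endpoint types, such as strings, are rejected.
    try:
        pd.Interval("a", "b")
        rejected = False
    except ValueError:
        rejected = True
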
- Improved exception message when calling :meth:`DataFrame.iloc` with a list of non-numeric objects (:issue:`25753`).
- Improved exception message when calling ``.iloc`` or ``.loc`` with a boolean indexer of a different length (:issue:`26658`).
- Bug in ``KeyError`` exception message when indexing a :class:`MultiIndex` with a non-existent key not displaying the original key (:issue:`27250`).
- Bug in ``.iloc`` and ``.loc`` with a boolean indexer not raising an ``IndexError`` when too few items are passed (:issue:`26658`).
- Bug in :meth:`DataFrame.loc` and :meth:`Series.loc` where ``KeyError`` was not raised for a ``MultiIndex`` when the key was less than or equal to the number of levels in the :class:`MultiIndex` (:issue:`14885`).
- Bug in which :meth:`DataFrame.append` produced an erroneous warning indicating that a ``KeyError`` will be thrown in the future when the data to be appended contains new columns (:issue:`22252`).
- Bug in which :meth:`DataFrame.to_csv` caused a segfault for a reindexed data frame, when the indices were single-level :class:`MultiIndex` (:issue:`26303`).
- Fixed bug where assigning a :class:`arrays.PandasArray` to a :class:`.DataFrame` would raise error (:issue:`26390`)
- Allow keyword arguments for callable local reference used in the :meth:`DataFrame.query` string (:issue:`26426`)
- Fixed a ``KeyError`` when indexing a :class:`MultiIndex` level with a list containing exactly one label, which is missing (:issue:`27148`)
- Bug which produced ``AttributeError`` on partial matching :class:`Timestamp` in a :class:`MultiIndex` (:issue:`26944`)
- Bug in :class:`Categorical` and :class:`CategoricalIndex` with :class:`Interval` values when using the ``in`` operator (``__contains__``) with objects that are not comparable to the values in the ``Interval`` (:issue:`23705`)
- Bug in :meth:`DataFrame.loc` and :meth:`DataFrame.iloc` on a :class:`DataFrame` with a single timezone-aware datetime64[ns] column incorrectly returning a scalar instead of a :class:`Series` (:issue:`27110`)
- Bug in :class:`CategoricalIndex` and :class:`Categorical` incorrectly raising ``ValueError`` instead of ``TypeError`` when a list is passed using the ``in`` operator (``__contains__``) (:issue:`21729`)
- Bug in setting a new value in a :class:`Series` with a :class:`Timedelta` object incorrectly casting the value to an integer (:issue:`22717`)
- Bug in :class:`Series` setting a new key (``__setitem__``) with a timezone-aware datetime incorrectly raising ``ValueError`` (:issue:`12862`)
- Bug in :meth:`DataFrame.iloc` when indexing with a read-only indexer (:issue:`17192`)
- Bug in :class:`Series` setting an existing tuple key (``__setitem__``) with timezone-aware datetime values incorrectly raising ``TypeError`` (:issue:`20441`)
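
As an illustration of the :issue:`27110` fix, selecting one row of a :class:`DataFrame` whose only column is timezone-aware should give a :class:`Series` rather than a scalar. A minimal sketch with made-up data:

.. code-block:: python

    import pandas as pd

    # A frame whose only column holds timezone-aware datetimes.
    df = pd.DataFrame({"when": pd.date_range("2019-01-01", periods=2, tz="UTC")})

    # A single row is a Series (one entry per column), not a bare scalar.
    row_by_label = df.loc[0]
    row_by_position = df.iloc[0]
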
- Fixed misleading exception message in :meth:`Series.interpolate` if the ``order`` argument is required but omitted (:issue:`10633`, :issue:`24014`).
- Fixed class type displayed in exception message in :meth:`DataFrame.dropna` if an invalid ``axis`` parameter is passed (:issue:`25555`)
- A ``ValueError`` will now be thrown by :meth:`DataFrame.fillna` when ``limit`` is not a positive integer (:issue:`27042`)
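
A minimal sketch of the ``limit`` validation in :meth:`DataFrame.fillna`, assuming a pandas version where ``limit`` is still accepted together with a fill value; the frame is hypothetical. A positive integer caps how many ``NaN`` values are filled, and a non-positive value is rejected:

.. code-block:: python

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"a": [np.nan, np.nan, 1.0]})

    # limit=1 fills at most one NaN per column.
    filled = df.fillna(0, limit=1)

    # A non-positive limit raises ValueError.
    try:
        df.fillna(0, limit=0)
        rejected = False
    except ValueError:
        rejected = True
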
- Bug in which an incorrect exception was raised by :class:`Timedelta` when testing the membership of :class:`MultiIndex` (:issue:`24570`)
- Bug in :func:`DataFrame.to_html` where values were truncated using display options instead of outputting the full content (:issue:`17004`)
- Fixed bug in missing text when using :meth:`to_clipboard` if copying utf-16 characters in Python 3 on Windows (:issue:`25040`)
- Bug in :func:`read_json` for ``orient='table'`` when it tries to infer dtypes by default, which is not applicable as dtypes are already defined in the JSON schema (:issue:`21345`)
- Bug in :func:`read_json` for ``orient='table'`` and float index, as it infers index dtype by default, which is not applicable because index dtype is already defined in the JSON schema (:issue:`25433`)
- Bug in :func:`read_json` for ``orient='table'`` and string of float column names, as it makes a column name type conversion to :class:`Timestamp`, which is not applicable because column names are already defined in the JSON schema (:issue:`25435`)
- Bug in :func:`json_normalize` for ``errors='ignore'`` where missing values in the input data were filled in the resulting ``DataFrame`` with the string ``"nan"`` instead of ``numpy.nan`` (:issue:`25468`)
- :meth:`DataFrame.to_html` now raises ``TypeError`` when using an invalid type for the ``classes`` parameter instead of ``AssertionError`` (:issue:`25608`)
- Bug in :meth:`DataFrame.to_string` and :meth:`DataFrame.to_latex` that would lead to incorrect output when the ``header`` keyword is used (:issue:`16718`)
- Bug in :func:`read_csv` not properly interpreting the UTF8 encoded filenames on Windows on Python 3.6+ (:issue:`15086`)
- Improved performance in :meth:`pandas.read_stata` and :class:`pandas.io.stata.StataReader` when converting columns that have missing values (:issue:`25772`)
- Bug in :meth:`DataFrame.to_html` where header numbers would ignore display options when rounding (:issue:`17280`)
- Bug in :func:`read_hdf` where reading a table from an HDF5 file written directly with PyTables fails with a ``ValueError`` when using a sub-selection via the ``start`` or ``stop`` arguments (:issue:`11188`)
- Bug in :func:`read_hdf` not properly closing store after a ``KeyError`` is raised (:issue:`25766`)
- Improved the explanation for the failure when value labels are repeated in Stata dta files and suggested workarounds (:issue:`25772`)
- Improved :meth:`pandas.read_stata` and :class:`pandas.io.stata.StataReader` to read incorrectly formatted 118 format files saved by Stata (:issue:`25960`)
- Improved the ``col_space`` parameter in :meth:`DataFrame.to_html` to accept a string so CSS length values can be set correctly (:issue:`25941`)
- Fixed bug in loading objects from S3 that contain ``#`` characters in the URL (:issue:`25945`)
- Added ``use_bqstorage_api`` parameter to :func:`read_gbq` to speed up downloads of large data frames. This feature requires version 0.10.0 of the ``pandas-gbq`` library as well as the ``google-cloud-bigquery-storage`` and ``fastavro`` libraries. (:issue:`26104`)
- Fixed memory leak in :meth:`DataFrame.to_json` when dealing with numeric data (:issue:`24889`)
- Bug in :func:`read_json` where date strings with ``Z`` were not converted to a UTC timezone (:issue:`26168`)
- Added ``cache_dates=True`` parameter to :meth:`read_csv`, which caches unique dates when they are parsed (:issue:`25990`)
- :meth:`DataFrame.to_excel` now raises a ``ValueError`` when the caller's dimensions exceed the limitations of Excel (:issue:`26051`)
- Fixed bug in :func:`pandas.read_csv` where a BOM would result in incorrect parsing using ``engine='python'`` (:issue:`26545`)
- :func:`read_excel` now raises a ``ValueError`` when input is of type :class:`pandas.io.excel.ExcelFile` and the ``engine`` param is passed, since :class:`pandas.io.excel.ExcelFile` has an engine defined (:issue:`26566`)
- Bug while selecting from :class:`HDFStore` with ``where=''`` specified (:issue:`26610`).
- Fixed bug in :func:`DataFrame.to_excel` where custom objects (e.g. ``PeriodIndex``) inside merged cells were not being converted into types safe for the Excel writer (:issue:`27006`)
- Bug in :meth:`read_hdf` where reading a timezone aware :class:`DatetimeIndex` would raise a ``TypeError`` (:issue:`11926`)
- Bug in :meth:`to_msgpack` and :meth:`read_msgpack` which would raise a ``ValueError`` rather than a ``FileNotFoundError`` for an invalid path (:issue:`27160`)
- Fixed bug in :meth:`DataFrame.to_parquet` which would raise a ``ValueError`` when the dataframe had no columns (:issue:`27339`)
- Allow parsing of :class:`PeriodDtype` columns when using :func:`read_csv` (:issue:`26934`)
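
The new ``cache_dates`` parameter of :meth:`read_csv` can be passed explicitly as in the sketch below; it defaults to ``True``. The in-memory CSV text is made up for illustration:

.. code-block:: python

    import io

    import pandas as pd

    # A small CSV with a repeated date value.
    csv_text = "date,value\n2019-07-01,1\n2019-07-01,2\n2019-07-02,3\n"

    # cache_dates=True (the default) parses each unique date string once.
    df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], cache_dates=True)
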
- Fixed bug where :class:`api.extensions.ExtensionArray` could not be used in matplotlib plotting (:issue:`25587`)
- Bug in an error message in :meth:`DataFrame.plot`. Improved the error message if non-numerics are passed to :meth:`DataFrame.plot` (:issue:`25481`)
- Bug in incorrect ticklabel positions when plotting an index that is non-numeric / non-datetime (:issue:`7612`, :issue:`15912`, :issue:`22334`)
- Fixed bug causing plots of :class:`PeriodIndex` timeseries to fail if the frequency is a multiple of the frequency rule code (:issue:`14763`)
- Fixed bug when plotting a :class:`DatetimeIndex` with ``datetime.timezone.utc`` timezone (:issue:`17173`)
- Bug in :meth:`.Resampler.agg` with a timezone aware index where ``OverflowError`` would be raised when passing a list of functions (:issue:`22660`)
- Bug in :meth:`.DataFrameGroupBy.nunique` in which the names of column levels were lost (:issue:`23222`)
- Bug in :func:`.GroupBy.agg` when applying an aggregation function to timezone aware data (:issue:`23683`)
- Bug in :func:`.GroupBy.first` and :func:`.GroupBy.last` where timezone information would be dropped (:issue:`21603`)
- Bug in :func:`.GroupBy.size` when grouping only NA values (:issue:`23050`)
- Bug in :func:`Series.groupby` where the ``observed`` kwarg was previously ignored (:issue:`24880`)
- Bug in :func:`Series.groupby` where using ``groupby`` with a :class:`MultiIndex` Series with a list of labels equal to the length of the series caused incorrect grouping (:issue:`25704`)
- Ensured that ordering of outputs in ``groupby`` aggregation functions is consistent across all versions of Python (:issue:`25692`)
- Ensured that result group order is correct when grouping on an ordered ``Categorical`` and specifying ``observed=True`` (:issue:`25871`, :issue:`25167`)
- Bug in :meth:`.Rolling.min` and :meth:`.Rolling.max` that caused a memory leak (:issue:`25893`)
- Bug in :meth:`.Rolling.count` and ``.Expanding.count`` where the ``axis`` keyword was ignored (:issue:`13503`)
- Bug in :meth:`.GroupBy.idxmax` and :meth:`.GroupBy.idxmin` with datetime column would return incorrect dtype (:issue:`25444`, :issue:`15306`)
- Bug in :meth:`.GroupBy.cumsum`, :meth:`.GroupBy.cumprod`, :meth:`.GroupBy.cummin` and :meth:`.GroupBy.cummax` with categorical column having absent categories, would return incorrect result or segfault (:issue:`16771`)
- Bug in :meth:`.GroupBy.nth` where NA values in the grouping would return incorrect results (:issue:`26011`)
- Bug in :meth:`.SeriesGroupBy.transform` where transforming an empty group would raise a ``ValueError`` (:issue:`26208`)
- Bug in :meth:`.DataFrame.groupby` where passing a :class:`.Grouper` would return incorrect groups when using the ``.groups`` accessor (:issue:`26326`)
- Bug in :meth:`.GroupBy.agg` where incorrect results are returned for uint64 columns. (:issue:`26310`)
- Bug in :meth:`.Rolling.median` and :meth:`.Rolling.quantile` where ``MemoryError`` is raised with an empty window (:issue:`26005`)
- Bug in :meth:`.Rolling.median` and :meth:`.Rolling.quantile` where incorrect results are returned with ``closed='left'`` and ``closed='neither'`` (:issue:`26005`)
- Improved :class:`.Rolling`, :class:`.Window` and :class:`.ExponentialMovingWindow` functions to exclude nuisance columns from results instead of raising errors and raise a ``DataError`` only if all columns are nuisance (:issue:`12537`)
- Bug in :meth:`.Rolling.max` and :meth:`.Rolling.min` where incorrect results are returned with an empty variable window (:issue:`26005`)
- Raise a helpful exception when an unsupported weighted window function is used as an argument of :meth:`.Window.aggregate` (:issue:`26597`)
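
A sketch of the ``observed`` keyword of :func:`Series.groupby`, which is now respected. The keys use a hypothetical category ``"c"`` that never occurs in the data:

.. code-block:: python

    import pandas as pd

    values = pd.Series([1, 2, 3])
    # Category "c" is declared but never occurs in the data.
    keys = pd.Categorical(["a", "a", "b"], categories=["a", "b", "c"])

    # observed=True keeps only the categories that actually appear.
    observed_only = values.groupby(keys, observed=True).sum()
    # observed=False also emits the unobserved category "c" (summing to 0).
    all_categories = values.groupby(keys, observed=False).sum()
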
- Bug in :func:`pandas.merge` where assigning ``None`` in ``suffixes`` appended the string ``None`` instead of leaving the column name as-is (:issue:`24782`).
- Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (missing index values are now assigned NA) (:issue:`24212`, :issue:`25009`)
- :func:`to_records` now accepts dtypes to its ``column_dtypes`` parameter (:issue:`24895`)
- Bug in :func:`concat` where the order of an ``OrderedDict`` (and ``dict`` in Python 3.6+) was not respected when passed in as the ``objs`` argument (:issue:`21510`)
- Bug in :func:`pivot_table` where columns with ``NaN`` values are dropped even if the ``dropna`` argument is ``False``, when the ``aggfunc`` argument contains a ``list`` (:issue:`22159`)
- Bug in :func:`concat` where the resulting ``freq`` of two :class:`DatetimeIndex` with the same ``freq`` would be dropped (:issue:`3232`).
- Bug in :func:`merge` where merging with equivalent Categorical dtypes was raising an error (:issue:`22501`)
- Bug in :class:`DataFrame` where instantiating with a dict of iterators or generators (e.g. ``pd.DataFrame({'A': reversed(range(3))})``) raised an error (:issue:`26349`).
- Bug in :class:`DataFrame` where instantiating with a ``range`` (e.g. ``pd.DataFrame(range(3))``) raised an error (:issue:`26342`).
- Bug in :class:`DataFrame` constructor when passing non-empty tuples would cause a segmentation fault (:issue:`25691`)
- Bug in which :func:`Series.apply` failed when the series is a timezone aware :class:`DatetimeIndex` (:issue:`25959`)
- Bug in :func:`pandas.cut` where large bins could incorrectly raise an error due to an integer overflow (:issue:`26045`)
- Bug in :func:`DataFrame.sort_index` where an error is thrown when a multi-indexed ``DataFrame`` is sorted on all levels with the initial level sorted last (:issue:`26053`)
- Bug in :meth:`Series.nlargest` which treated ``True`` as smaller than ``False`` (:issue:`26154`)
- Bug in :func:`DataFrame.pivot_table` where using an :class:`IntervalIndex` as pivot index would raise ``TypeError`` (:issue:`25814`)
- Bug in which :meth:`DataFrame.from_dict` ignored the order of an ``OrderedDict`` when ``orient='index'`` (:issue:`8425`).
- Bug in :meth:`DataFrame.transpose` where transposing a DataFrame with a timezone-aware datetime column would incorrectly raise ``ValueError`` (:issue:`26825`)
- Bug in :func:`pivot_table` where pivoting a timezone aware column as the ``values`` would remove timezone information (:issue:`14948`)
- Bug in :func:`merge_asof` when specifying multiple ``by`` columns where one is ``datetime64[ns, tz]`` dtype (:issue:`26649`)
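
After the :issue:`24782` fix, ``None`` in ``suffixes`` leaves that side's overlapping column name unchanged. A minimal sketch with made-up frames and a hypothetical ``"_right"`` suffix:

.. code-block:: python

    import pandas as pd

    left = pd.DataFrame({"key": [1, 2], "val": ["a", "b"]})
    right = pd.DataFrame({"key": [1, 2], "val": ["x", "y"]})

    # None keeps the left "val" name as-is; only the right column is suffixed.
    merged = pd.merge(left, right, on="key", suffixes=(None, "_right"))
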
- Significant speedup in :class:`SparseArray` initialization that benefits most operations, fixing performance regression introduced in v0.20.0 (:issue:`24985`)
- Bug in :class:`SparseFrame` constructor where passing ``None`` as the data would cause ``default_fill_value`` to be ignored (:issue:`16807`)
- Bug in :class:`SparseDataFrame` where adding a column whose values do not match the length of the index raised ``AssertionError`` instead of ``ValueError`` (:issue:`25484`)
- Introduced a better error message in :meth:`Series.sparse.from_coo` so it raises a ``TypeError`` for inputs that are not coo matrices (:issue:`26554`)
- Bug in :func:`numpy.modf` on a :class:`SparseArray`. Now a tuple of :class:`SparseArray` is returned (:issue:`26946`).
- Fix install error with PyPy on macOS (:issue:`26536`)
- Bug in :func:`factorize` when passing an ``ExtensionArray`` with a custom ``na_sentinel`` (:issue:`25696`).
- Bug where :meth:`Series.count` miscounted NA values in ExtensionArrays (:issue:`26835`)
- Added ``Series.__array_ufunc__`` to better handle NumPy ufuncs applied to Series backed by extension arrays (:issue:`23293`).
- Keyword argument ``deep`` has been removed from :meth:`ExtensionArray.copy` (:issue:`27083`)
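
A minimal sketch of what ``Series.__array_ufunc__`` enables, using the nullable ``Int64`` extension dtype as an example: applying a NumPy ufunc returns a :class:`Series` that keeps the extension dtype and its missing value. The values are made up:

.. code-block:: python

    import numpy as np
    import pandas as pd

    # A nullable-integer Series backed by an extension array.
    s = pd.Series([1, None, 3], dtype="Int64")

    # The ufunc is dispatched to the extension array, so the result keeps
    # the Int64 dtype and the missing value.
    result = np.add(s, 1)
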
- Removed unused C functions from vendored UltraJSON implementation (:issue:`26198`)
- Allow :class:`Index` and :class:`RangeIndex` to be passed to numpy ``min`` and ``max`` functions (:issue:`26125`)
- Use actual class name in repr of empty objects of a ``Series`` subclass (:issue:`27001`).
- Bug in :class:`DataFrame` where passing an object array of timezone-aware ``datetime`` objects would incorrectly raise ``ValueError`` (:issue:`13287`)
.. contributors:: v0.24.2..v0.25.0