{{ header }}
.. note::

    We are proud to announce that pandas has become a sponsored project of the NumFOCUS organization. This will help ensure the success of development of pandas as a world-class open-source project.
This is a minor bug-fix release from 0.17.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.
Highlights include:
- Support for Conditional HTML Formatting, see :ref:`here <whatsnew_0171.style>`
- Releasing the GIL on the csv reader & other ops, see :ref:`here <whatsnew_0171.performance>`
- Fixed regression in ``DataFrame.drop_duplicates`` from 0.16.2, causing incorrect results on integer values (:issue:`11376`)
What's new in v0.17.1
.. warning::

    This is a new feature and is under active development. We'll be adding features and possibly making breaking changes in future releases. Feedback is welcome in :issue:`11610`
We've added experimental support for conditional HTML formatting: the visual styling of a DataFrame based on the data. The styling is accomplished with HTML and CSS. Access the styler class with the :attr:`pandas.DataFrame.style` attribute, an instance of :class:`.Styler` with your data attached.
Here's a quick example:
.. ipython:: python

    np.random.seed(123)
    df = pd.DataFrame(np.random.randn(10, 5), columns=list("abcde"))
    html = df.style.background_gradient(cmap="viridis", low=0.5)
We can render the HTML to get the following table.
:class:`.Styler` interacts nicely with the Jupyter Notebook. See the :ref:`documentation </user_guide/style.ipynb>` for more.
- ``DatetimeIndex`` now supports conversion to strings with ``astype(str)`` (:issue:`10442`)
- Support for ``compression`` (gzip/bz2) in :meth:`pandas.DataFrame.to_csv` (:issue:`7615`)
- ``pd.read_*`` functions can now also accept :class:`python:pathlib.Path`, or :class:`py:py._path.local.LocalPath` objects for the ``filepath_or_buffer`` argument. (:issue:`11033`)

  - The ``DataFrame`` and ``Series`` functions ``.to_csv()``, ``.to_html()`` and ``.to_latex()`` can now handle paths beginning with tildes (e.g. ``~/Documents/``) (:issue:`11438`)

- ``DataFrame`` now uses the fields of a ``namedtuple`` as columns, if columns are not supplied (:issue:`11181`)
- ``DataFrame.itertuples()`` now returns ``namedtuple`` objects, when possible. (:issue:`11269`, :issue:`11625`)
- Added ``axvlines_kwds`` to parallel coordinates plot (:issue:`10709`)
- Option to ``.info()`` and ``.memory_usage()`` to provide for deep introspection of memory consumption. Note that this can be expensive to compute and therefore is an optional parameter. (:issue:`11595`)

  .. ipython:: python

      df = pd.DataFrame({"A": ["foo"] * 1000})  # noqa: F821
      df["B"] = df["A"].astype("category")

      # shows the '+' as we have object dtypes
      df.info()

      # we have an accurate memory assessment (but can be expensive to compute this)
      df.info(memory_usage="deep")
- ``Index`` now has a ``fillna`` method (:issue:`10089`)

  .. ipython:: python

      pd.Index([1, np.nan, 3]).fillna(2)
- Series of type ``category`` now make ``.str.<...>`` and ``.dt.<...>`` accessor methods / properties available, if the categories are of that type. (:issue:`10661`)

  .. ipython:: python

      s = pd.Series(list("aabb")).astype("category")
      s
      s.str.contains("a")

      date = pd.Series(pd.date_range("1/1/2015", periods=5)).astype("category")
      date
      date.dt.day
- ``pivot_table`` now has a ``margins_name`` argument so you can use something other than the default of 'All' (:issue:`3335`)
- Implement export of ``datetime64[ns, tz]`` dtypes with a fixed HDF5 store (:issue:`11411`)
- Pretty printing sets (e.g. in DataFrame cells) now uses set literal syntax (``{x, y}``) instead of Legacy Python syntax (``set([x, y])``) (:issue:`11215`)
- Improve the error message in :func:`pandas.io.gbq.to_gbq` when a streaming insert fails (:issue:`11285`) and when the DataFrame does not match the schema of the destination table (:issue:`11359`)
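Several of the enhancements listed above can be exercised together in a short sketch. It assumes a recent pandas, where ``to_csv`` also accepts ``pathlib.Path`` objects (the entry above only promises this for ``pd.read_*``). The names ``Point``, ``sales`` and the file name are made up for illustration.

```python
import collections
import pathlib
import tempfile

import pandas as pd

# Rows given as namedtuples: the field names become the column names.
Point = collections.namedtuple("Point", ["x", "y"])
df = pd.DataFrame([Point(1, 2), Point(3, 4)])
columns = list(df.columns)

# itertuples() yields namedtuples, so fields are reachable by name.
first_row = next(df.itertuples())
first_x = first_row.x

# DatetimeIndex converts to strings with astype(str).
dates_as_str = pd.date_range("2015-01-01", periods=2).astype(str)

# pivot_table with a custom label for the margins row and column.
sales = pd.DataFrame(
    {"region": ["n", "n", "s"], "item": ["a", "b", "a"], "qty": [1, 2, 3]}
)
table = sales.pivot_table(
    values="qty",
    index="region",
    columns="item",
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)

# gzip-compressed round trip, addressing the file with a pathlib.Path.
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "points.csv.gz"
    df.to_csv(path, index=False, compression="gzip")
    round_tripped = pd.read_csv(path, compression="gzip")
```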
- raise ``NotImplementedError`` in ``Index.shift`` for non-supported index types (:issue:`8038`)
- ``min`` and ``max`` reductions on ``datetime64`` and ``timedelta64`` dtyped series now result in ``NaT`` and not ``nan`` (:issue:`11245`).
- Indexing with a null key will raise a ``TypeError``, instead of a ``ValueError`` (:issue:`11356`)
- ``Series.ptp`` will now ignore missing values by default (:issue:`11163`)
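Two of the behavior changes above are easy to check interactively. The sketch below assumes a recent pandas. It leaves out ``Series.ptp`` and null-key indexing, since those may behave differently or be unavailable in later versions.

```python
import pandas as pd

# An all-missing datetime64 Series: min/max return NaT, not a float nan.
dates = pd.Series([pd.NaT, pd.NaT], dtype="datetime64[ns]")
earliest = dates.min()
latest = dates.max()

# The same holds for timedelta64 data.
deltas = pd.Series([pd.NaT], dtype="timedelta64[ns]")
shortest = deltas.min()

# Index.shift is only implemented for datetime-like indexes; a plain
# integer index is expected to raise NotImplementedError.
try:
    pd.Index([1, 2, 3]).shift(1)
    shift_raised = False
except NotImplementedError:
    shift_raised = True
```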
- The ``pandas.io.ga`` module which implements ``google-analytics`` support is deprecated and will be removed in a future version (:issue:`11308`)
- Deprecate the ``engine`` keyword in ``.to_csv()``, which will be removed in a future version (:issue:`11274`)
- Checking monotonic-ness before sorting on an index (:issue:`11080`)
- ``Series.dropna`` performance improvement when its dtype can't contain ``NaN`` (:issue:`11159`)
- Release the GIL on most datetime field operations (e.g. ``DatetimeIndex.year``, ``Series.dt.year``), normalization, and conversion to and from ``Period``, ``DatetimeIndex.to_period`` and ``PeriodIndex.to_timestamp`` (:issue:`11263`)
- Release the GIL on some rolling algos: ``rolling_median``, ``rolling_mean``, ``rolling_max``, ``rolling_min``, ``rolling_var``, ``rolling_kurt``, ``rolling_skew`` (:issue:`11450`)
- Release the GIL when reading and parsing text files in ``read_csv``, ``read_table`` (:issue:`11272`)
- Improved performance of ``rolling_median`` (:issue:`11450`)
- Improved performance of ``to_excel`` (:issue:`11352`)
- Performance bug in repr of ``Categorical`` categories, which was rendering the strings before chopping them for display (:issue:`11305`)
- Performance improvement in ``Categorical.remove_unused_categories`` (:issue:`11643`)
- Improved performance of ``Series`` constructor with no data and ``DatetimeIndex`` (:issue:`11433`)
- Improved performance of ``shift``, ``cumprod``, and ``cumsum`` with groupby (:issue:`4095`)
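Because ``read_csv`` releases the GIL while parsing, several files can in principle be parsed from separate threads at once. The following is a minimal sketch of that pattern using in-memory CSV text. The helper names ``make_csv`` and ``parse`` are hypothetical, and any actual speedup depends on the machine and the data, so none is claimed here.

```python
import io
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def make_csv(n_rows, seed):
    """Build CSV text for a small random frame (stand-in for a real file)."""
    rng = np.random.RandomState(seed)
    frame = pd.DataFrame(rng.randn(n_rows, 3), columns=list("abc"))
    return frame.to_csv(index=False)


def parse(text):
    """Parse one CSV document; read_csv can run while other threads work."""
    return pd.read_csv(io.StringIO(text))


csv_texts = [make_csv(1000, seed) for seed in range(4)]

# Parse the four documents from a small thread pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    frames = list(pool.map(parse, csv_texts))

combined = pd.concat(frames, ignore_index=True)
```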
- ``SparseArray.__iter__()`` now does not cause ``PendingDeprecationWarning`` in Python 3.5 (:issue:`11622`)
- Regression from 0.16.2 for output formatting of long floats/nan, restored in (:issue:`11302`)
- ``Series.sort_index()`` now correctly handles the ``inplace`` option (:issue:`11402`)
- Incorrectly distributed .c file in the build on ``PyPi`` when reading a csv of floats and passing ``na_values=<a scalar>`` would show an exception (:issue:`11374`)
- Bug in ``.to_latex()`` output broken when the index has a name (:issue:`10660`)
- Bug in ``HDFStore.append`` with strings whose encoded length exceeded the max unencoded length (:issue:`11234`)
- Bug in merging ``datetime64[ns, tz]`` dtypes (:issue:`11405`)
- Bug in ``HDFStore.select`` when comparing with a numpy scalar in a where clause (:issue:`11283`)
- Bug in using ``DataFrame.ix`` with a MultiIndex indexer (:issue:`11372`)
- Bug in ``date_range`` with ambiguous endpoints (:issue:`11626`)
- Prevent adding new attributes to the accessors ``.str``, ``.dt`` and ``.cat``. Retrieving such a value was not possible, so error out on setting it. (:issue:`10673`)
- Bug in tz-conversions with an ambiguous time and ``.dt`` accessors (:issue:`11295`)
- Bug in output formatting when using an index of ambiguous times (:issue:`11619`)
- Bug in comparisons of Series vs list-likes (:issue:`11339`)
- Bug in ``DataFrame.replace`` with a ``datetime64[ns, tz]`` and a non-compat to_replace (:issue:`11326`, :issue:`11153`)
- Bug in ``isnull`` where ``numpy.datetime64('NaT')`` in a ``numpy.array`` was not determined to be null (:issue:`11206`)
- Bug in list-like indexing with a mixed-integer Index (:issue:`11320`)
- Bug in ``pivot_table`` with ``margins=True`` when indexes are of ``Categorical`` dtype (:issue:`10993`)
- Bug in ``DataFrame.plot`` where hex string colors could not be used (:issue:`10299`)
- Regression in ``DataFrame.drop_duplicates`` from 0.16.2, causing incorrect results on integer values (:issue:`11376`)
- Bug in ``pd.eval`` where unary ops in a list error (:issue:`11235`)
- Bug in ``squeeze()`` with zero length arrays (:issue:`11230`, :issue:`8999`)
- Bug in ``describe()`` dropping column names for hierarchical indexes (:issue:`11517`)
- Bug in ``DataFrame.pct_change()`` not propagating ``axis`` keyword on ``.fillna`` method (:issue:`11150`)
- Bug in ``.to_csv()`` when a mix of integer and string column names are passed as the ``columns`` parameter (:issue:`11637`)
- Bug in indexing with a ``range`` (:issue:`11652`)
- Bug in inference of numpy scalars and preserving dtype when setting columns (:issue:`11638`)
- Bug in ``to_sql`` using unicode column names giving UnicodeEncodeError with (:issue:`11431`).
- Fix regression in setting of ``xticks`` in ``plot`` (:issue:`11529`).
- Bug in ``holiday.dates`` where observance rules could not be applied to holiday and doc enhancement (:issue:`11477`, :issue:`11533`)
- Fix plotting issues when having plain ``Axes`` instances instead of ``SubplotAxes`` (:issue:`11520`, :issue:`11556`).
- Bug in ``DataFrame.to_latex()`` producing an extra rule when ``header=False`` (:issue:`7124`)
- Bug in ``df.groupby(...).apply(func)`` when a func returns a ``Series`` containing a new datetimelike column (:issue:`11324`)
- Bug in ``pandas.json`` when file to load is big (:issue:`11344`)
- Bugs in ``to_excel`` with duplicate columns (:issue:`11007`, :issue:`10982`, :issue:`10970`)
- Fixed a bug that prevented the construction of an empty series of dtype ``datetime64[ns, tz]`` (:issue:`11245`).
- Bug in ``read_excel`` with MultiIndex containing integers (:issue:`11317`)
- Bug in ``to_excel`` with openpyxl 2.2+ and merging (:issue:`11408`)
- Bug in ``DataFrame.to_dict()`` producing a ``np.datetime64`` object instead of ``Timestamp`` when only datetime is present in data (:issue:`11327`)
- Bug in ``DataFrame.corr()`` raising an exception when computing the Kendall correlation for DataFrames with boolean and non-boolean columns (:issue:`11560`)
- Bug in the link-time error caused by C ``inline`` functions on FreeBSD 10+ (with ``clang``) (:issue:`10510`)
- Bug in ``DataFrame.to_csv`` in passing through arguments for formatting ``MultiIndexes``, including ``date_format`` (:issue:`7791`)
- Bug in ``DataFrame.join()`` with ``how='right'`` producing a ``TypeError`` (:issue:`11519`)
- Bug in ``Series.quantile`` where the result for an empty list has an ``Index`` with ``object`` dtype (:issue:`11588`)
- Bug in ``pd.merge`` resulting in an empty ``Int64Index`` rather than ``Index(dtype=object)`` when the merge result is empty (:issue:`11588`)
- Bug in ``Categorical.remove_unused_categories`` when having ``NaN`` values (:issue:`11599`)
- Bug in ``DataFrame.to_sparse()`` losing column names for MultiIndexes (:issue:`11600`)
- Bug in ``DataFrame.round()`` with non-unique column index producing a Fatal Python error (:issue:`11611`)
- Bug in ``DataFrame.round()`` with ``decimals`` being a non-unique indexed Series producing extra columns (:issue:`11618`)
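For the ``DataFrame.drop_duplicates`` regression highlighted in this release, a quick sanity check on integer data might look like the sketch below. The values are illustrative only and are not taken from the linked issue, so this shows the expected behavior without reproducing the original failure.

```python
import pandas as pd

# Integer columns that include a negative value and one repeated row.
df = pd.DataFrame({"a": [-1, 0, 1, -1], "b": [10, 20, 30, 10]})

# Only the last row duplicates an earlier one, so three rows should remain.
deduped = df.drop_duplicates()
remaining = deduped.index.tolist()
```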
.. contributors:: v0.17.0..v0.17.1