Version 0.17.1 (November 21, 2015)

Note

We are proud to announce that pandas has become a sponsored project of the (NumFOCUS organization). This will help ensure the success of development of pandas as a world-class open-source project.

This is a minor bug-fix release from 0.17.0 and includes a large number of bug fixes along several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.

Highlights include:

Support for Conditional HTML Formatting, see :ref:`here <whatsnew_0171.style>`
Releasing the GIL on the csv reader & other ops, see :ref:`here <whatsnew_0171.performance>`
Fixed regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (:issue:`11376`)

What's new in v0.17.1

New features
- Conditional HTML formatting
Enhancements
API changes
- Deprecations
Performance improvements
Bug fixes
Contributors

New features

Conditional HTML formatting

Warning

This is a new feature and is under active development. We'll be adding features an possibly making breaking changes in future releases. Feedback is welcome in :issue:`11610`

We've added experimental support for conditional HTML formatting: the visual styling of a DataFrame based on the data. The styling is accomplished with HTML and CSS. Accesses the styler class with the :attr:`pandas.DataFrame.style`, attribute, an instance of :class:`.Styler` with your data attached.

Here's a quick example:

.. ipython:: python

  np.random.seed(123)
  df = pd.DataFrame(np.random.randn(10, 5), columns=list("abcde"))
  html = df.style.background_gradient(cmap="viridis", low=0.5)

We can render the HTML to get the following table.

:class:`.Styler` interacts nicely with the Jupyter Notebook. See the :ref:`documentation </user_guide/style.ipynb>` for more.

Enhancements

DatetimeIndex now supports conversion to strings with astype(str) (:issue:`10442`)
Support for compression (gzip/bz2) in :meth:`pandas.DataFrame.to_csv` (:issue:`7615`)
pd.read_* functions can now also accept :class:`python:pathlib.Path`, or :class:`py:py._path.local.LocalPath` objects for the filepath_or_buffer argument. (:issue:`11033`) - The DataFrame and Series functions .to_csv(), .to_html() and .to_latex() can now handle paths beginning with tildes (e.g. ~/Documents/) (:issue:`11438`)
DataFrame now uses the fields of a namedtuple as columns, if columns are not supplied (:issue:`11181`)
DataFrame.itertuples() now returns namedtuple objects, when possible. (:issue:`11269`, :issue:`11625`)
Added axvlines_kwds to parallel coordinates plot (:issue:`10709`)

Option to .info() and .memory_usage() to provide for deep introspection of memory consumption. Note that this can be expensive to compute and therefore is an optional parameter. (:issue:`11595`)

.. ipython:: python

   df = pd.DataFrame({"A": ["foo"] * 1000})  # noqa: F821
   df["B"] = df["A"].astype("category")

   # shows the '+' as we have object dtypes
   df.info()

   # we have an accurate memory assessment (but can be expensive to compute this)
   df.info(memory_usage="deep")

Index now has a fillna method (:issue:`10089`)

.. ipython:: python

   pd.Index([1, np.nan, 3]).fillna(2)

Series of type category now make .str.<...> and .dt.<...> accessor methods / properties available, if the categories are of that type. (:issue:`10661`)

.. ipython:: python

   s = pd.Series(list("aabb")).astype("category")
   s
   s.str.contains("a")

   date = pd.Series(pd.date_range("1/1/2015", periods=5)).astype("category")
   date
   date.dt.day

pivot_table now has a margins_name argument so you can use something other than the default of 'All' (:issue:`3335`)
Implement export of datetime64[ns, tz] dtypes with a fixed HDF5 store (:issue:`11411`)
Pretty printing sets (e.g. in DataFrame cells) now uses set literal syntax ({x, y}) instead of Legacy Python syntax (set([x, y])) (:issue:`11215`)
Improve the error message in :func:`pandas.io.gbq.to_gbq` when a streaming insert fails (:issue:`11285`) and when the DataFrame does not match the schema of the destination table (:issue:`11359`)

API changes

raise NotImplementedError in Index.shift for non-supported index types (:issue:`8038`)
min and max reductions on datetime64 and timedelta64 dtyped series now result in NaT and not nan (:issue:`11245`).
Indexing with a null key will raise a TypeError, instead of a ValueError (:issue:`11356`)
Series.ptp will now ignore missing values by default (:issue:`11163`)

Deprecations

The pandas.io.ga module which implements google-analytics support is deprecated and will be removed in a future version (:issue:`11308`)
Deprecate the engine keyword in .to_csv(), which will be removed in a future version (:issue:`11274`)

Performance improvements

Checking monotonic-ness before sorting on an index (:issue:`11080`)
Series.dropna performance improvement when its dtype can't contain NaN (:issue:`11159`)
Release the GIL on most datetime field operations (e.g. DatetimeIndex.year, Series.dt.year), normalization, and conversion to and from Period, DatetimeIndex.to_period and PeriodIndex.to_timestamp (:issue:`11263`)
Release the GIL on some rolling algos: rolling_median, rolling_mean, rolling_max, rolling_min, rolling_var, rolling_kurt, rolling_skew (:issue:`11450`)
Release the GIL when reading and parsing text files in read_csv, read_table (:issue:`11272`)
Improved performance of rolling_median (:issue:`11450`)
Improved performance of to_excel (:issue:`11352`)
Performance bug in repr of Categorical categories, which was rendering the strings before chopping them for display (:issue:`11305`)
Performance improvement in Categorical.remove_unused_categories, (:issue:`11643`).
Improved performance of Series constructor with no data and DatetimeIndex (:issue:`11433`)
Improved performance of shift, cumprod, and cumsum with groupby (:issue:`4095`)

Bug fixes

SparseArray.__iter__() now does not cause PendingDeprecationWarning in Python 3.5 (:issue:`11622`)
Regression from 0.16.2 for output formatting of long floats/nan, restored in (:issue:`11302`)
Series.sort_index() now correctly handles the inplace option (:issue:`11402`)
Incorrectly distributed .c file in the build on PyPi when reading a csv of floats and passing na_values=<a scalar> would show an exception (:issue:`11374`)
Bug in .to_latex() output broken when the index has a name (:issue:`10660`)
Bug in HDFStore.append with strings whose encoded length exceeded the max unencoded length (:issue:`11234`)
Bug in merging datetime64[ns, tz] dtypes (:issue:`11405`)
Bug in HDFStore.select when comparing with a numpy scalar in a where clause (:issue:`11283`)
Bug in using DataFrame.ix with a MultiIndex indexer (:issue:`11372`)
Bug in date_range with ambiguous endpoints (:issue:`11626`)
Prevent adding new attributes to the accessors .str, .dt and .cat. Retrieving such a value was not possible, so error out on setting it. (:issue:`10673`)
Bug in tz-conversions with an ambiguous time and .dt accessors (:issue:`11295`)
Bug in output formatting when using an index of ambiguous times (:issue:`11619`)
Bug in comparisons of Series vs list-likes (:issue:`11339`)
Bug in DataFrame.replace with a datetime64[ns, tz] and a non-compat to_replace (:issue:`11326`, :issue:`11153`)
Bug in isnull where numpy.datetime64('NaT') in a numpy.array was not determined to be null(:issue:`11206`)
Bug in list-like indexing with a mixed-integer Index (:issue:`11320`)
Bug in pivot_table with margins=True when indexes are of Categorical dtype (:issue:`10993`)
Bug in DataFrame.plot cannot use hex strings colors (:issue:`10299`)
Regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (:issue:`11376`)
Bug in pd.eval where unary ops in a list error (:issue:`11235`)
Bug in squeeze() with zero length arrays (:issue:`11230`, :issue:`8999`)
Bug in describe() dropping column names for hierarchical indexes (:issue:`11517`)
Bug in DataFrame.pct_change() not propagating axis keyword on .fillna method (:issue:`11150`)
Bug in .to_csv() when a mix of integer and string column names are passed as the columns parameter (:issue:`11637`)
Bug in indexing with a range, (:issue:`11652`)
Bug in inference of numpy scalars and preserving dtype when setting columns (:issue:`11638`)
Bug in to_sql using unicode column names giving UnicodeEncodeError with (:issue:`11431`).
Fix regression in setting of xticks in plot (:issue:`11529`).
Bug in holiday.dates where observance rules could not be applied to holiday and doc enhancement (:issue:`11477`, :issue:`11533`)
Fix plotting issues when having plain Axes instances instead of SubplotAxes (:issue:`11520`, :issue:`11556`).
Bug in DataFrame.to_latex() produces an extra rule when header=False (:issue:`7124`)
Bug in df.groupby(...).apply(func) when a func returns a Series containing a new datetimelike column (:issue:`11324`)
Bug in pandas.json when file to load is big (:issue:`11344`)
Bugs in to_excel with duplicate columns (:issue:`11007`, :issue:`10982`, :issue:`10970`)
Fixed a bug that prevented the construction of an empty series of dtype datetime64[ns, tz] (:issue:`11245`).
Bug in read_excel with MultiIndex containing integers (:issue:`11317`)
Bug in to_excel with openpyxl 2.2+ and merging (:issue:`11408`)
Bug in DataFrame.to_dict() produces a np.datetime64 object instead of Timestamp when only datetime is present in data (:issue:`11327`)
Bug in DataFrame.corr() raises exception when computes Kendall correlation for DataFrames with boolean and not boolean columns (:issue:`11560`)
Bug in the link-time error caused by C inline functions on FreeBSD 10+ (with clang) (:issue:`10510`)
Bug in DataFrame.to_csv in passing through arguments for formatting MultiIndexes, including date_format (:issue:`7791`)
Bug in DataFrame.join() with how='right' producing a TypeError (:issue:`11519`)
Bug in Series.quantile with empty list results has Index with object dtype (:issue:`11588`)
Bug in pd.merge results in empty Int64Index rather than Index(dtype=object) when the merge result is empty (:issue:`11588`)
Bug in Categorical.remove_unused_categories when having NaN values (:issue:`11599`)
Bug in DataFrame.to_sparse() loses column names for MultiIndexes (:issue:`11600`)
Bug in DataFrame.round() with non-unique column index producing a Fatal Python error (:issue:`11611`)
Bug in DataFrame.round() with decimals being a non-unique indexed Series producing extra columns (:issue:`11618`)

Contributors

.. contributors:: v0.17.0..v0.17.1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!