Version 0.17.1 (November 21, 2015)

{{ header }}

Note

We are proud to announce that pandas has become a sponsored project of the NumFOCUS organization. This will help ensure the success of development of pandas as a world-class open-source project.

This is a minor bug-fix release from 0.17.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.

Highlights include:

New features

Conditional HTML formatting

Warning

This is a new feature and is under active development. We'll be adding features and possibly making breaking changes in future releases. Feedback is welcome in :issue:`11610`.

We've added experimental support for conditional HTML formatting: the visual styling of a DataFrame based on the data. The styling is accomplished with HTML and CSS. Access the styler class with the :attr:`pandas.DataFrame.style` attribute, which returns an instance of :class:`.Styler` with your data attached.

Here's a quick example:

.. ipython:: python

  np.random.seed(123)
  df = pd.DataFrame(np.random.randn(10, 5), columns=list("abcde"))
  html = df.style.background_gradient(cmap="viridis", low=0.5)

We can render the HTML to get the following table.

:class:`.Styler` interacts nicely with the Jupyter Notebook. See the :ref:`documentation </user_guide/style.ipynb>` for more.

Enhancements

  • DatetimeIndex now supports conversion to strings with astype(str) (:issue:`10442`)

  • Support for compression (gzip/bz2) in :meth:`pandas.DataFrame.to_csv` (:issue:`7615`)

  • pd.read_* functions can now also accept :class:`python:pathlib.Path`, or :class:`py:py._path.local.LocalPath` objects for the filepath_or_buffer argument. (:issue:`11033`)

  • The DataFrame and Series functions .to_csv(), .to_html() and .to_latex() can now handle paths beginning with tildes (e.g. ~/Documents/) (:issue:`11438`)

  • DataFrame now uses the fields of a namedtuple as columns, if columns are not supplied (:issue:`11181`)

  • DataFrame.itertuples() now returns namedtuple objects, when possible. (:issue:`11269`, :issue:`11625`)

  • Added axvlines_kwds to parallel coordinates plot (:issue:`10709`)

  • Option to .info() and .memory_usage() to provide for deep introspection of memory consumption. Note that this can be expensive to compute and therefore is an optional parameter. (:issue:`11595`)

    .. ipython:: python
    
       df = pd.DataFrame({"A": ["foo"] * 1000})  # noqa: F821
       df["B"] = df["A"].astype("category")
    
       # shows the '+' as we have object dtypes
       df.info()
    
       # we have an accurate memory assessment (but can be expensive to compute this)
       df.info(memory_usage="deep")
    
  • Index now has a fillna method (:issue:`10089`)

    .. ipython:: python
    
       pd.Index([1, np.nan, 3]).fillna(2)
    
  • Series of type category now make .str.<...> and .dt.<...> accessor methods / properties available, if the categories are of that type. (:issue:`10661`)

    .. ipython:: python
    
       s = pd.Series(list("aabb")).astype("category")
       s
       s.str.contains("a")
    
       date = pd.Series(pd.date_range("1/1/2015", periods=5)).astype("category")
       date
       date.dt.day
    
  • pivot_table now has a margins_name argument so you can use something other than the default of 'All' (:issue:`3335`)

  • Implement export of datetime64[ns, tz] dtypes with a fixed HDF5 store (:issue:`11411`)

  • Pretty printing sets (e.g. in DataFrame cells) now uses set literal syntax ({x, y}) instead of Legacy Python syntax (set([x, y])) (:issue:`11215`)

  • Improve the error message in :func:`pandas.io.gbq.to_gbq` when a streaming insert fails (:issue:`11285`) and when the DataFrame does not match the schema of the destination table (:issue:`11359`)
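Several of the enhancements above can be sketched together. This is a minimal illustration, not code from the release: the file name ``example.csv.gz``, the ``Point`` namedtuple and the column names are hypothetical, and the behavior shown is what recent pandas versions are expected to produce.

```python
import tempfile
from collections import namedtuple
from pathlib import Path

import pandas as pd

# DatetimeIndex can be converted to strings with astype(str).
idx = pd.date_range("2015-01-01", periods=2)
idx_as_str = idx.astype(str)

# Field names of a namedtuple become the columns when none are given.
Point = namedtuple("Point", ["x", "y"])  # hypothetical record type
points = pd.DataFrame([Point(1, 2), Point(3, 4)])

# itertuples() yields namedtuple rows, so fields are reachable by name.
first_row = next(points.itertuples())

# to_csv can compress its output, and read_csv accepts a pathlib.Path.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.csv.gz"  # hypothetical file name
    points.to_csv(str(path), index=False, compression="gzip")
    round_trip = pd.read_csv(path, compression="gzip")

# margins_name replaces the default 'All' label in pivot_table margins.
sales = pd.DataFrame({"region": ["N", "N", "S"], "units": [1, 2, 3]})
table = sales.pivot_table(
    values="units",
    index="region",
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)
```

Writing into a temporary directory keeps the sketch free of side effects; in real use any writable path would do.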

API changes

  • raise NotImplementedError in Index.shift for non-supported index types (:issue:`8038`)
  • min and max reductions on datetime64 and timedelta64 dtyped series now result in NaT and not nan (:issue:`11245`).
  • Indexing with a null key will raise a TypeError, instead of a ValueError (:issue:`11356`)
  • Series.ptp will now ignore missing values by default (:issue:`11163`)
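Two of the API changes above can be checked with a short sketch. It assumes that an all-missing Series is one of the cases the ``NaT`` note refers to, and that a plain integer ``Index`` counts as a non-supported index type for ``shift``; both are illustrative choices rather than details stated in the notes.

```python
import pandas as pd

# min() over a datetime64 Series with only missing values gives NaT,
# not a float nan.
all_missing = pd.Series([pd.NaT, pd.NaT], dtype="datetime64[ns]")
earliest = all_missing.min()

# Index.shift is only defined for datetime-like indexes; a plain
# integer Index raises NotImplementedError.
try:
    pd.Index([1, 2, 3]).shift(1)
    shift_error = None
except NotImplementedError as exc:
    shift_error = exc
```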

Deprecations

  • The pandas.io.ga module which implements google-analytics support is deprecated and will be removed in a future version (:issue:`11308`)
  • Deprecate the engine keyword in .to_csv(), which will be removed in a future version (:issue:`11274`)

Performance improvements

  • Checking monotonic-ness before sorting on an index (:issue:`11080`)
  • Series.dropna performance improvement when its dtype can't contain NaN (:issue:`11159`)
  • Release the GIL on most datetime field operations (e.g. DatetimeIndex.year, Series.dt.year), normalization, and conversion to and from Period, DatetimeIndex.to_period and PeriodIndex.to_timestamp (:issue:`11263`)
  • Release the GIL on some rolling algos: rolling_median, rolling_mean, rolling_max, rolling_min, rolling_var, rolling_kurt, rolling_skew (:issue:`11450`)
  • Release the GIL when reading and parsing text files in read_csv, read_table (:issue:`11272`)
  • Improved performance of rolling_median (:issue:`11450`)
  • Improved performance of to_excel (:issue:`11352`)
  • Performance bug in repr of Categorical categories, which was rendering the strings before chopping them for display (:issue:`11305`)
  • Performance improvement in Categorical.remove_unused_categories (:issue:`11643`).
  • Improved performance of Series constructor with no data and DatetimeIndex (:issue:`11433`)
  • Improved performance of shift, cumprod, and cumsum with groupby (:issue:`4095`)

Bug fixes

  • SparseArray.__iter__() now does not cause PendingDeprecationWarning in Python 3.5 (:issue:`11622`)
  • Regression from 0.16.2 for output formatting of long floats/nan, restored in (:issue:`11302`)
  • Series.sort_index() now correctly handles the inplace option (:issue:`11402`)
  • Fixed an incorrectly distributed .c file in the build on PyPI that caused an exception when reading a CSV of floats and passing na_values=<a scalar> (:issue:`11374`)
  • Bug in .to_latex() output broken when the index has a name (:issue:`10660`)
  • Bug in HDFStore.append with strings whose encoded length exceeded the max unencoded length (:issue:`11234`)
  • Bug in merging datetime64[ns, tz] dtypes (:issue:`11405`)
  • Bug in HDFStore.select when comparing with a numpy scalar in a where clause (:issue:`11283`)
  • Bug in using DataFrame.ix with a MultiIndex indexer (:issue:`11372`)
  • Bug in date_range with ambiguous endpoints (:issue:`11626`)
  • Prevent adding new attributes to the accessors .str, .dt and .cat. Retrieving such a value was not possible, so error out on setting it. (:issue:`10673`)
  • Bug in tz-conversions with an ambiguous time and .dt accessors (:issue:`11295`)
  • Bug in output formatting when using an index of ambiguous times (:issue:`11619`)
  • Bug in comparisons of Series vs list-likes (:issue:`11339`)
  • Bug in DataFrame.replace with a datetime64[ns, tz] and a non-compat to_replace (:issue:`11326`, :issue:`11153`)
  • Bug in isnull where numpy.datetime64('NaT') in a numpy.array was not determined to be null (:issue:`11206`)
  • Bug in list-like indexing with a mixed-integer Index (:issue:`11320`)
  • Bug in pivot_table with margins=True when indexes are of Categorical dtype (:issue:`10993`)
  • Bug in DataFrame.plot where hex string colors could not be used (:issue:`10299`)
  • Regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (:issue:`11376`)
  • Bug in pd.eval where unary ops in a list error (:issue:`11235`)
  • Bug in squeeze() with zero length arrays (:issue:`11230`, :issue:`8999`)
  • Bug in describe() dropping column names for hierarchical indexes (:issue:`11517`)
  • Bug in DataFrame.pct_change() not propagating axis keyword on .fillna method (:issue:`11150`)
  • Bug in .to_csv() when a mix of integer and string column names are passed as the columns parameter (:issue:`11637`)
  • Bug in indexing with a range (:issue:`11652`)
  • Bug in inference of numpy scalars and preserving dtype when setting columns (:issue:`11638`)
  • Bug in to_sql where unicode column names gave a UnicodeEncodeError (:issue:`11431`).
  • Fix regression in setting of xticks in plot (:issue:`11529`).
  • Bug in holiday.dates where observance rules could not be applied to holiday and doc enhancement (:issue:`11477`, :issue:`11533`)
  • Fix plotting issues when having plain Axes instances instead of SubplotAxes (:issue:`11520`, :issue:`11556`).
  • Bug in DataFrame.to_latex() producing an extra rule when header=False (:issue:`7124`)
  • Bug in df.groupby(...).apply(func) when a func returns a Series containing a new datetimelike column (:issue:`11324`)
  • Bug in pandas.json when file to load is big (:issue:`11344`)
  • Bugs in to_excel with duplicate columns (:issue:`11007`, :issue:`10982`, :issue:`10970`)
  • Fixed a bug that prevented the construction of an empty series of dtype datetime64[ns, tz] (:issue:`11245`).
  • Bug in read_excel with MultiIndex containing integers (:issue:`11317`)
  • Bug in to_excel with openpyxl 2.2+ and merging (:issue:`11408`)
  • Bug in DataFrame.to_dict() producing a np.datetime64 object instead of a Timestamp when only datetime data is present (:issue:`11327`)
  • Bug in DataFrame.corr() raising an exception when computing the Kendall correlation for DataFrames with boolean and non-boolean columns (:issue:`11560`)
  • Fixed a link-time error caused by C inline functions on FreeBSD 10+ (with clang) (:issue:`10510`)
  • Bug in DataFrame.to_csv in passing through arguments for formatting MultiIndexes, including date_format (:issue:`7791`)
  • Bug in DataFrame.join() with how='right' producing a TypeError (:issue:`11519`)
  • Bug in Series.quantile where passing an empty list returned a result whose Index had object dtype (:issue:`11588`)
  • Bug in pd.merge resulting in an empty Int64Index rather than Index(dtype=object) when the merge result is empty (:issue:`11588`)
  • Bug in Categorical.remove_unused_categories when having NaN values (:issue:`11599`)
  • Bug in DataFrame.to_sparse() losing column names for MultiIndexes (:issue:`11600`)
  • Bug in DataFrame.round() with non-unique column index producing a Fatal Python error (:issue:`11611`)
  • Bug in DataFrame.round() with decimals being a non-unique indexed Series producing extra columns (:issue:`11618`)
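A few of the fixes above can be illustrated with a short sketch of the behavior they are meant to guarantee. The data, index labels and column names are made up for illustration, and the results are those expected from recent pandas versions rather than output captured from 0.17.1.

```python
import numpy as np
import pandas as pd

# numpy's NaT inside a datetime64 array is recognized as null.
values = np.array(["2015-11-21", "NaT"], dtype="datetime64[ns]")
null_mask = pd.isnull(values)

# to_dict() on datetime-only data returns pandas Timestamp objects.
frame = pd.DataFrame({"when": pd.to_datetime(["2015-11-21"])})
as_dict = frame.to_dict()

# A right join on the index keeps every row of the right-hand frame
# and fills unmatched left-hand values with NaN.
left = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
right = pd.DataFrame({"b": [3, 4]}, index=["y", "z"])
joined = left.join(right, how="right")
```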

Contributors

.. contributors:: v0.17.0..v0.17.1