Comparing changes

This is a direct comparison between two commits made in this repository or its related repositories.

base repository: pandas-dev/pandas
base: f214c4fa518f384a1cbb425205a9d503d23b216c
head repository: pandas-dev/pandas
compare: ad6351b0de5d055d1681eaeb10b15534d99ed3d6

Showing with 499 additions and 279 deletions.
  1. +8 −6 doc/source/user_guide/io.rst
  2. +12 −6 doc/source/whatsnew/v0.25.1.rst
  3. +0 −6 doc/source/whatsnew/v0.7.3.rst
  4. +2 −0 doc/source/whatsnew/v1.0.0.rst
  5. +1 −1 pandas/_libs/hashtable.pyx
  6. +5 −3 pandas/_libs/parsers.pyx
  7. +30 −0 pandas/compat/__init__.py
  8. +0 −6 pandas/compat/chainmap.py
  9. +2 −2 pandas/core/arrays/categorical.py
  10. +2 −2 pandas/core/arrays/datetimes.py
  11. +5 −3 pandas/core/computation/align.py
  12. +0 −5 pandas/core/computation/common.py
  13. +8 −6 pandas/core/computation/engines.py
  14. +2 −1 pandas/core/computation/expr.py
  15. +13 −14 pandas/core/computation/expressions.py
  16. +4 −4 pandas/core/computation/ops.py
  17. +22 −15 pandas/core/computation/scope.py
  18. +1 −1 pandas/core/frame.py
  19. +46 −46 pandas/core/generic.py
  20. +1 −1 pandas/core/indexes/base.py
  21. +5 −118 pandas/core/ops/__init__.py
  22. +127 −0 pandas/core/ops/array_ops.py
  23. +1 −1 pandas/core/series.py
  24. +4 −2 pandas/io/common.py
  25. +11 −0 pandas/io/parsers.py
  26. +15 −15 pandas/plotting/_core.py
  27. +12 −4 pandas/plotting/_matplotlib/core.py
  28. +4 −2 pandas/tests/arrays/categorical/test_missing.py
  29. +5 −0 pandas/tests/groupby/test_categorical.py
  30. +29 −0 pandas/tests/io/parser/test_header.py
  31. +32 −0 pandas/tests/io/test_compression.py
  32. +4 −3 pandas/tests/io/test_pickle.py
  33. +22 −0 pandas/tests/plotting/common.py
  34. +25 −0 pandas/tests/plotting/test_frame.py
  35. +12 −0 pandas/tests/reshape/merge/test_merge.py
  36. +22 −0 pandas/tests/series/test_missing.py
  37. +5 −6 pandas/util/testing.py
14 changes: 8 additions & 6 deletions doc/source/user_guide/io.rst
@@ -28,6 +28,7 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
:delim: ;

text;`CSV <https://en.wikipedia.org/wiki/Comma-separated_values>`__;:ref:`read_csv<io.read_csv_table>`;:ref:`to_csv<io.store_in_csv>`
text;`TXT <https://www.oracle.com/webfolder/technetwork/data-quality/edqhelp/Content/introduction/getting_started/configuring_fixed_width_text_file_formats.htm>`__;:ref:`read_fwf<io.fwf_reader>`
text;`JSON <https://www.json.org/>`__;:ref:`read_json<io.json_reader>`;:ref:`to_json<io.json_writer>`
text;`HTML <https://en.wikipedia.org/wiki/HTML>`__;:ref:`read_html<io.read_html>`;:ref:`to_html<io.html>`
text; Local clipboard;:ref:`read_clipboard<io.clipboard>`;:ref:`to_clipboard<io.clipboard>`
@@ -1372,6 +1373,7 @@ should pass the ``escapechar`` option:
print(data)
pd.read_csv(StringIO(data), escapechar='\\')
.. _io.fwf_reader:
.. _io.fwf:

Files with fixed width columns
@@ -3572,7 +3574,7 @@ Closing a Store and using a context manager:
Read/write API
''''''''''''''

``HDFStore`` supports an top-level API using ``read_hdf`` for reading and ``to_hdf`` for writing,
``HDFStore`` supports a top-level API using ``read_hdf`` for reading and ``to_hdf`` for writing,
similar to how ``read_csv`` and ``to_csv`` work.

.. ipython:: python
@@ -3687,7 +3689,7 @@ Hierarchical keys
Keys to a store can be specified as a string. These can be in a
hierarchical path-name like format (e.g. ``foo/bar/bah``), which will
generate a hierarchy of sub-stores (or ``Groups`` in PyTables
parlance). Keys can be specified with out the leading '/' and are **always**
parlance). Keys can be specified without the leading '/' and are **always**
absolute (e.g. 'foo' refers to '/foo'). Removal operations can remove
everything in the sub-store and **below**, so be *careful*.
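The key semantics described in this hunk (a relative key such as 'foo' is treated as '/foo', and a removal reaches the node and everything below it) can be illustrated with plain path strings. This is a hypothetical sketch built on `posixpath`, not the `HDFStore` implementation; `normalize_key` and `keys_removed` are names invented for illustration.

```python
import posixpath


def normalize_key(key: str) -> str:
    """Hypothetical helper: treat every store key as absolute ('foo' -> '/foo')."""
    # Strip leading slashes first so normpath never sees a '//' prefix.
    return posixpath.normpath("/" + key.lstrip("/"))


def keys_removed(all_keys, target):
    """Hypothetical helper: keys affected by removing `target` (the node and everything below it)."""
    target = normalize_key(target)
    return [k for k in all_keys if k == target or k.startswith(target + "/")]


store_keys = ["/foo", "/foo/bar", "/foo/bar/bah", "/food"]
print(normalize_key("foo"))             # /foo
print(keys_removed(store_keys, "foo"))  # ['/foo', '/foo/bar', '/foo/bar/bah']
```

Matching on `target + "/"` rather than a bare prefix keeps a sibling such as '/food' out of the removal set.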

@@ -3825,7 +3827,7 @@ data.

A query is specified using the ``Term`` class under the hood, as a boolean expression.

* ``index`` and ``columns`` are supported indexers of a ``DataFrames``.
* ``index`` and ``columns`` are supported indexers of ``DataFrames``.
* if ``data_columns`` are specified, these can be used as additional indexers.

Valid comparison operators are:
@@ -3917,7 +3919,7 @@ Use boolean expressions, with in-line function evaluation.
store.select('dfq', "index>pd.Timestamp('20130104') & columns=['A', 'B']")
Use and inline column reference
Use inline column reference.

.. ipython:: python
@@ -4593,8 +4595,8 @@ Performance
write chunksize (default is 50000). This will significantly lower
your memory usage on writing.
* You can pass ``expectedrows=<int>`` to the first ``append``,
to set the TOTAL number of expected rows that ``PyTables`` will
expected. This will optimize read/write performance.
to set the TOTAL number of rows that ``PyTables`` will expect.
This will optimize read/write performance.
* Duplicate rows can be written to tables, but are filtered out in
selection (with the last items being selected; thus a table is
unique on major, minor pairs)
18 changes: 12 additions & 6 deletions doc/source/whatsnew/v0.25.1.rst
@@ -25,8 +25,7 @@ Bug fixes
Categorical
^^^^^^^^^^^

-
-
- Bug in :meth:`Categorical.fillna` would replace all values, not just those that are ``NaN`` (:issue:`26215`)
-

Datetimelike
@@ -103,9 +102,8 @@ MultiIndex

I/O
^^^

- Avoid calling ``S3File.s3`` when reading parquet, as this was removed in s3fs version 0.3.0 (:issue:`27756`)
-
- Better error message when a negative header is passed in :func:`pandas.read_csv` (:issue:`27779`)
-

Plotting
@@ -127,9 +125,9 @@ Reshaping
^^^^^^^^^

- A ``KeyError`` is now raised if ``.unstack()`` is called on a :class:`Series` or :class:`DataFrame` with a flat :class:`Index` passing a name which is not the correct one (:issue:`18303`)
- Bug in :meth:`DataFrame.crosstab` when ``margins`` set to ``True`` and ``normalize`` is not ``False``, an error is raised. (:issue:`27500`)
- Bug in :meth:`DataFrame.crosstab` when ``margins`` set to ``True`` and ``normalize`` is not ``False``, an error is raised. (:issue:`27500`)
- :meth:`DataFrame.join` now suppresses the ``FutureWarning`` when the sort parameter is specified (:issue:`21952`)
-
- Bug in :meth:`DataFrame.join` raising with readonly arrays (:issue:`27943`)

Sparse
^^^^^^
@@ -160,6 +158,14 @@ Other
-
-

I/O and LZMA
~~~~~~~~~~~~

Some users may unknowingly have an incomplete Python installation, which lacks the `lzma` module from the standard library. In this case, `import pandas` failed due to an `ImportError` (:issue: `27575`).
Pandas will now warn, rather than raising an `ImportError` if the `lzma` module is not present. Any subsequent attempt to use `lzma` methods will raise a `RuntimeError`.
A possible fix for the lack of the `lzma` module is to ensure you have the necessary libraries and then re-install Python.
For example, on MacOS installing Python with `pyenv` may lead to an incomplete Python installation due to unmet system dependencies at compilation time (like `xz`). Compilation will succeed, but Python might fail at run time. The issue can be solved by installing the necessary dependencies and then re-installing Python.

.. _whatsnew_0.251.contributors:

Contributors
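To check whether a given interpreter is affected by the incomplete-installation issue described in the "I/O and LZMA" note, one option is to attempt the import and a compress/decompress round trip. In a build missing the `_lzma` C extension, the pure-Python `lzma.py` wrapper may still be present, so actually importing it is more reliable than only looking for the module file. A minimal sketch; `lzma_roundtrip_ok` is a hypothetical name:

```python
def lzma_roundtrip_ok() -> bool:
    """Return True if lzma imports and can compress/decompress data."""
    try:
        import lzma
    except ImportError:
        # Incomplete build: the _lzma extension (needs xz/liblzma at compile time) is missing.
        return False
    payload = b"pandas" * 10
    return lzma.decompress(lzma.compress(payload)) == payload


print(lzma_roundtrip_ok())
```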
6 changes: 0 additions & 6 deletions doc/source/whatsnew/v0.7.3.rst
@@ -25,8 +25,6 @@ New features
from pandas.tools.plotting import scatter_matrix
scatter_matrix(df, alpha=0.2) # noqa F821
.. image:: ../savefig/scatter_matrix_kde.png
:width: 5in
- Add ``stacked`` argument to Series and DataFrame's ``plot`` method for
:ref:`stacked bar plots <visualization.barplot>`.
@@ -35,15 +33,11 @@ New features
df.plot(kind='bar', stacked=True) # noqa F821
.. image:: ../savefig/bar_plot_stacked_ex.png
:width: 4in
.. code-block:: python
df.plot(kind='barh', stacked=True) # noqa F821
.. image:: ../savefig/barh_plot_stacked_ex.png
:width: 4in
- Add log x and y :ref:`scaling options <visualization.basic>` to
``DataFrame.plot`` and ``Series.plot``
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -87,6 +87,7 @@ Bug fixes
Categorical
^^^^^^^^^^^

- Added test to assert the :func:`fillna` raises the correct ValueError message when the value isn't a value from categories (:issue:`13628`)
-
-

@@ -165,6 +166,7 @@ Plotting

- Bug in :meth:`Series.plot` not able to plot boolean values (:issue:`23719`)
-
- Bug in :meth:`DataFrame.plot` producing incorrect legend markers when plotting multiple series on the same axis (:issue:`18222`)
- Bug in :meth:`DataFrame.plot` when ``kind='box'`` and data contains datetime or timedelta data. These types are now automatically dropped (:issue:`22799`)

Groupby/resample/rolling
2 changes: 1 addition & 1 deletion pandas/_libs/hashtable.pyx
@@ -108,7 +108,7 @@ cdef class Int64Factorizer:
def get_count(self):
return self.count

def factorize(self, int64_t[:] values, sort=False,
def factorize(self, const int64_t[:] values, sort=False,
na_sentinel=-1, na_value=None):
"""
Factorize values with nans replaced by na_sentinel
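The `const` qualifier added to `factorize` lets a Cython typed memoryview accept read-only buffers. Without it, Cython typically rejects such input with a `ValueError` about a read-only buffer source, which appears to be the failure behind the 0.25.1 entry on `DataFrame.join` raising with readonly arrays (issue 27943). The writable vs. read-only distinction itself can be seen with the standard library alone; this sketch involves neither Cython nor pandas, and the variable names are illustrative.

```python
from array import array

values = array("q", [10, 20, 30])               # signed 64-bit ints, like int64_t
writable = memoryview(values)                   # backed by a mutable array
readonly = memoryview(bytes(values)).cast("q")  # same data, immutable bytes backing

print(writable.readonly, readonly.readonly)     # False True
print(readonly.tolist())                        # [10, 20, 30]

try:
    readonly[0] = 99                            # writes through a read-only view fail
except TypeError as exc:
    print("write rejected:", exc)
```

A function that only reads its input, as `factorize` does, can safely declare the buffer `const` and accept both kinds of view.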
8 changes: 5 additions & 3 deletions pandas/_libs/parsers.pyx
@@ -2,7 +2,6 @@
# See LICENSE for the license
import bz2
import gzip
import lzma
import os
import sys
import time
@@ -59,9 +58,12 @@ from pandas.core.arrays import Categorical
from pandas.core.dtypes.concat import union_categoricals
import pandas.io.common as icom

from pandas.compat import _import_lzma, _get_lzma_file
from pandas.errors import (ParserError, DtypeWarning,
EmptyDataError, ParserWarning)

lzma = _import_lzma()

# Import CParserError as alias of ParserError for backwards compatibility.
# Ultimately, we want to remove this import. See gh-12665 and gh-14479.
CParserError = ParserError
@@ -645,9 +647,9 @@ cdef class TextReader:
'zip file %s', str(zip_names))
elif self.compression == 'xz':
if isinstance(source, str):
source = lzma.LZMAFile(source, 'rb')
source = _get_lzma_file(lzma)(source, 'rb')
else:
source = lzma.LZMAFile(filename=source)
source = _get_lzma_file(lzma)(filename=source)
else:
raise ValueError('Unrecognized compression type: %s' %
self.compression)
30 changes: 30 additions & 0 deletions pandas/compat/__init__.py
@@ -10,6 +10,7 @@
import platform
import struct
import sys
import warnings

PY35 = sys.version_info[:2] == (3, 5)
PY36 = sys.version_info >= (3, 6)
@@ -65,3 +66,32 @@ def is_platform_mac():

def is_platform_32bit():
return struct.calcsize("P") * 8 < 64


def _import_lzma():
"""Attempts to import lzma, warning the user when lzma is not available.
"""
try:
import lzma

return lzma
except ImportError:
msg = (
"Could not import the lzma module. "
"Your installed Python is incomplete. "
"Attempting to use lzma compression will result in a RuntimeError."
)
warnings.warn(msg)


def _get_lzma_file(lzma):
"""Returns the lzma method LZMAFile when the module was correctly imported.
Otherwise, raises a RuntimeError.
"""
if lzma is None:
raise RuntimeError(
"lzma module not available. "
"A Python re-install with the proper "
"dependencies might be required to solve this issue."
)
return lzma.LZMAFile
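The two helpers above split one failure into two steps: importing only warns, and the error is deferred until a caller actually asks for `LZMAFile`. Note that `_import_lzma` returns `None` implicitly when the import fails, which is what `_get_lzma_file` tests for. A generalised sketch of the same pattern; `import_optional` and `require` are hypothetical names, not pandas functions:

```python
import importlib
import warnings


def import_optional(name):
    """Import `name`, warning and returning None if it is unavailable."""
    try:
        return importlib.import_module(name)
    except ImportError:
        warnings.warn(
            "Could not import the {} module. "
            "Attempting to use it will result in a RuntimeError.".format(name)
        )
        return None  # explicit here; the pandas helper falls through to None


def require(module, name):
    """Return the module, or raise RuntimeError at use time if it is missing."""
    if module is None:
        raise RuntimeError("{} module not available.".format(name))
    return module


json_mod = import_optional("json")
print(require(json_mod, "json").dumps([1, 2]))  # [1, 2]
```

The design keeps `import pandas` working on incomplete installations while still failing loudly, with an actionable message, the moment compression is requested.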
6 changes: 0 additions & 6 deletions pandas/compat/chainmap.py
@@ -15,9 +15,3 @@ def __delitem__(self, key):
del mapping[key]
return
raise KeyError(key)

# override because the m parameter is introduced in Python 3.4
def new_child(self, m=None):
if m is None:
m = {}
return self.__class__(m, *self.maps)
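The override removed here only reproduced behavior the standard library already provides: since Python 3.4, `collections.ChainMap.new_child` accepts an optional mapping `m`, and CPython builds the child with `self.__class__`, so subclasses keep their type. A small stdlib check of that behavior; `ScopeMap` is a hypothetical subclass standing in for the one in `pandas/compat/chainmap.py`:

```python
from collections import ChainMap


class ScopeMap(ChainMap):
    """Hypothetical subclass; inherits new_child unchanged."""


defaults = {"engine": "numexpr", "parser": "pandas"}
scope = ScopeMap(defaults)

child = scope.new_child({"parser": "python"})  # push a mapping in front
empty_child = scope.new_child()                # m=None pushes a fresh {}

print(type(child).__name__)              # ScopeMap
print(child["parser"], child["engine"])  # python numexpr
print(empty_child.maps[0])               # {}
```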
4 changes: 2 additions & 2 deletions pandas/core/arrays/categorical.py
@@ -1840,8 +1840,8 @@ def fillna(self, value=None, method=None, limit=None):
raise ValueError("fill value must be in categories")

values_codes = _get_codes_for_values(value, self.categories)
indexer = np.where(values_codes != -1)
codes[indexer] = values_codes[values_codes != -1]
indexer = np.where(codes == -1)
codes[indexer] = values_codes[indexer]

# If value is not a dict or Series it should be a scalar
elif is_hashable(value):
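The `fillna` fix changes which positions are written. The old code wrote the fill value's code wherever that code was valid, overwriting non-missing entries; the new code writes only where the existing code is `-1`, the missing marker. The sketch below replays both indexing expressions on plain NumPy arrays; `codes` and `fill_codes` are hypothetical stand-ins for the categorical codes and the codes of an aligned fill Series, and it assumes every fill value is a valid category.

```python
import numpy as np

codes = np.array([0, -1, 1, -1])     # -1 marks a missing value
fill_codes = np.array([1, 1, 0, 0])  # codes of the aligned fill values

# Old logic: write wherever the *fill* code is valid (here: everywhere).
old = codes.copy()
valid = fill_codes != -1
old[np.where(valid)] = fill_codes[valid]

# New logic: write only where the *existing* code is missing.
new = codes.copy()
indexer = np.where(new == -1)
new[indexer] = fill_codes[indexer]

print(old.tolist())  # [1, 1, 0, 0]  non-missing 0 and 1 were overwritten
print(new.tolist())  # [0, 1, 1, 0]  only positions 1 and 3 were filled
```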
4 changes: 2 additions & 2 deletions pandas/core/arrays/datetimes.py
@@ -195,11 +195,11 @@ def wrapper(self, other):
return invalid_comparison(self, other, op)

if is_object_dtype(other):
# We have to use _comp_method_OBJECT_ARRAY instead of numpy
# We have to use comp_method_OBJECT_ARRAY instead of numpy
# comparison otherwise it would fail to raise when
# comparing tz-aware and tz-naive
with np.errstate(all="ignore"):
result = ops._comp_method_OBJECT_ARRAY(
result = ops.comp_method_OBJECT_ARRAY(
op, self.astype(object), other
)
o_mask = isna(other)
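The comment in this hunk relies on element-wise Python comparison raising when tz-aware and tz-naive values are mixed. For standard-library `datetime` objects, ordering comparisons between the two raise `TypeError`, while `==` (since Python 3.3) returns `False` instead of raising. A minimal illustration with plain `datetime` rather than `Timestamp`; `ordering_raises` is a hypothetical helper:

```python
from datetime import datetime, timezone

naive = datetime(2019, 8, 22, 12, 0)
aware = datetime(2019, 8, 22, 12, 0, tzinfo=timezone.utc)


def ordering_raises(a, b):
    """Return True if a < b raises TypeError."""
    try:
        a < b
    except TypeError:
        return True
    return False


print(ordering_raises(naive, aware))  # True
print(naive == aware)                 # False, no exception
```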
8 changes: 5 additions & 3 deletions pandas/core/computation/align.py
@@ -9,6 +9,7 @@
from pandas.errors import PerformanceWarning

import pandas as pd
from pandas.core.base import PandasObject
import pandas.core.common as com
from pandas.core.computation.common import _result_type_many

@@ -34,7 +35,7 @@ def _zip_axes_from_type(typ, new_axes):

def _any_pandas_objects(terms):
"""Check a sequence of terms for instances of PandasObject."""
return any(isinstance(term.value, pd.core.generic.PandasObject) for term in terms)
return any(isinstance(term.value, PandasObject) for term in terms)


def _filter_special_cases(f):
@@ -132,7 +133,8 @@ def _align(terms):


def _reconstruct_object(typ, obj, axes, dtype):
"""Reconstruct an object given its type, raw value, and possibly empty
"""
Reconstruct an object given its type, raw value, and possibly empty
(None) axes.
Parameters
@@ -157,7 +159,7 @@ def _reconstruct_object(typ, obj, axes, dtype):

res_t = np.result_type(obj.dtype, dtype)

if not isinstance(typ, partial) and issubclass(typ, pd.core.generic.PandasObject):
if not isinstance(typ, partial) and issubclass(typ, PandasObject):
return typ(obj, dtype=res_t, **axes)

# special case for pathological things like ~True/~False
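The order of the two checks in `_reconstruct_object` matters: `typ` may be a `functools.partial` object rather than a class, and `issubclass` raises `TypeError` for a non-class first argument, so the `isinstance(typ, partial)` test has to short-circuit first. A standalone sketch of that guard; `Base`, `Child`, `build` and `is_base_subclass` are hypothetical names:

```python
from functools import partial


class Base:
    """Hypothetical stand-in for PandasObject."""


class Child(Base):
    pass


def build(obj, dtype=None):
    """Hypothetical factory that a partial might wrap."""
    return (obj, dtype)


def is_base_subclass(typ):
    # Check for partial first: issubclass() raises TypeError for non-classes.
    return not isinstance(typ, partial) and issubclass(typ, Base)


factory = partial(build, dtype="float64")
print(is_base_subclass(Child))    # True
print(is_base_subclass(factory))  # False

try:
    issubclass(factory, Base)
except TypeError as exc:
    print("unguarded check fails:", exc)
```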
5 changes: 0 additions & 5 deletions pandas/core/computation/common.py
@@ -36,8 +36,3 @@ def _remove_spaces_column_name(name):

class NameResolutionError(NameError):
pass


class StringMixin:
# TODO: delete this class. Removing this ATM caused a failure.
pass
14 changes: 8 additions & 6 deletions pandas/core/computation/engines.py
@@ -17,7 +17,8 @@ class NumExprClobberingError(NameError):


def _check_ne_builtin_clash(expr):
"""Attempt to prevent foot-shooting in a helpful way.
"""
Attempt to prevent foot-shooting in a helpful way.
Parameters
----------
@@ -53,7 +54,8 @@ def convert(self):
return printing.pprint_thing(self.expr)

def evaluate(self):
"""Run the engine on the expression
"""
Run the engine on the expression.
This method performs alignment which is necessary no matter what engine
is being used, thus its implementation is in the base class.
@@ -78,7 +80,8 @@ def _is_aligned(self):

@abc.abstractmethod
def _evaluate(self):
"""Return an evaluated expression.
"""
Return an evaluated expression.
Parameters
----------
@@ -94,7 +97,6 @@ def _evaluate(self):


class NumExprEngine(AbstractEngine):

"""NumExpr engine class"""

has_neg_frac = True
@@ -127,8 +129,8 @@ def _evaluate(self):


class PythonEngine(AbstractEngine):

"""Evaluate an expression in Python space.
"""
Evaluate an expression in Python space.
Mostly for testing purposes.
"""
3 changes: 2 additions & 1 deletion pandas/core/computation/expr.py
@@ -41,7 +41,8 @@


def tokenize_string(source):
"""Tokenize a Python source code string.
"""
Tokenize a Python source code string.
Parameters
----------