Cannot Animate Plot by pandas.Timestamp #1737

Closed
@johnmccain

Description

Attempting to animate a plot using a pandas Timestamp column results in an exception.

Using Python 3.6.8, plotly_express==0.4.1, plotly==4.1.0

Code to replicate:

import plotly_express as px
import pandas as pd

# create a dataframe with mock data
df = pd.DataFrame(
    [
        {"x": 1, "y": 1, "date": "2018-01-01"},
        {"x": 2, "y": 1, "date": "2018-01-02"},
        {"x": 3, "y": 1, "date": "2018-01-03"},
    ]
)
df["date"] = pd.to_datetime(df["date"])
df.head()

#       date | x | y
# 2018-01-01 | 1 | 1
# 2018-01-02 | 2 | 1
# 2018-01-03 | 3 | 1

# attempt to plot
px.scatter(df, x="x", y="y", animation_frame="date")

Exception & stack trace:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Users\jmccain\AppData\Local\Programs\Python\Python36\lib\site-packages\plotly\express\_chart_types.py", line 52, in scatter
    return make_figure(args=locals(), constructor=go.Scatter)
  File "C:\Users\jmccain\AppData\Local\Programs\Python\Python36\lib\site-packages\plotly\express\_core.py", line 874, in make_figure
    orders, sorted_group_names = get_orderings(args, grouper, grouped)
  File "C:\Users\jmccain\AppData\Local\Programs\Python\Python36\lib\site-packages\plotly\express\_core.py", line 859, in get_orderings
    key=lambda g: orders[col].index(g[i]) if g[i] in orders[col] else -1,
  File "C:\Users\jmccain\AppData\Local\Programs\Python\Python36\lib\site-packages\plotly\express\_core.py", line 859, in <lambda>
    key=lambda g: orders[col].index(g[i]) if g[i] in orders[col] else -1,
ValueError: Timestamp('2018-01-01 00:00:00') is not in list

I found the cause of the issue in _core.py:847.

The pandas .unique() method returns a NumPy array and, as shown in the examples section of its documentation, converts pandas.Timestamp values to numpy.datetime64:

>>> pd.Series([pd.Timestamp('2016-01-01') for _ in range(3)]).unique()
array(['2016-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

The uniques list defined at _core.py:847 is built with .unique(), which converts the series of pandas.Timestamp into an array of numpy.datetime64:

uniques = args["data_frame"][col].unique()

This distinction is relevant on line 857:

group_names = sorted(
    group_names,
    key=lambda g: orders[col].index(g[i]) if g[i] in orders[col] else -1
)

The ValueError is raised when .index() fails to find a matching element, even though the membership test on the same values succeeds. The two operations evidently do not compare pandas.Timestamp and numpy.datetime64 the same way, as seen here:

>>> pandas.Timestamp('2018-01-01 00:00:00') in [numpy.datetime64('2018-01-01T00:00:00.000000000')]
True
>>> [numpy.datetime64('2018-01-01T00:00:00.000000000')].index(pandas.Timestamp('2018-01-01 00:00:00'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Timestamp('2018-01-01 00:00:00') is not in list
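One possible way around the mismatch is to make both sides of the lookup the same type. The sketch below shows two options under stated assumptions: converting the unique values back to pandas.Timestamp before calling .index(), and, on the caller's side, turning the date column into plain strings before handing it to animation_frame. The names order, frame_labels, and date_label are hypothetical, and this is not necessarily the fix that plotly adopted.

```python
import pandas as pd

# Same mock data as in the report above.
df = pd.DataFrame(
    {
        "x": [1, 2, 3],
        "y": [1, 1, 1],
        "date": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
    }
)

# Option 1 (library side, hypothetical): convert the numpy.datetime64
# values from .unique() back into pandas.Timestamp objects, so the list
# holds the same type as the group keys being looked up.
order = [pd.Timestamp(value) for value in df["date"].unique()]
position = order.index(pd.Timestamp("2018-01-02"))
print(position)

# Option 2 (caller side): store the dates as strings in a new column,
# so that only plain Python str values are ever compared.
frame_labels = df["date"].dt.strftime("%Y-%m-%d")
df["date_label"] = frame_labels
print(df["date_label"].tolist())
```

With the second option, the string column would be passed instead of the Timestamp column, for example px.scatter(df, x="x", y="y", animation_frame="date_label").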

Labels

bug (something broken)
