Description
Attempting to animate a plot using a pandas Timestamp column results in an exception.
Using Python 3.6.8, plotly_express==0.4.1, plotly==4.1.0
Code to replicate:
import plotly_express as px
import pandas as pd
# create a dataframe with mock data
df = pd.DataFrame(
[
{"x": 1, "y": 1, "date": "2018-01-01"},
{"x": 2, "y": 1, "date": "2018-01-02"},
{"x": 3, "y": 1, "date": "2018-01-03"},
]
)
df["date"] = pd.to_datetime(df["date"])
df.head()
# date | x | y
# 2018-01-01 | 1 | 1
# 2018-01-02 | 2 | 1
# 2018-01-03 | 3 | 1
# attempt to plot
px.scatter(df, x="x", y="y", animation_frame="date")
Exception & stack trace:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Users\jmccain\AppData\Local\Programs\Python\Python36\lib\site-packages\plotly\express\_chart_types.py", line 52, in scatter
return make_figure(args=locals(), constructor=go.Scatter)
File "C:\Users\jmccain\AppData\Local\Programs\Python\Python36\lib\site-packages\plotly\express\_core.py", line 874, in make_figure
orders, sorted_group_names = get_orderings(args, grouper, grouped)
File "C:\Users\jmccain\AppData\Local\Programs\Python\Python36\lib\site-packages\plotly\express\_core.py", line 859, in get_orderings
key=lambda g: orders[col].index(g[i]) if g[i] in orders[col] else -1,
File "C:\Users\jmccain\AppData\Local\Programs\Python\Python36\lib\site-packages\plotly\express\_core.py", line 859, in <lambda>
key=lambda g: orders[col].index(g[i]) if g[i] in orders[col] else -1,
ValueError: Timestamp('2018-01-01 00:00:00') is not in list
I found the cause of the issue, in _core.py:847. The pandas .unique() method returns a NumPy array and, as shown in the examples section of its documentation, converts a pandas.Timestamp to a numpy.datetime64.
>>> pd.Series([pd.Timestamp('2016-01-01') for _ in range(3)]).unique()
array(['2016-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
The uniques list defined at _core.py:847 uses the .unique() method, converting the series of pandas.Timestamp into an array of numpy.datetime64:
uniques = args["data_frame"][col].unique()
This distinction is relevant on line 857:
group_names = sorted(
group_names,
key=lambda g: orders[col].index(g[i]) if g[i] in orders[col] else -1
)
This raises the ValueError: the `in` check succeeds, but .index() then fails to find a matching element. The mismatch comes from how pandas.Timestamp and numpy.datetime64 compare, as seen here:
>>> pandas.Timestamp('2018-01-01 00:00:00') in [numpy.datetime64('2018-01-01T00:00:00.000000000')]
True
>>> [numpy.datetime64('2018-01-01T00:00:00.000000000')].index(pandas.Timestamp('2018-01-01 00:00:00'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Timestamp('2018-01-01 00:00:00') is not in list
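Until the lookup is fixed, one possible workaround is to pass Plotly Express a string column instead of the Timestamp column, so that no Timestamp-to-datetime64 comparison takes place. This is a sketch under that assumption: the `date_str` column name is made up for illustration, and the `px.scatter` call is left as a comment so the snippet runs with pandas alone.

```python
import pandas as pd

# Same mock data as in the reproduction above.
df = pd.DataFrame(
    [
        {"x": 1, "y": 1, "date": "2018-01-01"},
        {"x": 2, "y": 1, "date": "2018-01-02"},
        {"x": 3, "y": 1, "date": "2018-01-03"},
    ]
)
df["date"] = pd.to_datetime(df["date"])

# Hypothetical helper column: ISO-formatted strings, which sort
# lexicographically in the same order as the dates they represent.
df["date_str"] = df["date"].dt.strftime("%Y-%m-%d")

# Assumed usage (requires plotly_express):
# px.scatter(df, x="x", y="y", animation_frame="date_str")
```

The original `date` column stays in the dataframe, so it remains available for anything else that needs real datetimes.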