Plotly Express operates on "tidy" or "long" data rather than "wide" data. You can pass data either as a pandas DataFrame or as individual array-like objects, such as lists, NumPy arrays or pandas Series, which `px` assembles into a data frame internally.

What follows is a very short example of the difference between wide and tidy/long data. The excellent Tidy Data in Python blog post contains much more information about the tidy approach to structuring data.
```python
import pandas as pd
print("This is 'wide' data, unsuitable as-is for Plotly Express:")
wide_df = pd.DataFrame(dict(Month=["Jan", "Feb", "Mar"], London=[1, 2, 3], Paris=[3, 1, 2]))
print(wide_df)
```
```python
import pandas as pd
print("This is the same data in 'long' format, ready for Plotly Express:")
wide_df = pd.DataFrame(dict(Month=["Jan", "Feb", "Mar"], London=[1, 2, 3], Paris=[3, 1, 2]))
# melt() keeps "Month" as an identifier and stacks the city columns into rows
tidy_df = wide_df.melt(id_vars="Month")
print(tidy_df)
```
```python
import plotly.express as px
import pandas as pd
wide_df = pd.DataFrame(dict(Month=["Jan", "Feb", "Mar"], London=[1, 2, 3], Paris=[3, 1, 2]))
tidy_df = wide_df.melt(id_vars="Month")
fig = px.bar(tidy_df, x="Month", y="value", color="variable", barmode="group")
fig.show()
```
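`melt` also accepts `var_name` and `value_name`, which replace the default `variable` and `value` column names. Since `px` uses column names for axis titles and legends, this is a cheap way to get readable labels. The sketch below needs only pandas; the names `city` and `count` are illustrative choices, and pivoting back to wide form is just a sanity check that the reshape lost nothing.

```python
import pandas as pd

wide_df = pd.DataFrame(dict(Month=["Jan", "Feb", "Mar"], London=[1, 2, 3], Paris=[3, 1, 2]))

# Melt to long format with explicit (illustrative) column names
tidy_df = wide_df.melt(id_vars="Month", var_name="city", value_name="count")
print(tidy_df)

# Pivot back to wide format; pivot sorts the index, so restore the original month order
round_trip = tidy_df.pivot(index="Month", columns="city", values="count").loc[wide_df["Month"]]
print(round_trip)
```

Passing this `tidy_df` to `px.bar(tidy_df, x="Month", y="count", color="city", barmode="group")` should give the same grouped chart as above, with `count` and `city` as the axis and legend titles.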
`px` functions natively support pandas DataFrames. Arguments can be passed either as DataFrame columns or, if the `data_frame` argument is provided, as column names.
```python
import plotly.express as px
iris = px.data.iris()
# Pass columns directly as arguments. You can use tab completion for this!
fig = px.scatter(iris, x=iris.sepal_length, y=iris.sepal_width, color=iris.species, size=iris.petal_length)
fig.show()
```
```python
import plotly.express as px
iris = px.data.iris()
# Use column names instead. This is the same chart as above.
fig = px.scatter(iris, x='sepal_length', y='sepal_width', color='species', size='petal_length')
fig.show()
```
In addition to columns, it is also possible to pass the index of a DataFrame as an argument. In the example below, the index is displayed in the hover data.
```python
import plotly.express as px
iris = px.data.iris()
fig = px.scatter(iris, x=iris.sepal_length, y=iris.sepal_width, size=iris.petal_length,
                 hover_data=[iris.index])
fig.show()
```
In addition to columns from the `data_frame` argument, you can also pass columns from a different DataFrame, as long as all columns have the same length. It is also possible to pass columns without passing the `data_frame` argument at all.
However, column names are used only if they correspond to columns in the `data_frame` argument; in other cases, the name of the keyword argument is used. As explained below, the `labels` argument can be used to set names.
```python
import plotly.express as px
import pandas as pd
df1 = pd.DataFrame(dict(time=[10, 20, 30], sales=[10, 8, 30]))
df2 = pd.DataFrame(dict(market=[4, 2, 5]))
fig = px.bar(df1, x=df1.time, y=df2.market, color=df1.sales)
fig.show()
```
The `labels` argument can be used to override the names used for axis titles, legend entries and hovers.
```python
import plotly.express as px
import pandas as pd
gapminder = px.data.gapminder()
gdp = gapminder['pop'] * gapminder['gdpPercap']
fig = px.bar(gapminder, x='year', y=gdp, color='continent', labels={'y':'gdp'},
             hover_data=['country'],
             title='Evolution of world GDP')
fig.show()
```
`px` arguments can also be array-like objects such as lists or NumPy arrays.
```python
import plotly.express as px
# List arguments
fig = px.line(x=[1, 2, 3, 4], y=[3, 5, 4, 8])
fig.show()
```

```python
import numpy as np
import plotly.express as px
t = np.linspace(0, 10, 100)
# NumPy array arguments
fig = px.scatter(x=t, y=np.sin(t), labels={'x':'t', 'y':'sin(t)'})  # override keyword names with labels
fig.show()
```
The `data_frame` argument can also be passed as a `dict` or an array. Using a dictionary is a convenient way to provide the column names used in axis titles, legend entries and hovers without creating a pandas DataFrame.
```python
import plotly.express as px
import numpy as np
N = 10000
np.random.seed(0)
fig = px.density_contour(dict(effect_size=5 + np.random.randn(N),
                              waiting_time=np.random.poisson(size=N)),
                         x="effect_size", y="waiting_time")
fig.show()
```
When the `data_frame` argument is a NumPy array, the column names are integers corresponding to the columns of the array. In this case, the keyword names are used in axis titles, legend entries and hovers. The same applies to a pandas DataFrame with integer column names. Use the `labels` argument to override these names.
```python
import numpy as np
import plotly.express as px
ar = np.arange(100).reshape((10, 10))
fig = px.scatter(ar, x=2, y=6, size=1, color=5)
fig.show()
```
It is possible to mix DataFrame columns, NumPy arrays and lists as arguments. Remember that column names are used only for columns of the `data_frame` argument; use `labels` to override the names displayed in axis titles, legend entries or hovers.
```python
import plotly.express as px
import numpy as np
import pandas as pd
gapminder = px.data.gapminder()
# np.log on a Series returns a Series; .to_numpy() turns it into a NumPy array
gdp = np.log(gapminder['pop'] * gapminder['gdpPercap']).to_numpy()  # NumPy array
fig = px.bar(gapminder, x='year', y=gdp, color='continent', labels={'y':'log gdp'},
             hover_data=['country'],
             title='Evolution of world GDP')
fig.show()
```