The plotly Python library (plotly.py) is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases.
plotly.py supports static image export using the to_image and write_image functions in the plotly.io package. This functionality requires the installation of the plotly orca command line utility and the psutil and requests Python packages.
Note: The requests library is used to communicate between the Python process and a local orca server process; it is not used to communicate with any external services.
These dependencies can all be installed using conda:
Some plotly.py features rely on fairly large geographic shape files. The county choropleth figure factory is one such example. These shape files are distributed as a separate plotly-geo package. This package can be installed using conda.
The chart-studio package can be used to upload plotly figures to Plotly’s Chart Studio Cloud or On-Prem services. This package can be installed using conda.
CLI
conda install -c plotly chart-studio
Note: This package is optional. Without it, figures cannot be uploaded to the Chart Studio cloud service.
plotly shares the layered concepts used by ggplot. In ggplot, these concepts are:
Data: The data must be a data.frame.
Aesthetics: Aesthetics indicate the x and y variables. They can also control the color, size, or shape of points, the height of bars, and so on.
Geometric Objects: A layer combines data, aesthetic mapping, a geom (geometric object), a stat (statistical transformation), and a position adjustment. Typically, you will create layers using a geom_ function, overriding the default position and stat if needed.
In practice: Data: Assume your data are two points, $(0, 0)$ and $(1, 1)$. You need to tell ggplot the data source.
Aesthetics: Then define the x and y variables.
Geometric Objects: Tell ggplot what kind of plot (the geometric object) you need. Let’s draw it.
R
# Define data frame
two_points <- data.frame(x_value = c(0, 1), y_value = c(0, 1))  # Create your data: two points (0, 0) and (1, 1)

# Draw two points
ggplot(data = two_points,                # Data: tell ggplot the data source.
       aes(x = x_value, y = y_value)) +  # Aesthetics: Define x variables and y variables.
  geom_point()                           # Geometric Objects: Tell ggplot what kind of plot you need.
Python
import pandas as pd
import plotly.express as px

# Define data as dictionary
d = {'x_value': [0, 1], 'y_value': [0, 1]}

# Convert dict to data frame
two_points = pd.DataFrame(data = d)

# Draw two points
px.scatter(                        # Geometric Objects in ggplot: Tell ggplot what kind of plot you need.
    two_points,                    # Data in ggplot: tell ggplot the data source.
    x = 'x_value', y = 'y_value'   # Aesthetics in ggplot: Define x variables and y variables.
)
Comparing this with the R code, we can see that both share the same concepts. Note that plotly is not based on matplotlib (plotly.py renders figures with the plotly.js JavaScript library), but it follows a similar convention: we assign the figure to a variable fig and call its show() method to display it:
Python
# Draw two points
fig = px.scatter(two_points, x = 'x_value', y = 'y_value')
fig.show()
To connect the two points, we add a line. In R, we simply add a line layer that runs from one point to the other:
R
# Connect two points
ggplot(data = two_points,                # Data: tell ggplot the data source.
       aes(x = x_value, y = y_value)) +  # Aesthetics: Define x variables and y variables.
  geom_point() +
  geom_line()                            # Geometric Objects: This time we connect the two points.
In plotly we cannot simply add a line layer in this way, because px.scatter creates a chart of type scatter rather than a point layer. The simplest fix is to change the chart type from scatter to line:
Python
# Connect two points
fig = px.line(two_points, x = 'x_value', y = 'y_value')  # Change the chart type
fig.show()
Data format and preparation
The data set mtcars is used in the examples below:
Data profile:
mtcars : Motor Trend Car Road Tests.
Description: The data comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973 - 74 models).
Format: A data frame with 32 observations on 11 variables, of which the following 3 are used here.
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] wt Weight (lb/1000)
R
# Load the data
data(mtcars)
df <- mtcars[, c("mpg", "cyl", "wt")]  # Extract all of the rows; extract columns: "mpg", "cyl", "wt".

# Convert cyl to a factor variable
df$cyl <- as.factor(df$cyl)
head(df)  # show the first 6 rows of data
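The Python examples below also refer to a data frame named df. As a rough sketch of how a comparable frame could be built in pandas, the block below types in only the first six rows of mtcars by hand (the rows that head(df) prints in R); the full 32-row data set would have to be loaded from another source. Storing cyl as strings is an assumption made here to mimic as.factor().

```python
import pandas as pd

# First six rows of R's mtcars (columns mpg, cyl, wt only), entered by hand
df = pd.DataFrame(
    {
        'mpg': [21.0, 21.0, 22.8, 21.4, 18.7, 18.1],
        'cyl': [6, 6, 4, 6, 8, 6],
        'wt': [2.620, 2.875, 2.320, 3.215, 3.440, 3.460],
    },
    index=['Mazda RX4', 'Mazda RX4 Wag', 'Datsun 710',
           'Hornet 4 Drive', 'Hornet Sportabout', 'Valiant'],
)

# Rough counterpart of as.factor(): keep cyl as strings so it is treated as categorical
df['cyl'] = df['cyl'].astype(str)

print(df.head())
```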
The R code below creates basic scatter plots using the argument geom = “point”. It’s also possible to combine different geoms (e.g.: geom = c(“point”, “smooth”)).
R Scatter
# Basic scatter plot
qplot(x = mpg, y = wt, data = df, geom = "point")
R
# Scatter plot with smoothed line
qplot(mpg, wt, data = df, geom = c("point", "smooth"))
Python Scatter
# Basic scatter plot
import plotly.express as px

fig = px.scatter(df, x = 'mpg', y = 'wt')
fig.show()
Python Scatter
import plotly.express as px

fig = px.scatter(df, x = 'mpg', y = 'wt', trendline = 'lowess')
fig.show()
The following R code will change the color and the shape of points by groups. The column cyl will be used as grouping variable. In other words, the color and the shape of points will be changed by the levels of cyl.
R Scatter
qplot(mpg, wt, data = df, color = cyl, shape = cyl)
Python Scatter
import plotly.express as px

# Color and symbol both vary with cyl, matching color = cyl, shape = cyl in R
fig = px.scatter(df, x = 'mpg', y = 'wt', color = 'cyl', symbol = 'cyl')
fig.show()
Box plot, violin plot and dot plot - Compare with ggplot2
The R code below generates some data containing the weights by sex (M for male; F for female):
R
# Define Data Frame
wdata = data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58)))

head(wdata)

#   sex weight
# 1   F   53.8
# 2   F   55.3
# 3   F   56.1
# 4   F   52.7
# 5   F   55.4
# 6   F   55.5
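The Python examples that follow expect a pandas data frame named wdata with columns Sex and Weight. A minimal sketch of an equivalent to the R code above, assuming NumPy's random generator with an arbitrary seed (so the values will not match the R output):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# 200 females with mean weight 55 and 200 males with mean weight 58 (standard deviation 1)
wdata = pd.DataFrame({
    'Sex': np.repeat(['F', 'M'], 200),
    'Weight': np.concatenate([rng.normal(55, 1, 200), rng.normal(58, 1, 200)]),
})

print(wdata.head())
```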
Python Box plot
fig = px.box(wdata, x = 'Sex', y = 'Weight', color = 'Sex')
fig.show()
Python Violin plot
fig = px.violin(wdata, x = 'Sex', y = 'Weight', color = 'Sex')
fig.show()
Python Violin plot with Points
fig = px.violin(wdata, x = 'Sex', y = 'Weight', color = 'Sex', points = 'all')
fig.show()
Histogram and density plots - Compare with ggplot2
The histogram and density plots are used to display the distribution of data.
R Histogram
# Histogram plot
# Change histogram fill color by group (sex)
qplot(weight, data = wdata, geom = "histogram", fill = sex)
Density Plot
# Density plot
# Change density plot line color by group (sex)
# change line type
qplot(weight, data = wdata, geom = "density", color = sex, linetype = sex)
Python Histogram
fig = px.histogram(wdata, x = 'Weight', color = 'Sex')
fig.show()
In a scatter plot, each row of data_frame is represented by a symbol mark in 2D space.
Python
import plotly.express as px

print(px.data.iris.__doc__)
px.data.iris().head()
Python
import plotly.express as px

df = px.data.iris()
fig = px.scatter(df, x = "sepal_width", y = "sepal_length")
fig.show()
Python
import plotly.express as px

df = px.data.iris()
fig = px.scatter(df, x = "sepal_width", y = "sepal_length", color = "species")
fig.show()
color (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign color to marks.
Python
import plotly.express as px

df = px.data.iris()
fig = px.scatter(df, x = "sepal_width", y = "sepal_length", color = "species",
                 marginal_y = "rug", marginal_x = "histogram")
fig.show()
marginal_x (str) – One of 'rug', 'box', 'violin', or 'histogram'. If set, a horizontal subplot is drawn above the main plot, visualizing the x-distribution.
Python
import plotly.express as px

df = px.data.iris()
fig = px.scatter(df, x = "sepal_width", y = "sepal_length", color = "species",
                 marginal_y = "violin", marginal_x = "box", trendline = "ols")
fig.show()
trendline (str) – One of 'ols' or 'lowess'. If 'ols', an Ordinary Least Squares regression line will be drawn for each discrete-color/symbol group. If 'lowess', a Locally Weighted Scatterplot Smoothing line will be drawn for each discrete-color/symbol group.
Python
import plotly.express as px

df = px.data.iris()
df["e"] = df["sepal_width"] / 100
fig = px.scatter(df, x = "sepal_width", y = "sepal_length", color = "species",
                 error_x = "e", error_y = "e")
fig.show()
error_x (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to size x-axis error bars. If error_x_minus is None, error bars will be symmetrical, otherwise error_x is used for the positive direction only.
facet_row (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign marks to facetted subplots in the vertical direction.
facet_col (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign marks to facetted subplots in the horizontal direction.
category_orders (dict with str keys and list of str values (default {})) – By default, in Python 3.6+, the order of categorical values in axes, legends and facets depends on the order in which these values are first encountered in data_frame (and no order is guaranteed by default in Python below 3.6). This parameter is used to force a specific ordering of values per column. The keys of this dict should correspond to column names, and the values should be lists of strings corresponding to the specific display order desired.
Python
import plotly.express as px

df = px.data.tips()
fig = px.scatter(df, x = "total_bill", y = "tip", color = "size", facet_col = "sex",
                 color_continuous_scale = px.colors.sequential.Viridis,
                 render_mode = "webgl")
fig.show()
render_mode (str) – One of 'auto', 'svg' or 'webgl', default 'auto'. Controls the browser API used to draw marks. 'svg' is appropriate for figures of less than 1000 data points, and will allow for fully-vectorized output. 'webgl' is likely necessary for acceptable performance above 1000 points but rasterizes part of the output. 'auto' uses heuristics to choose the mode.
Python
import plotly.express as px

df = px.data.gapminder()
df.head()
Python
df[df["year"] == 2007]
Python
import plotly.express as px

df = px.data.gapminder()
fig = px.scatter(df.query("year == 2007"), x = "gdpPercap", y = "lifeExp",
                 size = "pop", color = "continent", hover_name = "country",
                 log_x = True, size_max = 60)
fig.show()
hover_name (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like appear in bold in the hover tooltip.
log_x (boolean (default False)) – If True, the x-axis is log-scaled in cartesian coordinates.
animation_frame (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign marks to animation frames.
animation_group (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to provide object-constancy across animation frames: rows with matching animation_group values will be treated as if they describe the same object in each frame.
range_x (list of two numbers) – If provided, overrides auto-scaling on the x-axis in cartesian coordinates.
In a 2D line plot, each row of data_frame is represented as a vertex of a polyline mark in 2D space.
Python
import plotly.express as px

df = px.data.gapminder()
fig = px.line(df, x = "year", y = "lifeExp", color = "continent",
              line_group = "country", hover_name = "country",
              line_shape = "spline", render_mode = "svg")
fig.show()
line_group (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to group rows of data_frame into lines.
line_shape (str (default 'linear')) – One of 'linear' or 'spline'.
In a stacked area plot, each row of data_frame is represented as a vertex of a polyline mark in 2D space. The area between successive polylines is filled.
Python
import plotly.express as px

df = px.data.gapminder()
fig = px.area(df, x = "year", y = "pop", color = "continent", line_group = "country")
fig.show()
In a scatter plot matrix (or SPLOM), each row of data_frame is represented by multiple symbol marks, one in each cell of a grid of 2D scatter plots, which plot each pair of dimensions against each other.
dimensions (list of str or int, or Series or array-like) – Either names of columns in data_frame, or pandas Series, or array_like objects. Values from these columns are used for multidimensional visualization.
In a parallel coordinates plot, each row of data_frame is represented by a polyline mark which traverses a set of parallel axes, one for each of the dimensions.
labels (dict with str keys and str values (default {})) – By default, column names are used in the figure for axis titles, legend entries and hovers. This parameter allows this to be overridden. The keys of this dict should correspond to column names, and the values should correspond to the desired label to be displayed.
color_continuous_scale (list of str) – Strings should define valid CSS-colors. This list is used to build a continuous color scale when the column denoted by color contains numeric data. Various useful color scales are available in the plotly.express.colors submodules, specifically plotly.express.colors.sequential, plotly.express.colors.diverging and plotly.express.colors.cyclical.
color_continuous_midpoint (number (default None)) – If set, computes the bounds of the continuous color scale to have the desired midpoint. Setting this value is recommended when using plotly.express.colors.diverging color scales as the inputs to color_continuous_scale.
In a parallel categories (or parallel sets) plot, each row of data_frame is grouped with other rows that share the same values of dimensions and then plotted as a polyline mark through a set of parallel axes, one for each of the dimensions.
In a density contour plot, rows of data_frame are grouped together into contour marks to visualize the 2D distribution of an aggregate function histfunc (e.g. the count or sum) of the value z.
Python
import plotly.express as px

df = px.data.iris()
df.head()
Python
import plotly.express as px

df = px.data.iris()
fig = px.density_contour(df, x = "sepal_width", y = "sepal_length")
fig.show()
Python
import plotly.express as px

df = px.data.iris()
fig = px.density_contour(df, x = "sepal_width", y = "sepal_length", color = "species",
                         marginal_x = "rug", marginal_y = "histogram")
fig.show()
In a density heatmap, rows of data_frame are grouped together into colored rectangular tiles to visualize the 2D distribution of an aggregate function histfunc (e.g. the count or sum) of the value z.
Python
import plotly.express as px

df = px.data.iris()
fig = px.density_heatmap(df, x = "sepal_width", y = "sepal_length",
                         marginal_x = "rug", marginal_y = "histogram")
fig.show()
In a histogram, rows of data_frame are grouped together into a rectangular mark to visualize the 1D distribution of an aggregate function histfunc (e.g. the count or sum) of the value y (or x if orientation is 'h').
Python
import plotly.express as px

df = px.data.tips()
fig = px.histogram(df, x = "total_bill", y = "tip", color = "sex",
                   marginal = "rug", hover_data = df.columns)
fig.show()
hover_data (list of str or int, or Series or array-like) – Either names of columns in data_frame, or pandas Series, or array_like objects. Values from these columns appear as extra data in the hover tooltip.
histfunc (str (default 'count')) – One of 'count', 'sum', 'avg', 'min', or 'max'. Function used to aggregate values for summarization (note: can be normalized with histnorm). The arguments to this function for histogram are the values of y if orientation is 'v', otherwise the arguments are the values of x. The arguments to this function for density_heatmap and density_contour are the values of z.
Python Uniform Distribution
import numpy as np
import pandas as pd
import plotly.express as px

x = np.random.uniform(0.0, 5.0, 250)
df = pd.DataFrame(data = x, columns = ['x'])  # Convert x from array to dataframe
fig = px.histogram(df, x = 'x')
fig.update_layout(title = 'Uniform Random Number Histogram',
                  xaxis_title = "Range",
                  yaxis_title = "Frequency")

fig.show()
histnorm (str (default None)) – One of 'percent', 'probability', 'density', or 'probability density'.
If None, the output of histfunc is used as is.
If 'probability', the output of histfunc for a given bin is divided by the sum of the output of histfunc for all bins.
If 'percent', the output of histfunc for a given bin is divided by the sum of the output of histfunc for all bins and multiplied by 100.
If 'density', the output of histfunc for a given bin is divided by the size of the bin.
If 'probability density', the output of histfunc for a given bin is normalized such that it corresponds to the probability that a random event whose distribution is described by the output of histfunc will fall into that bin.
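The arithmetic behind these normalizations can be worked through by hand with NumPy. The sketch below assumes histfunc is 'count' and uses five bins of width 1 on uniform samples; it reproduces the definitions above rather than plotly's internal binning, which may choose different bin edges.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
x = rng.uniform(0.0, 5.0, 250)

# histfunc = 'count': number of samples per bin
counts, edges = np.histogram(x, bins=5, range=(0.0, 5.0))
widths = np.diff(edges)  # size of each bin (1.0 here)

probability = counts / counts.sum()                      # bins sum to 1
percent = 100 * counts / counts.sum()                    # bins sum to 100
density = counts / widths                                # count per unit of x
probability_density = counts / (counts.sum() * widths)   # bar areas sum to 1

print(probability)
print(percent)
print(density)
print(probability_density)
```

With bins of width 1, 'density' equals the raw counts and 'probability density' equals 'probability'; the four modes only differ visibly once the bin size is not 1.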
Python Uniform Distribution
import numpy as np
import pandas as pd
import plotly.express as px

x = np.random.uniform(0.0, 5.0, 250)
df = pd.DataFrame(data = x, columns = ['x'])  # Convert x from array to dataframe
fig = px.histogram(df, x = 'x', histnorm = 'probability')
fig.update_layout(title = 'Uniform Random Number Histogram',
                  xaxis_title = "Range",
                  yaxis_title = "Frequency")

fig.show()
Python Normal Distribution
import numpy as np
import pandas as pd
import plotly.express as px
In a box plot, rows of data_frame are grouped together into a box-and-whisker mark to visualize their distribution.
Each box spans from quartile 1 (Q1) to quartile 3 (Q3). The second quartile (Q2) is marked by a line inside the box. By default, the whiskers correspond to the box's edges +/- 1.5 times the interquartile range (IQR: Q3-Q1); see “points” for other options.
Python
import plotly.express as px

df = px.data.tips()
fig = px.box(df, x = "day", y = "total_bill", color = "smoker", notched = True)
fig.show()
notched (boolean (default False)) – If True, boxes are drawn with notches.
In a violin plot, rows of data_frame are grouped together into a curved mark to visualize their distribution.
Python
import plotly.express as px

df = px.data.tips()
fig = px.violin(df, y = "tip", x = "smoker", color = "sex", box = True,
                points = "all", hover_data = df.columns)
fig.show()
box (boolean (default False)) – If True, boxes are drawn inside the violins.
points (str or boolean (default 'outliers')) – One of 'outliers', 'suspectedoutliers', 'all', or False.
If 'outliers', only the sample points lying outside the whiskers are shown.
If 'suspectedoutliers', all outlier points are shown and those less than 4 * Q1 - 3 * Q3 or greater than 4 * Q3 - 3 * Q1 are highlighted with the marker’s 'outliercolor'.
If 'all', all sample points are shown.
If False, no sample points are shown and the whiskers extend to the full range of the sample.
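The whisker and outlier thresholds described above can be checked by hand. The sketch below uses a made-up sample and NumPy's default percentile interpolation, which is an assumption: plotly may compute quartiles with a slightly different method, so the exact numbers can differ from what a figure shows.

```python
import numpy as np

values = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0])  # made-up sample

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Default whisker limits: box edges +/- 1.5 * IQR
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# 'suspectedoutliers' thresholds: 4*Q1 - 3*Q3 and 4*Q3 - 3*Q1,
# which are the same as Q1 - 3*IQR and Q3 + 3*IQR
lower_far = 4 * q1 - 3 * q3
upper_far = 4 * q3 - 3 * q1

outliers = values[(values < lower_fence) | (values > upper_fence)]
highlighted = values[(values < lower_far) | (values > upper_far)]

print(lower_fence, upper_fence, outliers)
print(lower_far, upper_far, highlighted)
```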
In a ternary scatter plot, each row of data_frame is represented by a symbol mark in ternary coordinates.
Python
import plotly.express as px

df = px.data.election()
df.head()
Python
import plotly.express as px

df = px.data.election()
fig = px.scatter_ternary(df, a = "Joly", b = "Coderre", c = "Bergeron",
                         color = "winner", size = "total", hover_name = "district",
                         size_max = 15,
                         color_discrete_map = {"Joly": "blue", "Bergeron": "green", "Coderre": "red"})
fig.show()
a (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to position marks along the a axis in ternary coordinates.
color_discrete_map (dict with str keys and str values (default {})) – String values should define valid CSS-colors. Used to override color_discrete_sequence to assign specific colors to marks corresponding with specific values. Keys in color_discrete_map should be values in the column denoted by color.
In a ternary line plot, each row of data_frame is represented as a vertex of a polyline mark in ternary coordinates.
Python
import plotly.express as px

df = px.data.election()
df.head()
Python
import plotly.express as px

df = px.data.election()
fig = px.line_ternary(df, a = "Joly", b = "Coderre", c = "Bergeron",
                      color = "winner", line_dash = "winner")
fig.show()
line_dash (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign dash-patterns to lines.
In a 3D scatter plot, each row of data_frame is represented by a symbol mark in 3D space.
Python
import plotly.express as px

df = px.data.election()
df.head()
Python
import plotly.express as px

df = px.data.election()
fig = px.scatter_3d(df, x = "Joly", y = "Coderre", z = "Bergeron",
                    color = "winner", size = "total", hover_name = "district",
                    symbol = "result",
                    color_discrete_map = {"Joly": "blue", "Bergeron": "green", "Coderre": "red"})
fig.show()
symbol (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign symbols to marks.
In a polar scatter plot, each row of data_frame is represented by a symbol mark in polar coordinates.
Python
import plotly.express as px

df = px.data.wind()
df.head()
Python
import plotly.express as px

df = px.data.wind()
fig = px.scatter_polar(df, r = "frequency", theta = "direction",
                       color = "strength", symbol = "strength",
                       color_discrete_sequence = px.colors.sequential.Plasma_r)
fig.show()
r (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to position marks along the radial axis in polar coordinates.
theta (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to position marks along the angular axis in polar coordinates.
To plot on Mapbox maps with Plotly you may need a Mapbox account and a public Mapbox Access Token. See Mapbox Map Layers documentation for more information.
After registering an account with Mapbox, click New Style.
Click Share at the top right.
Copy your Access Token.
Then open your shell. Make sure you are in your .py file working directory.
CLI
pwd
ls -l
Then substitute your Access Token into the following command.
lat (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to position marks according to latitude on a map.
lon (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to position marks according to longitude on a map.
https://plot.ly/python/linear-fits/ Add linear Ordinary Least Squares (OLS) regression trendlines or non-linear Locally Weighted Scatterplot Smoothing (LOWESS) trendlines to scatterplots in Python.
Plotly Express allows you to add Ordinary Least Squares regression trendline to scatterplots with the trendline argument. In order to do so, you will need to install statsmodels and its dependencies. Hovering over the trendline will show the equation of the line and its R-squared value.
Python
import plotly.express as px

df = px.data.tips()
df.head()
Python
import plotly.express as px

df = px.data.tips()
fig = px.scatter(df, x = "total_bill", y = "tip", trendline = "ols")
fig.update_layout(title = "Total Bill vs. Tip",
                  xaxis_title = "Total Bill",
                  yaxis_title = "Tip")

fig.show()
trendline (str) – One of 'ols' or 'lowess'.
If 'ols', an Ordinary Least Squares regression line will be drawn for each discrete-color/symbol group.
If 'lowess', a Locally Weighted Scatterplot Smoothing line will be drawn for each discrete-color/symbol group.
Fitting multiple lines and retrieving the model parameters