Python Plotly

Posted on 2020-02-22 Edited on 2022-02-15 In Tech , Python , Visualization

Python Tutorial

Getting Started with Plotly in Python
Plotly Express in Python
Plotly Python Open Source Graphing Library Basic Charts

Overview

Getting Started with Plotly in Python

The plotly Python library (plotly.py) is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases.

Installation & Loading

Install the Anaconda distribution first; without it you will not have the pandas and numpy packages used below.
Then install the plotly package through conda.

CLI

conda install -c plotly "plotly>=4.5.1"

JupyterLab Support (Python 3.5+)

For use in JupyterLab, install the jupyterlab and ipywidgets packages using conda:

CLI

conda update --all

conda install "ipywidgets>=7.5"

The JupyterLab extensions installed below require node, so first check that it is available:

Shell

node --version

For Mac OS only:
If it shows command not found: node, you should install node first:

CLI

conda install -c conda-forge nodejs
conda install -c conda-forge yarn

For Mac OS only:
Then run the following commands to install the required JupyterLab extensions (note that this will require node to be installed):

CLI

# (OS X/Linux)
export NODE_OPTIONS=--max-old-space-size=4096

# Jupyter widgets extension
jupyter labextension install @jupyter-widgets/jupyterlab-manager --no-build

# jupyterlab renderer support
jupyter labextension install jupyterlab-plotly --no-build

# FigureWidget support
jupyter labextension install plotlywidget --no-build

# Build jupyterlab (must be done to activate extensions since --no-build is used above)
jupyter lab build

# Unset NODE_OPTIONS environment variable
# (OS X/Linux)
unset NODE_OPTIONS

After installing all the extensions, list them as a final check.

CLI

jupyter labextension list

Start JupyterLab from a directory that contains (or sits above) your data source directory, so that the data files are reachable from the notebook.

CLI

jupyter lab

Python

import plotly.graph_objects as go
fig = go.Figure(data=go.Bar(y=[2, 3, 1]))
fig.show()

See Displaying Figures in Python for more information on the renderers framework, and see Plotly FigureWidget Overview for more information on using FigureWidget.

Static Image Export Support

plotly.py supports static image export using the to_image and write_image functions in the plotly.io package. This functionality requires the installation of the plotly orca command line utility and the psutil and requests Python packages.

Note: The requests library is used to communicate between the Python process and a local orca server process, it is not used to communicate with any external services.

These dependencies can all be installed using conda:

CLI

conda install -c plotly plotly-orca psutil requests

These packages contain everything you need to save figures as static images.

Python

import plotly.graph_objects as go
fig = go.FigureWidget(data=go.Bar(y=[2, 3, 1]))
fig.write_image('figure.png')

See Static Image Export in Python for more information on static image export.

Extended Geo Support

Some plotly.py features rely on fairly large geographic shape files. The county choropleth figure factory is one such example. These shape files are distributed as a separate plotly-geo package. This package can be installed using conda.

CLI

conda install -c plotly plotly-geo

See USA County Choropleth Maps in Python for more information on the county choropleth figure factory.

Chart Studio Support

The chart-studio package can be used to upload plotly figures to Plotly’s Chart Studio Cloud or On-Prem services. This package can be installed using conda.

CLI

conda install -c plotly chart-studio

Note: This package is optional, and if it is not installed it is not possible for figures to be uploaded to the Chart Studio cloud service.

Plotly Express

Introduction

Plotly Express is a terse, consistent, high-level API for rapid data exploration and figure generation.

Let’s repeat the experiments we ran with ggplot in R.

Understanding `plotly` through `ggplot`

Plotly Express is built around concepts similar to the layers of ggplot:

Data: In ggplot the data must be a data.frame; the Python examples below use a pandas DataFrame.
Aesthetics: Aesthetics indicate the x and y variables. They can also control the color, size, or shape of points, the height of bars, and so on.
Geometric Objects: A layer combines data, aesthetic mapping, a geom (geometric object), a stat (statistical transformation), and a position adjustment. Typically, you create layers using a geom_ function, overriding the default position and stat if needed.

In practice:
Data: Assume your data are two points, $(0, 0)$ and $(1, 1)$. You need to tell ggplot where the data comes from.

Aesthetics: Then you define the x variable and the y variable.

Geometric Objects: Finally, you tell ggplot what kind of plot (geometric object) to draw.

# Define data frame
two_points <- data.frame(x_value = c(0, 1), y_value = c(0, 1)) # Create your data two points (0, 0) and (1, 1)

# Draw two points
ggplot(data = two_points, # Data: tell ggplot the data source.
       aes(x = x_value, y = y_value)) + # Aesthetics: Define x variables and y variables.
  geom_point() # Geometric Objects: Tell ggplot what kind of plot you need.

Python

import pandas as pd
import numpy as np
import plotly.express as px

# Define data as dictionary
d = {'x_value': [0, 1], 'y_value': [0, 1]}

# Convert dict to data frame
two_points = pd.DataFrame(data = d)

# Draw two points
px.scatter( # Geometric object: the chart type plays the role of the geom in ggplot.
    two_points, # Data: the data source, as in ggplot.
    x = 'x_value', y = 'y_value' # Aesthetics: the x and y variables, as in aes().
)

Compared with the R code, the concepts are the same. Note that plotly is not built on matplotlib (it renders through the plotly.js JavaScript library), but the workflow is familiar from matplotlib: we assign the figure to a variable fig and call its show() method to display it:

Python

# Draw two points
fig = px.scatter(two_points, x = 'x_value', y = 'y_value')
fig.show()

To connect the two points, we add a line. In R, we simply add a line layer that runs from one point to the other:

# Connect two points
ggplot(data = two_points, # Data: tell ggplot the data source.
       aes(x = x_value, y = y_value)) + # Aesthetics: Define x variables and y variables.
  geom_point() + geom_line() # Geometric Objects: This time we connect the two points.

In Plotly Express we cannot add a line layer in the same way, because px.scatter is not a point layer but a complete chart type. Instead, we change the chart type from scatter to line:

Python

# Connect two points
fig = px.line(two_points, x = 'x_value', y = 'y_value') # Change the chart type
fig.show()

Data format and preparation

The data set mtcars is used in the examples below:

Data profile:

mtcars : Motor Trend Car Road Tests.
Description: The data comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973 - 74 models).
Format: A data frame with 32 observations on 3 variables.
- [, 1] mpg Miles/(US) gallon
- [, 2] cyl Number of cylinders
- [, 3] wt Weight (lb/1000)

# Load the data
data(mtcars)
df <- mtcars[, c("mpg", "cyl", "wt")] # Extract all of the rows; extract columns: "mpg", "cyl", "wt".

# Convert cyl to a factor variable
df$cyl <- as.factor(df$cyl)
head(df) # show the first 6 rows of data

##                    mpg cyl    wt
## Mazda RX4         21.0   6 2.620
## Mazda RX4 Wag     21.0   6 2.875
## Datsun 710        22.8   4 2.320
## Hornet 4 Drive    21.4   6 3.215
## Hornet Sportabout 18.7   8 3.440
## Valiant           18.1   6 3.460

Python

import pandas as pd
import numpy as np
from plotnine import *
from plotnine.data import mtcars

df = mtcars[["name", "mpg", "cyl", "wt"]]
df.head(6) # show the first 6 rows of data

#               name   mpg  cyl     wt
#          Mazda RX4  21.0    6  2.620
#      Mazda RX4 Wag  21.0    6  2.875
#         Datsun 710  22.8    4  2.320
#     Hornet 4 Drive  21.4    6  3.215
#  Hornet Sportabout  18.7    8  3.440
#            Valiant  18.1    6  3.460

Scatter plots - Compare with ggplot2

The R code below creates basic scatter plots using the argument geom = "point". It is also possible to combine different geoms (e.g. geom = c("point", "smooth")).

R Scatter

# Basic scatter plot
qplot(x = mpg, y = wt, data = df, geom = "point")

# Scatter plot with smoothed line
qplot(mpg, wt, data = df, 
      geom = c("point", "smooth"))

Python Scatter

# Basic scatter plot
import plotly.express as px

fig = px.scatter(df, x = 'mpg', y = 'wt')
fig.show()

Python Scatter

import plotly.express as px

fig = px.scatter(df, x = 'mpg', y = 'wt', trendline = 'lowess')
fig.show()

The following R code will change the color and the shape of points by groups. The column cyl will be used as grouping variable. In other words, the color and the shape of points will be changed by the levels of cyl.

R Scatter

qplot(mpg, wt, data = df, color = cyl, shape = cyl)

Python Scatter

import plotly.express as px

fig = px.scatter(df, x = 'mpg', y = 'wt', color = 'cyl')
fig.show()

Box plot, violin plot and dot plot - Compare with ggplot2

The R code below generates some data containing the weights by sex (M for male; F for female):

# Define Data Frame
wdata = data.frame(
  sex = factor(rep(c("F", "M"), each=200)),
  weight = c(rnorm(200, 55), rnorm(200, 58)))

head(wdata)

#    sex weight
# 1   F   53.8
# 2   F   55.3
# 3   F   56.1
# 4   F   52.7
# 5   F   55.4
# 6   F   55.5

Python

import pandas as pd
import numpy as np
from plotnine import *

# Define weight
np.random.seed(1234)
weight1 = np.random.normal(55, size = 200)
weight2 = np.random.normal(58, size = 200)
weight = np.append(weight1, weight2)

# Define sex
sex = ['F'] * 200 + ['M'] * 200

# Define Data Frame
wdata = pd.DataFrame(list(zip(sex, weight)), columns = ["Sex", "Weight"]) # zip() for zipping two lists

wdata.head(6)
"""
  Sex     Weight
0   F  55.471435
1   F  53.809024
2   F  56.432707
3   F  54.687348
4   F  54.279411
5   F  55.887163
"""

R Box plot

# Basic box plot from data frame
qplot(sex, weight, data = wdata, 
      geom = "boxplot", fill = sex)

R Violin plot

# Violin plot
qplot(sex, weight, data = wdata, geom = "violin")

R Dot plot

# Dot plot
qplot(sex, weight, data = wdata, geom = "dotplot",
      stackdir = "center", binaxis = "y", dotsize = 0.5)

Python Box plot

fig = px.box(wdata, x = 'Sex', y = 'Weight', color = 'Sex')
fig.show()

Python Violin plot

fig = px.violin(wdata, x = 'Sex', y = 'Weight', color = 'Sex')
fig.show()

Python Violin plot with Points

fig = px.violin(wdata, x = 'Sex', y = 'Weight', color = 'Sex', points = 'all')
fig.show()

Histogram and density plots - Compare with ggplot2

The histogram and density plots are used to display the distribution of data.

R Histogram

# Histogram  plot
# Change histogram fill color by group (sex)
qplot(weight, data = wdata, geom = "histogram",
      fill = sex)

Density Plot

# Density plot
# Change density plot line color by group (sex)
# change line type
qplot(weight, data = wdata, geom = "density",
      color = sex, linetype = sex)

Python Histogram

fig = px.histogram(wdata, x = 'Weight', color = 'Sex')
fig.show()

Python Density Plot

import plotly.figure_factory as ff

w1 = wdata[wdata['Sex'] == 'M'].iloc[:, 1] # Extract rows 'Sex' == 'M'
w2 = wdata[wdata['Sex'] == 'F'].iloc[:, 1] # Extract rows 'Sex' == 'F'

hist_data = [w1, w2]
group_labels = ['M', 'F']

fig = ff.create_distplot(hist_data, group_labels, show_hist=False)
fig.show()

Scatter and Line plots

Refer to the main scatter and line plot page for full documentation.

plotly.express.scatter

In a scatter plot, each row of data_frame is represented by a symbol mark in 2D space.

Python

import plotly.express as px

print(px.data.iris.__doc__)
px.data.iris().head()

Python

import plotly.express as px

df = px.data.iris()
fig = px.scatter(df, x = "sepal_width", y = "sepal_length")
fig.show()

Python

import plotly.express as px

df = px.data.iris()
fig = px.scatter(df, x = "sepal_width", y = "sepal_length", color = "species")
fig.show()

color (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign color to marks.

Python

import plotly.express as px

df = px.data.iris()
fig = px.scatter(df, x = "sepal_width", y = "sepal_length", color = "species", marginal_y = "rug", marginal_x = "histogram")
fig.show()

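marginal_x (str) – One of 'rug', 'box', 'violin', or 'histogram'. If set, a horizontal subplot is drawn above the main plot, visualizing the x-distribution. marginal_y takes the same values and draws a vertical subplot to the right of the main plot, visualizing the y-distribution.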

Python

import plotly.express as px

df = px.data.iris()
fig = px.scatter(df, x = "sepal_width", y = "sepal_length", color = "species", marginal_y = "violin", marginal_x = "box", trendline = "ols")
fig.show()

trendline (str) – One of ‘ols‘ or ‘lowess‘. If ‘ols‘, an Ordinary Least Squares regression line will be drawn for each discrete-color/symbol group. If ‘lowess’, a Locally Weighted Scatterplot Smoothing line will be drawn for each discrete-color/symbol group.

Python

import plotly.express as px

df = px.data.iris()
df["e"] = df["sepal_width"]/100
fig = px.scatter(df, x = "sepal_width", y = "sepal_length", color = "species", error_x = "e", error_y = "e")
fig.show()

error_x (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to size x-axis error bars. If error_x_minus is None, error bars will be symmetrical, otherwise error_x is used for the positive direction only.

Python

import plotly.express as px

df = px.data.tips()
df.head()

Python

import plotly.express as px

df = px.data.tips()
fig = px.scatter(df, x = "total_bill", y = "tip", facet_row = "time", facet_col = "day", color = "smoker", trendline = "ols",
          category_orders = {"day": ["Thur", "Fri", "Sat", "Sun"], "time": ["Lunch", "Dinner"]})
fig.show()

facet_row (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign marks to facetted subplots in the vertical direction.
facet_col (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign marks to facetted subplots in the horizontal direction.
category_orders (dict with str keys and list of str values (default {})) – By default, in Python 3.6+, the order of categorical values in axes, legends and facets depends on the order in which these values are first encountered in data_frame (and no order is guaranteed by default in Python below 3.6). This parameter is used to force a specific ordering of values per column. The keys of this dict should correspond to column names, and the values should be lists of strings corresponding to the specific display order desired.

Python

import plotly.express as px

df = px.data.tips()
fig = px.scatter(df, x = "total_bill", y = "tip", color = "size", 
facet_col = "sex", color_continuous_scale = px.colors.sequential.Viridis, 
render_mode = "webgl")
fig.show()

render_mode (str) – One of ‘auto‘, ‘svg‘ or ‘webgl‘, default ‘auto‘ Controls the browser API used to draw marks. ‘svg’ is appropriate for figures of less than 1000 data points, and will allow for fully-vectorized output. ‘webgl‘ is likely necessary for acceptable performance above 1000 points but rasterizes part of the output. ‘auto‘ uses heuristics to choose the mode.

Python

import plotly.express as px

df = px.data.gapminder()
df.head()

Python

df[df["year"] == 2007]

Python

import plotly.express as px

df = px.data.gapminder()
fig = px.scatter(df.query("year == 2007"), x = "gdpPercap", y = "lifeExp", size = "pop", color = "continent",
           hover_name = "country", log_x = True, size_max = 60)
fig.show()

Package Pandas:

DataFrame.query(self, expr, inplace=False, **kwargs)
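
DataFrame.query filters rows with a string expression, which makes it a compact alternative to the boolean indexing used earlier. The sketch below checks that the two forms select the same rows; df_small is a hypothetical miniature stand-in for the gapminder table, not real data.

```python
import pandas as pd

# Hypothetical miniature stand-in for the gapminder table
df_small = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'year': [2002, 2007, 2002, 2007],
    'lifeExp': [70.0, 71.5, 60.0, 62.5],
})

by_query = df_small.query('year == 2007')    # string expression
by_mask = df_small[df_small['year'] == 2007]  # boolean indexing

# Both approaches should return the same rows
print(by_query.equals(by_mask))
```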

Package plotly.express:

hover_name (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like appear in bold in the hover tooltip.
log_x (boolean (default False)) – If True, the x-axis is log-scaled in cartesian coordinates.

Python

import plotly.express as px

df = px.data.gapminder()
fig = px.scatter(df, x = "gdpPercap", y = "lifeExp", animation_frame = "year", animation_group = "country",
           size = "pop", color = "continent", hover_name = "country", facet_col = "continent",
           log_x = True, size_max = 45, range_x = [100,100000], range_y = [25,90])
fig.show()

animation_frame (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign marks to animation frames.
animation_group (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to provide object-constancy across animation frames: rows with matching animation_group‘s will be treated as if they describe the same object in each frame.
range_x (list of two numbers) – If provided, overrides auto-scaling on the x-axis in cartesian coordinates.

plotly.express.line

In a 2D line plot, each row of data_frame is represented as a vertex of a polyline mark in 2D space.

Python

import plotly.express as px

df = px.data.gapminder()
fig = px.line(df, x = "year", y = "lifeExp", color = "continent", line_group = "country", hover_name = "country",
        line_shape = "spline", render_mode = "svg")
fig.show()

line_group (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to group rows of data_frame into lines.
line_shape (str (default ‘linear‘)) – One of ‘linear‘ or ‘spline‘.

plotly.express.area

In a stacked area plot, each row of data_frame is represented as a vertex of a polyline mark in 2D space. The area between successive polylines is filled.

Python

import plotly.express as px

df = px.data.gapminder()
fig = px.area(df, x = "year", y = "pop", color = "continent", line_group = "country")
fig.show()

plotly.express.scatter_matrix

In a scatter plot matrix (or SPLOM), each row of data_frame is represented by multiple symbol marks, one in each cell of a grid of 2D scatter plots, which plot each pair of dimensions against each other.

Python

import plotly.express as px

print(px.data.iris.__doc__)
px.data.iris().head()

Python

import plotly.express as px

df = px.data.iris()
fig = px.scatter_matrix(df)
fig.show()

Python

import plotly.express as px

df = px.data.iris()
fig = px.scatter_matrix(df, dimensions = ["sepal_width", "sepal_length", "petal_width", "petal_length"], color = "species")
fig.show()

dimensions (list of str or int, or Series or array-like) – Either names of columns in data_frame, or pandas Series, or array_like objects. Values from these columns are used for multidimensional visualization.

plotly.express.parallel_coordinates

In a parallel coordinates plot, each row of data_frame is represented by a polyline mark which traverses a set of parallel axes, one for each of the dimensions.

Python

import plotly.express as px

print(px.data.iris.__doc__)
px.data.iris().head()

Python

import plotly.express as px

df = px.data.iris()
fig = px.parallel_coordinates(df, color = "species_id", labels = {"species_id": "Species",
                  "sepal_width": "Sepal Width", "sepal_length": "Sepal Length",
                  "petal_width": "Petal Width", "petal_length": "Petal Length", },
                    color_continuous_scale = px.colors.diverging.Tealrose, color_continuous_midpoint = 2)
fig.show()

labels (dict with str keys and str values (default {})) – By default, column names are used in the figure for axis titles, legend entries and hovers. This parameter allows this to be overridden. The keys of this dict should correspond to column names, and the values should correspond to the desired label to be displayed.
color_continuous_scale (list of str) – Strings should define valid CSS-colors. This list is used to build a continuous color scale when the column denoted by color contains numeric data. Various useful color scales are available in the plotly.express.colors submodules, specifically plotly.express.colors.sequential, plotly.express.colors.diverging and plotly.express.colors.cyclical.
color_continuous_midpoint (number (default None)) – If set, computes the bounds of the continuous color scale to have the desired midpoint. Setting this value is recommended when using plotly.express.colors.diverging color scales as the inputs to color_continuous_scale.

plotly.express.parallel_categories

In a parallel categories (or parallel sets) plot, each row of data_frame is grouped with other rows that share the same values of dimensions and then plotted as a polyline mark through a set of parallel axes, one for each of the dimensions.

Python

import plotly.express as px

df = px.data.tips()
df.head()

Python

import plotly.express as px

df = px.data.tips()
fig = px.parallel_categories(df, color = "size", color_continuous_scale = px.colors.sequential.Inferno)
fig.show()

Visualize Distributions

Refer to the main statistical graphs page for full documentation.

plotly.express.density_contour

In a density contour plot, rows of data_frame are grouped together into contour marks to visualize the 2D distribution of an aggregate function histfunc (e.g. the count or sum) of the value z.

Python

import plotly.express as px

df = px.data.iris()
df.head()

Python

import plotly.express as px

df = px.data.iris()
fig = px.density_contour(df, x = "sepal_width", y = "sepal_length")
fig.show()

Python

import plotly.express as px

df = px.data.iris()
fig = px.density_contour(df, x = "sepal_width", y = "sepal_length", color = "species", marginal_x = "rug", marginal_y = "histogram")
fig.show()

plotly.express.density_heatmap

In a density heatmap, rows of data_frame are grouped together into colored rectangular tiles to visualize the 2D distribution of an aggregate function histfunc (e.g. the count or sum) of the value z.

Python

import plotly.express as px

df = px.data.iris()
fig = px.density_heatmap(df, x = "sepal_width", y = "sepal_length", marginal_x = "rug", marginal_y = "histogram")
fig.show()

plotly.express.bar

In a bar plot, each row of data_frame is represented as a rectangular mark.

Python

import plotly.express as px

df = px.data.tips()
df.head()

Python

import plotly.express as px

df = px.data.tips()
fig = px.bar(df, x = "sex", y = "total_bill", color = "smoker", barmode = "group")
fig.show()

barmode (str (default ‘relative‘)) – One of ‘group‘, ‘overlay‘ or ‘relative‘
- In ‘relative‘ mode, bars are stacked above zero for positive values and below zero for negative values.
- In ‘overlay‘ mode, bars are drawn on top of one another.
- In ‘group‘ mode, bars are placed beside each other.

Python

import plotly.express as px

df = px.data.tips()
fig = px.bar(df, x = "sex", y = "total_bill", color = "smoker", barmode = "group", facet_row = "time", facet_col = "day",
       category_orders = {"day": ["Thur", "Fri", "Sat", "Sun"], "time": ["Lunch", "Dinner"]})
fig.show()

plotly.express.histogram

In a histogram, rows of data_frame are grouped together into a rectangular mark to visualize the 1D distribution of an aggregate function histfunc (e.g. the count or sum) of the value y (or x if orientation is ‘h‘).

Python

import plotly.express as px

df = px.data.tips()
fig = px.histogram(df, x = "total_bill", y = "tip", color = "sex", marginal = "rug", hover_data = df.columns)
fig.show()

hover_data (list of str or int, or Series or array-like) – Either names of columns in data_frame, or pandas Series, or array_like objects. Values from these columns appear as extra data in the hover tooltip.

Python

import plotly.express as px

df = px.data.tips()
fig = px.histogram(df, x = "sex", y = "tip", histfunc = "avg", color = "smoker", barmode = "group",
             facet_row = "time", facet_col = "day", category_orders = {"day": ["Thur", "Fri", "Sat", "Sun"],
                                                                "time": ["Lunch", "Dinner"]})
fig.show()

histfunc (str (default ‘count‘)) – One of ‘count‘, ‘sum‘, ‘avg‘, ‘min‘, or ‘max‘. Function used to aggregate values for summarization (note: can be normalized with histnorm). The arguments to this function for histogram are the values of y if orientation is ‘v‘, otherwise the arguments are the values of x. The arguments to this function for density_heatmap and density_contour are the values of z.
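
When x is categorical, each category acts as its own bin, so histfunc = "avg" can be thought of as a per-group mean of y. The pandas sketch below shows that aggregation on a hypothetical four-row table (df_tips); it is meant as an analogy for what the histogram summarizes, not as a description of Plotly's internals.

```python
import pandas as pd

# Hypothetical miniature stand-in for the tips table
df_tips = pd.DataFrame({
    'sex': ['Female', 'Male', 'Female', 'Male'],
    'tip': [1.0, 2.0, 3.0, 4.0],
})

# Per-category mean of tip, analogous to histfunc = "avg" with x = "sex"
avg_tip = df_tips.groupby('sex')['tip'].mean()
print(avg_tip)
```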

Python Uniform Distribution

import numpy as np
import pandas as pd
import plotly.express as px

x = np.random.uniform(0.0, 5.0, 250)
df = pd.DataFrame(data = x, columns = ['x']) # Convert x from array to dataframe
fig = px.histogram(df, x = 'x',)
fig.update_layout(title = 'Uniform Random Number Histogram',
                  xaxis_title="Range",
                  yaxis_title="Frequency",)

fig.show()

histnorm (str (default None)) – One of ‘percent‘, ‘probability‘, ‘density‘, or ‘probability density‘
- If None, the output of histfunc is used as is.
- If ‘probability‘, the output of histfunc for a given bin is divided by the sum of the output of histfunc for all bins.
- If ‘percent‘, the output of histfunc for a given bin is divided by the sum of the output of histfunc for all bins and multiplied by 100.
- If ‘density‘, the output of histfunc for a given bin is divided by the size of the bin.
- If ‘probability density‘, the output of histfunc for a given bin is normalized such that it corresponds to the probability that a random event whose distribution is described by the output of histfunc will fall into that bin.
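
The relationship between these normalizations can be checked by hand. The following NumPy sketch assumes histfunc is the plain count and uses a small made-up sample with two bins of width 2; the variable names are illustrative only.

```python
import numpy as np

# Made-up sample, binned into [0, 2) and [2, 4]
values = np.array([0.5, 1.5, 1.5, 2.5, 2.5, 2.5, 3.5, 3.5])
counts, edges = np.histogram(values, bins=2, range=(0.0, 4.0))  # histfunc = count
bin_width = np.diff(edges)

probability = counts / counts.sum()                         # histnorm = 'probability'
percent = 100 * probability                                 # histnorm = 'percent'
density = counts / bin_width                                # histnorm = 'density'
probability_density = counts / (counts.sum() * bin_width)   # histnorm = 'probability density'

print(counts, probability, percent, density, probability_density)
```

Under these definitions the probabilities sum to 1, and the probability densities sum to 1 once each is multiplied by its bin width.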

Python Uniform Distribution

import numpy as np
import pandas as pd
import plotly.express as px

x = np.random.uniform(0.0, 5.0, 250)
df = pd.DataFrame(data = x, columns = ['x']) # Convert x from array to dataframe
fig = px.histogram(df, x = 'x', histnorm = 'probability')
fig.update_layout(title = 'Uniform Random Number Histogram',
                  xaxis_title="Range",
                  yaxis_title="Relative Frequency")

fig.show()

Python Normal Distribution

import numpy as np
import pandas as pd
import plotly.express as px

np.random.seed(100)
df = pd.DataFrame(data = np.random.normal(200, 25, size = 10000), columns = ['x']) # mu = 200, sigma = 25

fig = px.histogram(df, x = 'x', histnorm = 'probability')
fig.update_layout(title = 'Normal Distribution',
                 xaxis_title = 'Range',
                 yaxis_title = 'Relative Frequency')

fig.show()

Python Normal Distribution Density Curve

import numpy as np
import pandas as pd
import plotly.figure_factory as ff

np.random.seed(100)
df = pd.DataFrame(data = np.random.normal(200, 25, size = 10000), columns = ['x']) # mu = 200, sigma = 25

hist_data = [df['x']]
group_labels = ['distplot'] # name of the dataset

fig = ff.create_distplot(hist_data, group_labels, curve_type = 'normal') # override default 'kde'
fig.update_layout(title = 'Normal Distribution Density Curve',
                 xaxis_title = 'Range',
                 yaxis_title = 'Density')

fig.show()

plotly.express.strip

In a strip plot each row of data_frame is represented as a jittered mark within categories.

Python

import plotly.express as px

df = px.data.tips()
fig = px.strip(df, x = "total_bill", y = "time", orientation = "h", color = "smoker")
fig.show()

orientation (str (default ‘v‘)) – One of ‘h‘ for horizontal or ‘v‘ for vertical.

plotly.express.box

In a box plot, rows of data_frame are grouped together into a box-and-whisker mark to visualize their distribution.

Each box spans from quartile 1 (Q1) to quartile 3 (Q3). The second quartile (Q2) is marked by a line inside the box. By default, the whiskers correspond to the box’ edges +/- 1.5 times the interquartile range (IQR: Q3-Q1), see “points” for other options.
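
The quantities in this description can be computed by hand. The sketch below uses NumPy on a made-up sample; note that NumPy's default percentile interpolation is an assumption here and may differ from the quartile method Plotly uses, so the numbers need not match a plotted box exactly.

```python
import numpy as np

# Made-up sample with one extreme value
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

q1, q2, q3 = np.percentile(data, [25, 50, 75])  # quartiles (q2 is the median)
iqr = q3 - q1                                    # interquartile range

# Limits at 1.5 * IQR beyond the box edges
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences would be drawn as outliers
outliers = data[(data < lower_fence) | (data > upper_fence)]
print(q1, q2, q3, iqr, lower_fence, upper_fence, outliers)
```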

Python

import plotly.express as px

df = px.data.tips()
fig = px.box(df, x = "day", y = "total_bill", color = "smoker", notched = True)
fig.show()

notched (boolean (default False)) – If True, boxes are drawn with notches.

plotly.express.violin

In a violin plot, rows of data_frame are grouped together into a curved mark to visualize their distribution.

Python

import plotly.express as px

df = px.data.tips()
fig = px.violin(df, y = "tip", x = "smoker", color = "sex", box = True, points = "all", hover_data = df.columns)
fig.show()

box (boolean (default False)) – If True, boxes are drawn inside the violins.
points (str or boolean (default ‘outliers‘)) – One of ‘outliers‘, ‘suspectedoutliers‘, ‘all‘, or False.
- If ‘outliers‘, only the sample points lying outside the whiskers are shown.
- If ‘suspectedoutliers‘, all outlier points are shown and those less than 4 * Q1 - 3 * Q3 or greater than 4 * Q3 - 3 * Q1 are highlighted with the marker’s ‘outliercolor‘.
- If ‘all‘, all sample points are shown.
- If False, no sample points are shown and the whiskers extend to the full range of the sample.

Distplots in Python

Python

import plotly.express as px
import plotly.figure_factory as ff

df = px.data.tips()

df1 = df[df['sex'] == 'Female'].iloc[:, 0] # Extract rows where 'sex' == 'Female' with column 'total_bill'
df2 = df[df['sex'] == 'Male'].iloc[:, 0] # Extract rows where 'sex' == 'Male' with column 'total_bill'

hist_data = [df1, df2]
group_labels = ['Female', 'Male'] # Same order as hist_data

fig = ff.create_distplot(hist_data, group_labels, show_hist = False)

fig.show()

Ternary Coordinates

plotly.express.scatter_ternary

In a ternary scatter plot, each row of data_frame is represented by a symbol mark in ternary coordinates.

Python

import plotly.express as px

df = px.data.election()
df.head()

Python

import plotly.express as px

df = px.data.election()
fig = px.scatter_ternary(df, a = "Joly", b = "Coderre", c = "Bergeron", color = "winner", size = "total", hover_name = "district",
                   size_max = 15, color_discrete_map = {"Joly": "blue", "Bergeron": "green", "Coderre":"red"} )
fig.show()

a (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to position marks along the a axis in ternary coordinates.
color_discrete_map (dict with str keys and str values (default {})) – String values should define valid CSS-colors. Used to override color_discrete_sequence to assign specific colors to marks corresponding with specific values. Keys in color_discrete_map should be values in the column denoted by color.
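
In ternary coordinates only the relative sizes of a, b and c matter for a mark's position, so raw counts can be passed directly. As a hedged illustration of the normalization this implies, the NumPy sketch below converts hypothetical vote counts (not taken from the election data set) into shares that sum to 1 per row:

```python
import numpy as np

# Hypothetical vote counts for three candidates in two districts
votes = np.array([[200.0, 300.0, 500.0],
                  [100.0, 100.0, 200.0]])

# Divide each row by its total to obtain the three shares
shares = votes / votes.sum(axis=1, keepdims=True)
print(shares)
```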

plotly.express.line_ternary

In a ternary line plot, each row of data_frame is represented as a vertex of a polyline mark in ternary coordinates.

Python

import plotly.express as px

df = px.data.election()
df.head()

Python

import plotly.express as px

df = px.data.election()
fig = px.line_ternary(df, a = "Joly", b = "Coderre", c = "Bergeron", color = "winner", line_dash = "winner")
fig.show()

line_dash (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign dash-patterns to lines.

Images

plotly.express.imshow

Display an image, i.e. data on a 2D regular raster.

Python

import plotly.express as px
import numpy as np

img_rgb = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]],
                    [[0, 255, 0], [0, 0, 255], [255, 0, 0]]
                   ], dtype = np.uint8)
fig = px.imshow(img_rgb)
fig.show()

3D Coordinates

plotly.express.scatter_3d

In a 3D scatter plot, each row of data_frame is represented by a symbol mark in 3D space.

Python

import plotly.express as px

df = px.data.election()
df.head()

Python

import plotly.express as px

df = px.data.election()
fig = px.scatter_3d(df, x = "Joly", y = "Coderre", z = "Bergeron", color = "winner", size = "total", hover_name = "district",
                  symbol = "result", color_discrete_map = {"Joly": "blue", "Bergeron": "green", "Coderre":"red"})
fig.show()

symbol (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign symbols to marks.

plotly.express.line_3d

In a 3D line plot, each row of data_frame is represented as a vertex of a polyline mark in 3D space.

Python

import plotly.express as px

df = px.data.election()
fig = px.line_3d(df, x = "Joly", y = "Coderre", z = "Bergeron", color = "winner", line_dash = "winner")
fig.show()

Polar Coordinates

plotly.express.scatter_polar

In a polar scatter plot, each row of data_frame is represented by a symbol mark in polar coordinates.

Python

import plotly.express as px

df = px.data.wind()
df.head()

Python

import plotly.express as px

df = px.data.wind()
fig = px.scatter_polar(df, r = "frequency", theta = "direction", color = "strength", symbol = "strength",
            color_discrete_sequence = px.colors.sequential.Plasma_r)
fig.show()

r (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to position marks along the radial axis in polar coordinates.
theta (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to position marks along the angular axis in polar coordinates.

plotly.express.line_polar

In a polar line plot, each row of data_frame is represented as a vertex of a polyline mark in polar coordinates.

Python

import plotly.express as px

df = px.data.wind()
df.head()

Python

import plotly.express as px

df = px.data.wind()
fig = px.line_polar(df, r = "frequency", theta = "direction", color = "strength", line_close = True,
            color_discrete_sequence = px.colors.sequential.Plasma_r)
fig.show()

line_close (boolean (default False)) – If True, an extra line segment is drawn between the first and last point.
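
As a conceptual illustration of what line_close changes, the sketch below lists the segments of a polyline with and without the closing segment. This is not how plotly draws the line internally; polyline_segments is a hypothetical helper and the direction labels are just sample vertices.

```python
def polyline_segments(points, line_close=False):
    """Return the (start, end) pairs that make up a polyline."""
    # Consecutive vertices are joined in order
    segments = list(zip(points, points[1:]))
    # With line_close, one extra segment joins the last vertex back to the first
    if line_close and len(points) > 2:
        segments.append((points[-1], points[0]))
    return segments


vertices = ["N", "E", "S", "W"]
open_line = polyline_segments(vertices)
closed_line = polyline_segments(vertices, line_close=True)

print(len(open_line), len(closed_line))
print(closed_line[-1])
```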

plotly.express.bar_polar

In a polar bar plot, each row of data_frame is represented as a wedge mark in polar coordinates.

Python

import plotly.express as px

df = px.data.wind()
fig = px.bar_polar(df, r = "frequency", theta = "direction", color = "strength", template = "plotly_dark",
            color_discrete_sequence= px.colors.sequential.Plasma_r)
fig.show()

template (str or dict or plotly.graph_objects.layout.Template instance) – The figure template name or definition.

Maps

Mapbox Access Token and Base Map Configuration

To plot on Mapbox maps with Plotly you may need a Mapbox account and a public Mapbox Access Token. See Mapbox Map Layers documentation for more information.

After registering a Mapbox account, click New Style.

Click Share in the top-right corner.

Copy your access token.

Then open your shell and make sure you are in the working directory of your .py file.

CLI

pwd
ls -l

Then substitute your access token into the following commands.

CLI

touch .mapbox_token
echo "Your Access Token" > .mapbox_token
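
Because echo appends a trailing newline, the token file ends with one. A defensive way to load it in Python is to strip whitespace before handing the value to Plotly. The sketch below assumes the .mapbox_token file name used above; read_mapbox_token is a hypothetical helper, and the demo writes a throwaway file in a temporary directory so that it can run anywhere.

```python
import tempfile
from pathlib import Path


def read_mapbox_token(path=".mapbox_token"):
    """Read the token file and drop surrounding whitespace/newlines."""
    return Path(path).read_text(encoding="utf-8").strip()


# Demo with a temporary file that mimics what `echo ... > .mapbox_token` writes
with tempfile.TemporaryDirectory() as tmp:
    token_file = Path(tmp) / ".mapbox_token"
    token_file.write_text("Your Access Token\n", encoding="utf-8")
    token = read_mapbox_token(token_file)

print(repr(token))
```

In a real script the result would be passed on with px.set_mapbox_access_token(read_mapbox_token()).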

plotly.express.scatter_mapbox

In a Mapbox scatter plot, each row of data_frame is represented by a symbol mark on a Mapbox map.

Python

import plotly.express as px

df = px.data.carshare()
df.head()

Python

import plotly.express as px

px.set_mapbox_access_token(open(".mapbox_token").read())
df = px.data.carshare()
fig = px.scatter_mapbox(df, 
                        lat='centroid_lat', 
                        lon='centroid_lon', 
                        color='peak_hour', 
                        size='car_hours', 
                        color_continuous_scale=px.colors.cyclical.IceFire, 
                        size_max=15, 
                        zoom=10)
fig.show()
# fig.write_image('fig.svg', scale=2)

lat (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to position marks according to latitude on a map.
lon (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to position marks according to longitude on a map.

plotly.express.line_mapbox

In a Mapbox line plot, each row of data_frame is represented as a vertex of a polyline mark on a Mapbox map.

Python

import plotly.express as px

df = px.data.carshare()
df.head()

Python

import plotly.express as px

px.set_mapbox_access_token(open(".mapbox_token").read())
df = px.data.carshare()
fig = px.line_mapbox(df, lat = "centroid_lat", lon = "centroid_lon", color = "peak_hour")
fig.show()

plotly.express.scatter_geo

In a geographic scatter plot, each row of data_frame is represented by a symbol mark on a map.

Python

import plotly.express as px

df = px.data.gapminder()
df.head()

Python

import plotly.express as px

df = px.data.gapminder()
fig = px.scatter_geo(df, locations = "iso_alpha", color = "continent", hover_name = "country", size = "pop",
               animation_frame = "year", projection = "natural earth")
fig.show()

projection (str) – One of 'equirectangular', 'mercator', 'orthographic', 'natural earth', 'kavrayskiy7', 'miller', 'robinson', 'eckert4', 'azimuthal equal area', 'azimuthal equidistant', 'conic equal area', 'conic conformal', 'conic equidistant', 'gnomonic', 'stereographic', 'mollweide', 'hammer', 'transverse mercator', 'albers usa', 'winkel tripel', 'aitoff', or 'sinusoidal'. Default depends on scope.

plotly.express.line_geo

In a geographic line plot, each row of data_frame is represented as a vertex of a polyline mark on a map.

Python

import plotly.express as px

df = px.data.gapminder()
df.head()

Python

df[df["year"] == 2007]

Python

import plotly.express as px

df = px.data.gapminder()
fig = px.line_geo(df.query("year == 2007"), locations = "iso_alpha", color = "continent", projection = "orthographic")
fig.show()

plotly.express.choropleth

In a choropleth map, each row of data_frame is represented by a colored region mark on a map.

Python

import plotly.express as px

df = px.data.gapminder()
fig = px.choropleth(df, locations = "iso_alpha", color = "lifeExp", hover_name = "country", animation_frame = "year", range_color = [20,80])
fig.show()

range_color (list of two numbers) – If provided, overrides auto-scaling on the continuous color scale.

Reference Documentation

Statistical

Linear and Non-Linear Trendlines in Python

https://plot.ly/python/linear-fits/
Add linear Ordinary Least Squares (OLS) regression trendlines or non-linear Locally Weighted Scatterplot Smoothing (LOWESS) trendlines to scatterplots in Python.

Linear fit trendlines with Plotly Express

https://plot.ly/python-api-reference/generated/plotly.express.scatter.html#plotly.express.scatter

Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on “tidy” data and produces easy-to-style figures.

Plotly Express allows you to add an Ordinary Least Squares regression trendline to scatterplots with the trendline argument. In order to do so, you will need to install statsmodels and its dependencies. Hovering over the trendline will show the equation of the line and its R-squared value.

Python

import plotly.express as px

df = px.data.tips()
df.head()

Python

import plotly.express as px

df = px.data.tips()
fig = px.scatter(df, x = "total_bill", y = "tip", trendline = "ols")
fig.update_layout(title = "Total Bill vs. Tip",
                 xaxis_title = "Total Bill",
                 yaxis_title = "Tip")

fig.show()

trendline (str) – One of 'ols' or 'lowess'.
- If 'ols', an Ordinary Least Squares regression line will be drawn for each discrete-color/symbol group.
- If 'lowess', a Locally Weighted Scatterplot Smoothing line will be drawn for each discrete-color/symbol group.
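
To show what the equation and R-squared value in the hover label mean, here is a minimal pure-Python sketch of a simple least-squares fit. It is not the statsmodels code that Plotly Express calls; ols_fit is a hypothetical helper and the four data points are made up, not taken from the tips dataset.

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by least squares and return R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope is the covariance of x and y divided by the variance of x
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared is the share of the variance in y explained by the line
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up sample data (illustrative only)
total_bill = [10.0, 20.0, 30.0, 40.0]
tip = [1.5, 3.0, 4.0, 6.0]

slope, intercept, r_squared = ols_fit(total_bill, tip)
print(f"tip = {slope:.3f} * total_bill + {intercept:.3f}, R^2 = {r_squared:.3f}")
```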

Fitting multiple lines and retrieving the model parameters

https://plot.ly/python-api-reference/generated/plotly.express.scatter.html#plotly.express.scatter

Plotly Express will fit a trendline per trace, and allows you to access the underlying model parameters for all the models.

Python

import plotly.express as px

df = px.data.tips()
df.head()

Python

import plotly.express as px

df = px.data.tips()

fig = px.scatter(df, x = "total_bill", y = "tip", facet_col = "smoker", color = "sex", trendline = "ols")
fig.update_layout(title = "Total Bill vs. Tip",
                 xaxis_title = "Total Bill",
                 yaxis_title = "Tip")
fig.update_xaxes(title_text="Total Bill", row = 1, col = 2) # Update Row 1 Col 2 X axis title
fig.show()

results = px.get_trendline_results(fig)
print(results)

results.query("sex == 'Male' and smoker == 'Yes'").px_fit_results.iloc[0].summary()

Non-Linear Trendlines

Plotly Express also supports non-linear LOWESS trendlines.

Python

import plotly.express as px

df = px.data.gapminder()
df.head()

Python

import plotly.express as px

df = px.data.gapminder().query("year == 2007")
fig = px.scatter(df, x = "gdpPercap", y = "lifeExp", color = "continent", trendline = "lowess")
fig.show()