R ggplot2
Be Awesome in ggplot2: A Practical Guide to be Highly Effective - R software and data visualization
How to plot with ggplot2 in Python
ggplot function reference
Introduction
`ggplot()` initializes a ggplot object. It can be used to declare the input data frame for a graphic and to specify the set of plot aesthetics intended to be common throughout all subsequent layers unless specifically overridden.
`ggplot2` is a package for R. In Python, we will use `plotnine`, which is extremely similar to `ggplot2`.
Installation & Loading
For R:
- Install the package `ggplot2`, which is included in the `tidyverse` package.
- Then load it.
# Installation
For Python:
- Install the Anaconda distribution if you want `pandas` and `numpy` preinstalled. Otherwise, `pip` installs them as dependencies of `plotnine`.
- Install the package `plotnine` through `pip`. If you have never heard of `pip`, click here.
- Then load it.
pip install plotnine
# Loading
Understand ggplot
To understand the logic of ggplot, it helps to first learn the principles of graphics it is built on.
- Data: Data must be a `data.frame`.
- Aesthetics: Aesthetics indicate the x and y variables. They can also control the color, size, or shape of points, the height of bars, and so on.
- Geometric Objects: A layer combines data, aesthetic mapping, a geom (geometric object), a stat (statistical transformation), and a position adjustment. Typically, you will create layers using a `geom_` function, overriding the default position and stat if needed.
# Define data frame
import pandas as pd
Compare the two pieces of code from R and Python. Notice that in Python you first create a dictionary and then convert it to a data frame, because Python has no built-in data type named `data.frame`. You have to import the package `pandas` and use its `DataFrame`.
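The dictionary-to-data-frame step can be sketched as follows. The column names `x` and `y` and the two points are assumed values for illustration, since the original data is not shown here.

```python
import pandas as pd

# Step 1: a plain Python dictionary, one key per column.
# The names and values are assumptions made for this sketch.
points = {"x": [1, 2], "y": [1, 2]}

# Step 2: convert the dictionary to a pandas DataFrame,
# the counterpart of R's data.frame.
df = pd.DataFrame(points)

print(df)
```

Each dictionary key becomes a column, and each list supplies that column's rows.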
Let’s put the main parts of the R code and the Python code side by side and look at the differences.
# Draw two points
# Draw two points
In Python:
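- When the expression spans several lines, you must add an extra `()` around the whole `ggplot` expression.
- You can declare `data = data frame` in R, and plotnine's `ggplot()` also accepts a `data=` keyword argument. In both languages the data frame can instead be passed positionally.
- You must use `''` or `""` to quote the column names passed to `aes()` in the `ggplot` expression, but NOT the data frame variable.
- Inside the parentheses, you can put the `+` at the end or the beginning of each line.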
In R:
- You can omit `data =` in the first line.
- You must put the `+` at the end of each line.
This time, we connect the two points. The result is the simplest function, a directly proportional function, drawn as the simplest plot, a line chart.
# Connect two points
# Connect two points
Explore more layers
Now we have understood the first three layers. Next, let’s explore the remaining four. If you cannot understand them yet, don’t worry. Just remember that these layer concepts exist.
For more detailed definitions, see the ggplot function reference.
From bottom to top:
- Data: Data must be a `data.frame`.
- Aesthetics: Aesthetics indicate the x and y variables. They can also control the color, size, or shape of points, the height of bars, and so on.
- Geometric Objects: A layer combines data, aesthetic mapping, a geom (geometric object), a stat (statistical transformation), and a position adjustment. Typically, you will create layers using a `geom_` function, overriding the default position and stat if needed.
- Facets: Faceting generates small multiples, each displaying a different subset of the data. Facets are an alternative to aesthetics for displaying additional discrete variables.
- Statistical Transformations: A handful of layers are more easily specified with a `stat_` function, drawing attention to the statistical transformation rather than the visual appearance. The computed variables can be mapped using `stat()` (superseded by `after_stat()` in ggplot2 3.3.0 and later).
- Coordinates: The coordinate system determines how the `x` and `y` aesthetics combine to position elements in the plot. The default coordinate system is Cartesian (`coord_cartesian()`), which can be tweaked with `coord_map()`, `coord_fixed()`, `coord_flip()`, and `coord_trans()`, or completely replaced with `coord_polar()`.
- Themes: Themes control the display of all non-data elements of the plot. You can override all settings with a complete theme like `theme_bw()`, or choose to tweak individual settings by using `theme()` and the `element_` functions. Use `theme_set()` to modify the active theme, affecting all future plots.