#install.packages('ggplot2')
#library(ggplot2)
GOALS: Students should be able to use ggplot2 to generate publication-quality graphics, and to understand and use the basics of the grammar of graphics.
##DataViz
ggplot2 is built on the grammar of graphics. The key to using ggplot2 is thinking about a figure in layers, much like layers in ArcGIS or in programs like Photoshop. Each layer is drawn by a geom function, for example geom_point(), geom_bar(), geom_density(), geom_line(), and geom_area().
#gapminder <- read.csv("https://goo.gl/BtBnPg", header = TRUE)
gapminder <- read.csv('gapminder-FiveYearData.csv', header = TRUE)
Let’s start off with an example:
library(ggplot2)
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
NOTE:
* First we call the ggplot function, letting R know that we're creating a new plot. Any arguments we provide to the ggplot function are considered global options: they apply to all layers on the plot.
* We passed two arguments to ggplot:
  * data: the data we want to show, e.g. the gapminder data
  * an aes function, which tells ggplot how variables in the data map to aesthetic properties of the figure, here the x and y locations, e.g. the gdpPercap column on the x axis and the lifeExp column on the y axis
* Notice we didn't have to spell out both the data and the column (e.g. gapminder$gdpPercap); ggplot is smart enough to look in the data for the columns.
On its own, the ggplot call isn't enough to draw any data:
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp))
## If run, this draws empty axes with no data in current ggplot2 versions; older versions raised an error instead.
We need to tell ggplot how to represent the data visually by adding a geom layer. Here we use geom_point to create a scatter plot that represents the relationship between x and y:
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
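As a quick check of what the geom layer actually receives, the sketch below builds the same kind of scatter plot on a small made-up data frame and inspects the layer with ggplot2's layer_data() function. The toy data frame and its values are assumptions for illustration only; it merely borrows the gapminder column names.

```r
library(ggplot2)

# Hypothetical stand-in for gapminder; the values are invented for illustration.
toy <- data.frame(
  gdpPercap = c(500, 1200, 8000, 30000),
  lifeExp   = c(45, 55, 68, 79)
)

p <- ggplot(data = toy, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

# layer_data() returns the data frame computed for one layer (here, layer 1).
pts <- layer_data(p, 1)
pts[, c("x", "y")]
```

The x and y columns hold the values taken from gdpPercap and lifeExp, which is the mapping declared in aes.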
We can also tell ggplot to visualize the data as a line plot:
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, color = continent)) +
  geom_line()
* We used geom_line instead of geom_point for the geom layer.
* We added a group aesthetic, group = country, to get a line per country, and colored the lines by continent.
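To see how a grouping aesthetic changes the number of lines drawn, here is a minimal sketch on made-up data. It assumes ggplot2's documented group aesthetic; the toy data frame, its country rows, and its values are hypothetical.

```r
library(ggplot2)

# Hypothetical data: three countries, two of them on the same continent.
toy <- data.frame(
  country   = rep(c("Kenya", "Ghana", "Chile"), each = 3),
  continent = rep(c("Africa", "Africa", "Americas"), each = 3),
  year      = rep(c(1997, 2002, 2007), times = 3),
  lifeExp   = c(54, 51, 54, 58, 58, 60, 75, 78, 79)  # invented values
)

# Colour only: observations are grouped by continent.
by_continent <- ggplot(toy, aes(x = year, y = lifeExp, color = continent)) +
  geom_line()

# Explicit grouping: one line per country, still coloured by continent.
by_country <- ggplot(toy, aes(x = year, y = lifeExp, group = country,
                              color = continent)) +
  geom_line()

# Count the distinct groups (lines) in each line layer.
n_lines_continent <- length(unique(layer_data(by_continent, 1)$group))
n_lines_country   <- length(unique(layer_data(by_country, 1)$group))
c(continent = n_lines_continent, country = n_lines_country)
```

Without the explicit grouping, the two African countries would be joined into a single zig-zag line.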
To show both lines and points, all there is to do is add another layer, + geom_point(), to the plot:
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country, color = continent)) +
  geom_line() + geom_point()
It's important to note that the plot is layered: the points have been drawn on top of the previous lines layer. As an example of this:
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(aes(color = continent)) + geom_point()
In the code above, the aesthetic mapping of color has been moved from the global plot options in ggplot to the geom_line layer, so it no longer applies to the points. The uncolored points make it clear that they are drawn on top of the lines.
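The difference between a global and a per-layer aesthetic can be checked directly. This sketch, on a hypothetical toy data frame with invented values, maps color only inside geom_line and then compares the colours each layer ends up with.

```r
library(ggplot2)

# Hypothetical data with two continents; values are invented.
toy <- data.frame(
  country   = rep(c("Kenya", "Ghana", "Chile"), each = 3),
  continent = rep(c("Africa", "Africa", "Americas"), each = 3),
  year      = rep(c(1997, 2002, 2007), times = 3),
  lifeExp   = c(54, 51, 54, 58, 58, 60, 75, 78, 79)
)

p <- ggplot(toy, aes(x = year, y = lifeExp, group = country)) +
  geom_line(aes(color = continent)) +  # color mapped for this layer only
  geom_point()                         # inherits x, y and group, but not color

line_colours  <- unique(layer_data(p, 1)$colour)  # layer 1: lines
point_colours <- unique(layer_data(p, 2)$colour)  # layer 2: points
length(line_colours)
length(point_colours)
```

The line layer has one colour per continent, while the point layer falls back to a single default colour.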
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()
We can change the scale of units on the x axis using the scale functions. We'll also use the alpha argument, which is helpful when you have a large amount of data that is very clustered. alpha values are any numbers from 0 (transparent) to 1 (opaque); the default is usually 1.
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) + scale_x_log10()
The scale_x_log10 function applied a log10 transformation to the values of the gdpPercap column before rendering them on the plot. This makes it easier to visualize the spread of data on the x-axis.
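As a rough illustration of the log10 transformation and the alpha setting, the sketch below uses a hypothetical toy data frame whose gdpPercap values are exact powers of ten (invented for the example) and inspects the point layer after the scale has been applied.

```r
library(ggplot2)

# Hypothetical data: gdpPercap values chosen as powers of ten.
toy <- data.frame(
  gdpPercap = c(100, 1000, 10000, 100000),
  lifeExp   = c(40, 55, 70, 80)
)

p <- ggplot(toy, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) +
  scale_x_log10()

pts <- layer_data(p, 1)
pts$x      # x positions on the transformed (log10) scale
pts$alpha  # the fixed transparency applied to every point
```

On the transformed scale, each tenfold increase in gdpPercap moves a point one unit along the x axis.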
We can fit a simple relationship to the data by adding another layer, geom_smooth:
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() + scale_x_log10() + geom_smooth(method = "lm")
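For intuition about what geom_smooth(method = "lm") fits here, the following sketch compares the smooth layer with a straight line fitted by lm() on log10(gdpPercap). The toy data frame and its values are hypothetical, and formula = y ~ x is spelled out only to make the default explicit.

```r
library(ggplot2)

# Hypothetical data; values are invented for illustration.
toy <- data.frame(
  gdpPercap = c(400, 900, 2500, 7000, 15000, 40000),
  lifeExp   = c(42, 50, 58, 66, 73, 80)
)

p <- ggplot(toy, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +
  geom_smooth(method = "lm", formula = y ~ x)

# Layer 2 is the smooth; its x values are already on the log10 scale.
fit_line <- layer_data(p, 2)

# The same straight line, fitted by hand on the transformed predictor.
fit <- lm(lifeExp ~ log10(gdpPercap), data = toy)
by_hand <- unname(coef(fit)[1] + coef(fit)[2] * fit_line$x)

# Largest difference between the two sets of fitted values.
max_diff <- max(abs(as.numeric(fit_line$y) - by_hand))
max_diff
```

Because the scale transformation is applied before the statistic is computed, the fitted line is linear in log10(gdpPercap), not in gdpPercap itself.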
We can make the line thicker by setting the size aesthetic in the geom_smooth layer. You can also assign plots to variables using the <- operator:
# example of assigning a plot to the variable pwd
# (in ggplot2 3.4.0 and later, line thickness is set with linewidth; size still works for lines but gives a deprecation warning)
pwd <- ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() + scale_x_log10() + geom_smooth(method = "lm", size = 1.5)
pwd
Here we set the size aesthetic by passing it as an argument to geom_smooth. Earlier we used the aes function to define a mapping between data variables and their visual representation.
starts.with <- substr(gapminder$country, start = 1, stop = 1)
az.countries <- gapminder[starts.with %in% c("A", "Z"), ]
Talk through the code:
* We'll start by subsetting the data, using the substr function to pull out part of a character string; in this case, the letters that occur in positions start through stop, inclusive, of the gapminder$country vector.
* The operator %in% allows us to make multiple comparisons rather than write out long subsetting conditions (in this case, starts.with %in% c("A", "Z") is equivalent to starts.with == "A" | starts.with == "Z").
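The subsetting logic can be tried out on a small character vector. The country names below are an arbitrary sample chosen for the example, not the gapminder data itself.

```r
# Arbitrary sample of country names, used only for illustration.
countries <- c("Afghanistan", "Brazil", "Zambia", "Albania", "Chile", "Zimbabwe")

# First letter of each name (positions 1 through 1, inclusive).
starts.with <- substr(countries, start = 1, stop = 1)

# Two equivalent ways to flag names starting with "A" or "Z".
with_in <- starts.with %in% c("A", "Z")
with_or <- starts.with == "A" | starts.with == "Z"

identical(with_in, with_or)
countries[with_in]
```

With more than two or three letters to match, the %in% form stays short while the chain of | comparisons keeps growing.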
ggplot(data = az.countries, aes(x = year, y = lifeExp, color = continent)) +
  geom_line() + facet_wrap( ~ country)
The facet_wrap layer took a "formula" as its argument, denoted by the tilde (~). This tells R to draw a panel for each unique value in the country column of the data.
Now let's clean up this figure for publication. The x-axis is too cluttered, and the y-axis should read "Life expectancy" instead of the column name.
Labels for the axes, plot title, and legends can be set using the labs function. Legend titles are set using the same names used in the aes specification: the color legend title is set using color = "Continent", while the title of a fill legend would be set using fill = "MyTitle".
ggplot(data = az.countries, aes(x = year, y = lifeExp, color = continent)) +
  geom_line() + facet_wrap( ~ country) +
  labs(
    x = "Year",              # x axis title
    y = "Life expectancy",   # y axis title
    title = "Figure 1",      # main title of figure
    color = "Continent"      # title of legend
  ) +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
ggsave('~/path/to/figure/filename.png')  # saves the last plot displayed
ggsave(filename = "filename.png", path = "~/path/to/figure")  # filename, path to save location
ggsave(filename = "/path/to/figure/filename.png", width = 6,
       height = 4)  # Plot size in units ("in", "cm", or "mm"). If not supplied, uses the size of current graphics device.
# device can be either a device function (e.g. png), or one of "eps", "ps", "tex" (pictex), "pdf", "jpeg", "tiff", "png", "bmp", "svg" or "wmf" (windows only). If not supplied, it is guessed from the filename extension.
ggsave(filename = "/path/to/figure/filename.eps")
ggsave(filename = "/path/to/figure/filename.jpg")
ggsave(filename = "/path/to/figure/filename.pdf")
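To try ggsave without choosing a real location, a minimal sketch is to write to a temporary file. The toy data frame is hypothetical, and out_file is just a throwaway path generated by tempfile(); the plot is passed explicitly rather than relying on the last plot displayed.

```r
library(ggplot2)

# Hypothetical data; values are invented for illustration.
toy <- data.frame(
  gdpPercap = c(500, 1200, 8000, 30000),
  lifeExp   = c(45, 55, 68, 79)
)

p <- ggplot(toy, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

# Throwaway output path; the .pdf extension selects the pdf device.
out_file <- tempfile(fileext = ".pdf")

ggsave(filename = out_file, plot = p, width = 6, height = 4, units = "in")

file.exists(out_file)
```

Passing plot = p makes the call independent of whichever plot happened to be drawn most recently.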
This is just a taste of what you can do with ggplot2. RStudio provides a really useful cheat sheet of the different layers available, and more extensive documentation is available on the ggplot2 website. Finally, if you have no idea how to change something, a quick Google search will usually send you to a relevant question and answer on Stack Overflow with reusable code to modify!
ggplot save reference http://ggplot2.tidyverse.org/reference/ggsave.html
ggplot cheat sheet https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
ggplot site http://ggplot2.org/