ggplot
The step-by-step process of making a map showing some information is the following:
ggplot
, using the fill
aesthetic to show your variable of interest. All ggplot
’s options, like color palettes, faceting (to create many maps at once) or adding more things to the map, are available to you.You can get various maps from the maps
and mapdata
packages, and then use the tidyverse function map_data()
to transform the maps into ggplot
-ready dataframes. Available maps:
Usage | Map | Package |
---|---|---|
“county” | US counties | maps |
“china” | China provincial boundaries | mapdata |
“france” | France NUTS III | maps |
“italy” | Italy NUTS III | maps |
“japan” | Japan prefecture boundaries | mapdata |
“legacy_world” | World in 1990, Atlantic centered | maps |
“legacy_world2” | World in 1990, Pacific centered | maps |
“nz” | New Zealand | maps |
“nzHires” | New Zealand, high resolution | mapdata |
“state” | US state boundaries | maps |
“usa” | US coast | maps |
“world” | World, present, Atlantic centered | maps |
“world2” | World, present, Pacific centered | maps |
“worldHires” | World, present, high resolution, Atlantic centered | mapdata |
“world2Hires” | World, present, high resolution, Pacific centered | mapdata |
“worldLores” | World, present, low resolution, Atlantic centered | mapdata |
“world2Lores” | World, present, low resolution, Pacific centered | mapdata |
To plot the map in ggplot
we use the geom_polygon
(except for China, see example below), using the variables aes(x = long, y = lat, group = group)
. The group
variable is needed to connect the points properly. To add information to the map we will use the fill
variable. The color
variable can be used for the boundaries. Finally, we use coord_fixed(1.3)
to keep the proportions fixed (you can play with the 1.3 number)
library(maps)
library(mapdata)
library(tidyverse)
library(ggrepel)
ggplot() +
geom_polygon(data = map_data("world"),
mapping = aes(x = long, y = lat, group = group),
fill = "dark gray", color = "white") +
coord_fixed(1.3) +
theme_void()
ggplot() +
geom_polygon(data = map_data("france"),
mapping = aes(x = long, y = lat, group = group),
fill = "dark gray", color = "white") +
coord_fixed(1.3) +
theme_void()
ggplot() +
geom_polygon(data = map_data("italy"),
mapping = aes(x = long, y = lat, group = group),
fill = "dark gray", color = "white") +
coord_fixed(1.3) +
theme_void()
ggplot() +
geom_polygon(data = map_data("nz"),
mapping = aes(x = long, y = lat, group = group),
fill = "dark gray", color = "white") +
coord_fixed(1.3) +
theme_void()
ggplot() +
geom_polygon(data = map_data("japan"),
mapping = aes(x = long, y = lat, group = group),
fill = "light gray", color = "black") +
coord_fixed(1.3) +
theme_void()
ggplot() +
geom_polygon(data = map_data("county"),
aes(x = long, y = lat, group = group),
fill = "white", color = "light gray") +
geom_polygon(data = map_data("state"),
aes(x = long, y = lat, group = group),
fill = NA, color = "gray") +
coord_fixed(1.3) +
theme_void()
ggplot() +
geom_path(data = maps::map("china", plot = FALSE),
mapping = aes(x = long, y = lat, group = group),
color = "black") +
coord_fixed(1.3) +
theme_void()
We want to use the maps to show various variables. For instance, let’s see how to make a world map showing the variation in life expectancy.
If we look at the map_data("world")
we see it has the following structure:
head(map_data("world"))
The key to adding information to the map is to add variables to this dataset, such as the life expectancy variable, making sure you match the region name properly.
Let’s load the cross-section data from the Quality of Government Institute:
qog <- rio::import("http://www.qogdata.pol.gu.se/dataarchive/qog_std_cs_jan18.dta")
The variables in the dataset:
data.frame(
Variable = names(qog),
Description = sjlabelled::get_label(qog),
Obs. = map_dbl(qog, ~table(is.na(.x))[1]),
Missing = map_dbl(qog, ~table(is.na(.x))[2])
) %>%
DT::datatable(rownames = FALSE)
If you search for “life expectancy” you’ll find several variables. I’m going to use wdi_lifexp
which is from the World Bank.
The main difficulty in adding this data to the map is that the region
names in the map dataset do not match exactly the country names in the QoG data:
# which countries in qog don't match world map names properly
setdiff(qog$cname, map_data("world")$region)
## [1] "Antigua and Barbuda" "Congo"
## [3] "Congo, Democratic Republic" "Cyprus (1975-)"
## [5] "Ethiopia (1993-)" "France (1963-)"
## [7] "Cote d'Ivoire" "Korea, North"
## [9] "Korea, South" "Malaysia (1966-)"
## [11] "Pakistan (1971-)" "St Kitts and Nevis"
## [13] "St Lucia" "St Vincent and the Grenadines"
## [15] "Sudan (2012-)" "Trinidad and Tobago"
## [17] "Tuvalu" "United Kingdom"
## [19] "United States"
These are the country names that we need to correct.
Here are the names in the world map that don’t match the QoG properly:
# which world map names don't match countries in qog
setdiff(map_data("world")$region, qog$cname)
## [1] "Aruba" "Anguilla"
## [3] "American Samoa" "Antarctica"
## [5] "French Southern and Antarctic Lands" "Antigua"
## [7] "Barbuda" "Saint Barthelemy"
## [9] "Bermuda" "Ivory Coast"
## [11] "Democratic Republic of the Congo" "Republic of Congo"
## [13] "Cook Islands" "Curacao"
## [15] "Cayman Islands" "Cyprus"
## [17] "Canary Islands" "Ethiopia"
## [19] "Falkland Islands" "Reunion"
## [21] "Mayotte" "French Guiana"
## [23] "Martinique" "Guadeloupe"
## [25] "France" "Faroe Islands"
## [27] "UK" "Guernsey"
## [29] "Greenland" "Guam"
## [31] "Heard Island" "Isle of Man"
## [33] "Cocos Islands" "Christmas Island"
## [35] "Chagos Archipelago" "Jersey"
## [37] "Siachen Glacier" "Nevis"
## [39] "Saint Kitts" "South Korea"
## [41] "Kosovo" "Saint Lucia"
## [43] "Saint Martin" "Northern Mariana Islands"
## [45] "Montserrat" "Malaysia"
## [47] "New Caledonia" "Norfolk Island"
## [49] "Niue" "Bonaire"
## [51] "Sint Eustatius" "Saba"
## [53] "Pakistan" "Pitcairn Islands"
## [55] "Puerto Rico" "North Korea"
## [57] "Madeira Islands" "Azores"
## [59] "Palestine" "French Polynesia"
## [61] "Western Sahara" "Sudan"
## [63] "South Sandwich Islands" "South Georgia"
## [65] "Saint Helena" "Ascension Island"
## [67] "Saint Pierre and Miquelon" "Sint Maarten"
## [69] "Turks and Caicos Islands" "Trinidad"
## [71] "Tobago" "USA"
## [73] "Vatican" "Grenadines"
## [75] "Saint Vincent" "Virgin Islands"
## [77] "Wallis and Futuna"
We can see from this list what names should’ve been in the QoG data. For instance, instead of “United States” it should be “USA”.
The most straigthforward thing to do now is to just manually recode the country names in the QoG to match the names that the map uses. It’s a bit tedious, but only takes a few minutes. I’m creating a new variable called cname1
with the corrected country names:
# recode qog cnames
qog$cname1 <- qog$cname %>%
recode("Antigua and Barbuda" = "Antigua",
"Congo" = "Republic of Congo",
"Congo, Democratic Republic" = "Democratic Republic of the Congo",
"Cote d'Ivoire" = "Ivory Coast",
"Cyprus (1975-)" = "Cyprus",
"Ethiopia (1993-)" = "Ethiopia",
"France (1963-)" = "France",
"Korea, North" = "North Korea",
"Korea, South" = "South Korea",
"Malaysia (1966-)" = "Malaysia",
"Pakistan (1971-)" = "Pakistan",
"St Kitts and Nevis" = "Saint Kitts",
"St Lucia" = "Saint Lucia",
"St Vincent and the Grenadines" = "Saint Vincent",
"Sudan (2012-)" = "Sudan",
"Trinidad and Tobago" = "Trinidad",
"Tuvalu" = "Tuvalu", # not on map? or I can't find the proper name
"United Kingdom" = "UK",
"United States" = "USA")
We can now add the data to the map dataset, and plot it with ggplot
using the fill
variable to show the life expectancies:
# add life expectency to the map data
WorldMapData1 <- full_join(map_data("world"),
qog %>% select(cname1,
wdi_gdpcappppcur,
wdi_povgap190,
wdi_lifexp),
by = c("region" = "cname1"))
Make the map:
WorldMapData1 %>%
ggplot() +
geom_polygon(aes(x = long, y = lat, group = group, fill = wdi_gdpcappppcur)) +
coord_fixed(1.1) +
theme_void() +
labs(fill = "Average\nincomes", title = "GDP per capita, current dollars, PPP") +
scale_fill_binned(low = "tomato2", high = "dark blue",
trans = "log",
labels = scales::dollar_format(accuracy = 1),
breaks = c(1000, 2000, 10000, 30000, 60000)) +
ggthemes::theme_map(base_family = "serif", base_size = 18)
WorldMapData1 %>%
ggplot() +
geom_polygon(aes(x = long, y = lat, group = group, fill = wdi_lifexp)) +
coord_fixed(1.1) +
theme_void() +
labs(fill = "Years", title = "Life expectancy at birth") +
scale_fill_binned(low = "black", high = "light gray",
#trans = "log",
labels = scales::comma) +
ggthemes::theme_map(base_family = "serif", base_size = 18)
The alternative to matching the names by hand is to use the function maps::iso.expand()
which allows you to create the proper map names from standard ISO codes. The QoG contains the 3-letter ISO codes in the variable ccodealp
. (The maps::iso.alpha()
function does the reverse conversion, from map region names to the ISO codes.)
qog$cname2 <- iso.expand(qog$ccodealp)
This creates a complication though, because the ISO codes don’t have a 1-to-1 correspondence to region names on the map, and iso.expand()
creates regex expressions. As such, you need to do a fuzzy join when you’re merging the datasets:
# add life expectency to the map data
WorldMapData2 <- fuzzyjoin::regex_full_join(
map_data("world"),
qog %>% select(cname2, wdi_gdpcappppcur),
by = c("region" = "cname2"))
Here’s the same map again:
WorldMapData2 %>%
ggplot() +
geom_polygon(aes(x = long, y = lat, group = group, fill = wdi_gdpcappppcur)) +
coord_fixed(1.1) +
theme_void() +
labs(fill = "Life\nexpectancy\nat birth") +
scale_fill_gradient(low = "red", high = "dark blue", trans = "log")
Of course, once you matched the country names (by one method or another), you can add any other variables from the QoG. Here’s economic freedom from Fraser Institute, democracy from Freedom House, and liberal democracy from the Varieties of Democracy dataset:
# add variables to the map data
WorldMapData <- full_join(
map_data("world"),
qog %>% select(cname1, wdi_lifexp,
fi_index, vdem_libdem, fh_polity2),
by = c("region" = "cname1")
)
I’m also illustrating different color scales, there are basically endless visual possibilities.
WorldMapData %>%
ggplot() +
geom_polygon(aes(x = long, y = lat, group = group, fill = fi_index)) +
coord_fixed(1.1) +
theme_void() +
labs(fill = "Economic\nfreedom") +
scale_fill_viridis_c()
WorldMapData %>%
ggplot() +
geom_polygon(aes(x = long, y = lat, group = group, fill = fh_polity2)) +
coord_fixed(1.1) +
theme_void() +
labs(fill = "Democracy") +
see::scale_fill_pizza_c() # gradient(low = "red", high = "blue")
WorldMapData %>%
ggplot() +
geom_polygon(aes(x = long, y = lat, group = group, fill = vdem_libdem)) +
coord_fixed(1.1) +
theme_void() +
labs(fill = "Liberal\ndemocracy") +
ggthemes::scale_fill_continuous_tableau()
Let’s use the Sorens, Muedini and Ruger dataset of state policies.1
After you download the data and merge everything using Jason Sorens’ R script, you have the following variables for all states, from 1937 to 2017:
state_policy <- readRDS("data/state_policies.rds")
state_policy_meta <- readRDS("data/state_policies_meta.rds")
The available variables:
state_policy_meta %>%
select(Variable = `Variable Code`, Description = `Variable Name`) %>%
mutate(Variable = stringi::stri_enc_toascii(Variable),
Description = stringi::stri_enc_toascii(Description)) %>%
DT::datatable()
(I’m recoding the variables to ASCII to eliminate some special characters, which otherwise give an error in the DT::datatable()
.)
Before we add a variable to the map, let us first check the state names are ok. Check out the state names in the map data:
map_data("state")$region %>% unique()
## [1] "alabama" "arizona" "arkansas"
## [4] "california" "colorado" "connecticut"
## [7] "delaware" "district of columbia" "florida"
## [10] "georgia" "idaho" "illinois"
## [13] "indiana" "iowa" "kansas"
## [16] "kentucky" "louisiana" "maine"
## [19] "maryland" "massachusetts" "michigan"
## [22] "minnesota" "mississippi" "missouri"
## [25] "montana" "nebraska" "nevada"
## [28] "new hampshire" "new jersey" "new mexico"
## [31] "new york" "north carolina" "north dakota"
## [34] "ohio" "oklahoma" "oregon"
## [37] "pennsylvania" "rhode island" "south carolina"
## [40] "south dakota" "tennessee" "texas"
## [43] "utah" "vermont" "virginia"
## [46] "washington" "west virginia" "wisconsin"
## [49] "wyoming"
Notice that Alaska and Hawaii are missing, and the region names are all lower case. So let’s first change the names in the state_policy
dataset to lower case:
state_policy$State1 <- str_to_lower(state_policy$State)
Let’s double check that the state names match:
setdiff(state_policy$State1, map_data("state")$region)
## [1] "alaska" "hawaii" NA
It looks like they are all good! Alaska and Hawaii don’t appear on the map, because the map is only of the mainland.
NOTE: If the names didn’t match, the maps
package offers a handy rescue: the state.fips
dataset gives the US Census Bureau numeric identification for the states (as well as region, etc.). Similarly, if you’re plotting county-level data, there’s the county.fips
dataset with the US Census Bureau FIPS identification numbers for counties.
Let’s make maps of following variable:
Apovrate
: Percentage of state population in povertyApopdens
: Population density (Apop/Aland)cpbeerc
: State average beer prices, in constant 2008 dollarscspirwet
: Percentage of state population living in counties that are wet for distilled spiritsBopen
: Open carry of loaded handgun (permitted without permit=2, permitted with permit=1, not generally permitted=0)Add the variables to the map data:
StatesMapData <- full_join(
map_data("state"),
state_policy %>%
select(State1, Year,
Apovrate, Apopdens,
cpbeerc, cspirwet,
Bopen),
by = c("region" = "State1")
)
StatesMapData %>%
filter(Year >= 1980) %>%
filter(Year %% 5 == 0) %>%
drop_na(Year) %>%
ggplot() +
geom_polygon(aes(x = long, y = lat, group = group, fill = Apovrate)) +
coord_fixed(1.3) +
theme_void() +
labs(fill = "Poverty\nrate") +
scale_fill_gradient(low = "dark blue", high = "red") +
facet_wrap(~Year, ncol = 2) +
theme(legend.position = "bottom", legend.direction = "horizontal")
ggplot() +
geom_polygon(data = StatesMapData %>% filter(Year == 2015),
aes(x = long, y = lat, group = group, fill = Apopdens),
color = "light gray") +
coord_fixed(1.3) +
theme_void() +
labs(fill = "Population\ndensity") +
scale_fill_continuous(trans = "log", labels = scales::comma)
ggplot() +
geom_polygon(data = StatesMapData %>% filter(Year == 2003),
aes(x = long, y = lat, group = group, fill = cpbeerc),
color = "gray") +
coord_fixed(1.3) +
theme_void() +
labs(fill = "Beer\nprices") +
scale_fill_gradient(low = "dark blue", high = "red")
StatesMapData %>% filter(Year %in% c(1967, 1984)) %>%
ggplot() +
geom_polygon(aes(x = long, y = lat, group = group, fill = cspirwet),
color = "gray") +
coord_fixed(1.3) +
scale_fill_gradient(low = "red", high = "dark blue") +
labs(fill = "Wet counties:\n") +
theme_void() +
theme(legend.position = "bottom", legend.direction = "horizontal") +
facet_wrap(~Year)
StatesMapData %>%
filter(Year %% 5 == 0) %>%
drop_na(Bopen) %>%
mutate(Bopen = recode(Bopen,
`2` = "permitted without permit",
`1` = "permitted with permit",
`0` = "not generally permitted"
)) %>%
ggplot() +
geom_polygon(aes(x = long, y = lat, group = group, fill = Bopen),
color = "gray") +
coord_fixed(1.3) +
theme_void() +
labs(fill = "Open carry:") +
see::scale_fill_flat_d(reverse = TRUE) +
facet_wrap(~Year, ncol = 2) +
theme(legend.position = "top", legend.direction = "horizontal")
We can add things to the map in standard ggplot
fashion – using additional geoms. For example, let me manually add Tucson and Denver to the US map.
First, google the latitude and longitude of the cities and then add the information to a dataframe:
cities <- tribble(
~long, ~lat, ~name,
-110.9747, 32.2226, "Tucson",
-104.9903, 39.7392, "Denver"
)
Now we can just add them to the US map:
ggplot() +
geom_polygon(data = map_data("usa"),
mapping = aes(x = long, y = lat, group = group),
fill = "light gray", color = "light gray") +
geom_point(data = cities, aes(x = long, y = lat), color = "black") +
geom_text(data = cities, aes(x = long, y = lat, label = name), nudge_y = 1) +
coord_fixed(1.3) +
theme_void()
But we can do better than add the cities manually. The maps
package also contains a few dataframes of cities locations.
Dataset | Description |
---|---|
canada.cities |
Canadian cities larger than 1,000 |
us.cities |
US cities larger than 40,000 |
world.cities |
World cities larger than 40,000 |
Here is the structure of the world.cities
data frame:
head(world.cities)
Let’s add all US cities greater than half a million people to the map:
ggplot() +
geom_polygon(data = map_data("usa"),
aes(x = long, y = lat, group = group),
fill = "light gray", color = "light gray") +
geom_point(data = us.cities %>% filter(pop > 500000),
aes(x = long, y = lat)) +
geom_text_repel(data = us.cities %>% filter(pop > 500000),
aes(x = long, y = lat, label = name)) +
coord_fixed(1.3) +
theme_void()
Here are all world cities larger than one million, with labels for cities greater than 5 million:
ggplot() +
geom_polygon(data = map_data("world"),
aes(x = long, y = lat, group = group),
fill = "light gray", color = "white") +
geom_point(data = world.cities %>% filter(pop > 1000000),
aes(x = long, y = lat), color = "dark gray") +
geom_text_repel(data = world.cities %>% filter(pop > 5000000),
aes(x = long, y = lat, label = name), color = "black") +
coord_fixed(1.3) +
theme_void()
We can zoom in by specifying the limits of the map in the coord_fixed()
option. The high resolution mapHires
is useful particularly for zooming in, but I’m still using the regular map here. Here’s for example a zoom on China:
ggplot() +
geom_polygon(data = map_data("world"),
aes(x = long, y = lat, group = group),
fill = "light gray", color = "white") +
geom_point(data = world.cities %>% filter(pop > 500000, country.etc == "China"),
aes(x = long, y = lat), color = "dark gray") +
geom_text_repel(data = world.cities %>% filter(pop > 2000000, country.etc == "China"),
aes(x = long, y = lat, label = name), color = "black") +
coord_fixed(1.3, xlim = c(75, 135), ylim = c(20, 50)) +
theme_void()
I got the limits from this map. Notice that I’ve also changed what cities to show: I’m now showing all Chinese cities greater than half a million, and labeling all cities greater than 2 million.
Let’s plot the crime rates in various places, with data taken from the Open Crime Database. This data is even more precise than the county-level.
library(crimedata)
Get a small sample of the available crime data, using the “simple features” (sf) output option:
crime <- get_crime_data(type = "sample", quiet = TRUE, output = "sf")
This is how the data looks like:
head(crime)
Let us consider only the robberies and focus on Los Angeles:
crimeLA <- crime %>%
filter(offense_group == "robbery", city_name == "Los Angeles")
map_data("county") %>%
filter(subregion == "los angeles") %>%
ggplot() +
geom_polygon(aes(x = long, y = lat, group = group),
fill = "light gray", color = "white") +
geom_sf(data = crimeLA)
Make a map showing robberies density:
ggplot() +
geom_polygon(data = map_data("state"),
aes(x = long, y = lat, group = group),
color = "white", fill = "light gray") +
geom_count(data = crime %>% filter(offense_group == "robbery"),
aes(x = longitude, y = latitude),
alpha = 0.5) +
theme_void()
Sorens, Jason, Fait Muedini, and William P. Ruger. 2008. “State and Local Public Policies in 2006: A New Database.” State Politics and Policy Quarterly 8 (3): 309–26.↩︎