May 6, 2019

A short introduction to data linkage with dplyr

Data linkage with dplyr

Two datasets:

Particular matter: "pm" dataset
city pm_mg
Amman 999
Saltillo 869
Usak 814
The Bronx 284
Seoul 129
GPS coordinates of cities: "city_coords" dataset
city lng lat
Amman 31.9100 35.94800
Saltillo -100.9940 25.43400
Berlin 13.4020 52.52000
Usak 29.4030 38.67300
New York -73.9400 40.60000
Seoul 126.9838 37.53626

  • to combine these datasets, the "city" column might be used as identifier for matching
  • not all cities in "pm" also appear in "city_coords" and vice versa

Join operations

pm
city pm_mg
Amman 999
Saltillo 869
Usak 814
The Bronx 284
Seoul 129
city_coords
city lng lat
Amman 31.9100 35.94800
Saltillo -100.9940 25.43400
Berlin 13.4020 52.52000
Usak 29.4030 38.67300
New York -73.9400 40.60000
Seoul 126.9838 37.53626


Datasets can also be joined with merge, but I find dplyr easier to use.

left_join(a, b, by = <criterion>): always retains rows on the "left side" and fills up non-matching rows with NAs.

library(dplyr)
left_join(pm, city_coords, by = 'city')
##        city pm_mg       lng      lat
## 1     Amman   999   31.9100 35.94800
## 2  Saltillo   869 -100.9940 25.43400
## 3      Usak   814   29.4030 38.67300
## 4 The Bronx   284        NA       NA
## 5     Seoul   129  126.9838 37.53626

Join operations

pm
city pm_mg
Amman 999
Saltillo 869
Usak 814
The Bronx 284
Seoul 129
city_coords
city lng lat
Amman 31.9100 35.94800
Saltillo -100.9940 25.43400
Berlin 13.4020 52.52000
Usak 29.4030 38.67300
New York -73.9400 40.60000
Seoul 126.9838 37.53626


right_join(a, b, by = <criterion>): always retains rows on the "right side" and fills up non-matching rows with NAs. How many rows do you expect for a right join between pm and city_coords? How many of them will contain an NA value?

right_join(pm, city_coords, by = 'city')
##       city pm_mg       lng      lat
## 1    Amman   999   31.9100 35.94800
## 2 Saltillo   869 -100.9940 25.43400
## 3   Berlin    NA   13.4020 52.52000
## 4     Usak   814   29.4030 38.67300
## 5 New York    NA  -73.9400 40.60000
## 6    Seoul   129  126.9838 37.53626

Join operations

pm
city pm_mg
Amman 999
Saltillo 869
Usak 814
The Bronx 284
Seoul 129
city_coords
city lng lat
Amman 31.9100 35.94800
Saltillo -100.9940 25.43400
Berlin 13.4020 52.52000
Usak 29.4030 38.67300
New York -73.9400 40.60000
Seoul 126.9838 37.53626


inner_join(a, b, by = <criterion>): only retains rows that match on both sides.

How many rows do you expect for an inner join between pm and city_coords?

inner_join(pm, city_coords, by = 'city')
##       city pm_mg       lng      lat
## 1    Amman   999   31.9100 35.94800
## 2 Saltillo   869 -100.9940 25.43400
## 3     Usak   814   29.4030 38.67300
## 4    Seoul   129  126.9838 37.53626

Combining data and making a choropleth map

What is a choropleth map?

In a choropleth map, regions are shaded according to the measurement of some variable, e.g.:

The above picture is a good example of a bad color scheme.

Making a choropleth map

  • data downloaded from the Berlin Senate Dept. for Urban Dev. and Housing via FIS Broker
  • includes indicators for monitoring of social development ("Monitoring Soziale Stadtentwicklung 2017"):
bln_sozind <- read_sf('data/bln_plr_sozind.geojson')
head(bln_sozind)
## Simple feature collection with 6 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 386668.2 ymin: 5817761 xmax: 390764.3 ymax: 5820432
## epsg (SRID):    25833
## proj4string:    +proj=utm +zone=33 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## # A tibble: 6 x 11
##   PLANNAME EW2016 STATUS1 STATUS2 STATUS3 STATUS4 DYNAMO1 DYNAMO2 DYNAMO3
##   <chr>    <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 Stülers… 3114      4.98    1.52    8.16   20.5    -0.19    0.14   -0.42
## 2 Großer … 192      NA      NA      NA      NA      NA      NA      NA   
## 3 Lützows… 5518      6.83    2.08   17.7    34.3    -0.5    -0.03   -0.8 
## 4 Körners… 4704      6.27    1.49   20.7    45.9    -1.33   -0.76   -1.66
## 5 Nördlic… 1135      2.06    0       1.5     3.55   -0.77   -0.74   -0.42
## 6 Wilhelm… 2190      3.64    0.89    8.45   27.3    -1.23   -0.59    0.13
## # … with 2 more variables: DYNAMO4 <dbl>, geometry <MULTIPOLYGON [m]>

Making a choropleth map

## Simple feature collection with 3 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 386668.2 ymin: 5817875 xmax: 389894.1 ymax: 5820432
## epsg (SRID):    25833
## proj4string:    +proj=utm +zone=33 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## # A tibble: 3 x 3
##   PLANNAME      STATUS4                                            geometry
##   <chr>           <dbl>                                  <MULTIPOLYGON [m]>
## 1 Stülerstraße     20.5 (((387256.6 5818552, 387258.6 5818545, 387261.3 58…
## 2 Großer Tierg…    NA   (((386767.5 5819393, 386767.3 5819393, 386767.1 58…
## 3 Lützowstraße     34.3 (((387952.6 5818275, 387974.2 5818265, 387994.8 58…
  • source: FIS Broker, "Monitoring Soziale Stadtentwicklung 2017"¹
  • aim: visualize spatial distrib. of variable STATUS4 – child poverty rate in percent²
  • dataset is already a spatial dataset containing geo-data in geometry column

¹ Note that the FIS-Broker website is kind of awkward to use and you can't link directly to dataset, but you have to search for the mentioned keywords.
² Portion of children under 15 living in household that obtains social support ("Hartz IV")

Making a choropleth map

Directly plotting the map with fill color depending on STATUS4:

ggplot() + geom_sf(aes(fill = STATUS4), data = bln_sozind)

Problem: STATUS4 is continuous and so is the color scale. For choropleth maps, discrete ranges are better for discerning the colors.

Making a choropleth map

Put the percentages into 5 discrete bins, change the color palette, adjust the appearance:

bln_sozind$child_pov_bins <- cut(bln_sozind$STATUS4, seq(0, 100, by = 20))
ggplot() + geom_sf(aes(fill = child_pov_bins), data = bln_sozind) +
    scale_fill_brewer(palette = 'OrRd', na.value = "grey90",
                      guide = guide_legend(title = 'Child pov.\nranges in pct.')) +
    coord_sf(datum = NA) +  # disable lines ("graticule") in the background
    theme_void()

Combining data

Most of the time, you have at least two datasets: one containing the measurement(s) you want to show, the other containing the geo-data.

Example dataset: People at risk of poverty or social exclusion by NUTS 2 regions (tgs00107) from Eurostat.

pov_risk <- read.csv('data/tgs00107_pov_risk_nuts2.csv',
                     stringsAsFactors = FALSE)
pov_risk$risk_pct_bins <- cut(pov_risk$risk_pct, seq(0, 100, by = 10))

pov_risk_2016 <- filter(pov_risk, year == 2016)   # 2016 has fewest NAs
head(pov_risk_2016)
##   nuts year risk_pct risk_pct_bins
## 1   AT 2016     18.0       (10,20]
## 2 AT11 2016     15.1       (10,20]
## 3 AT12 2016     13.0       (10,20]
## 4 AT13 2016     26.0       (20,30]
## 5 AT21 2016     16.3       (10,20]
## 6 AT22 2016     18.4       (10,20]

Combining data

We load the GeoJSON dataset for NUTS level-2 regions provided by Eurostat:

nutsrg2 <- read_sf('data/nutsrg_2.json')
st_crs(nutsrg2) <- 3857  # set the correct CRS
head(nutsrg2)
## Simple feature collection with 6 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 1347007 ymin: 4105701 xmax: 3847132 ymax: 6631069
## epsg (SRID):    3857
## proj4string:    +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs
## # A tibble: 6 x 3
##   id    na                                                         geometry
##   <chr> <chr>                                            <MULTIPOLYGON [m]>
## 1 CY00  Κύπρος      (((3592720 4172836, 3594320 4178301, 3607520 4169324, …
## 2 CZ01  Praha       (((1588619 6463231, 1592220 6468696, 1617821 6476893, …
## 3 CZ02  Střední Če… (((1686224 6537392, 1685024 6526853, 1686624 6506167, …
## 4 CZ03  Jihozápad   (((1492615 6461670, 1536217 6432006, 1539017 6395706, …
## 5 CZ04  Severozápad (((1612621 6534270, 1600220 6515144, 1520216 6485089, …
## 6 CZ05  Severových… (((1729426 6582279, 1758628 6576034, 1786629 6556127, …

Combining data

Both datasets contain a NUTS level-2 identifier so we can join them:

pov_risk_2016_geo <- left_join(nutsrg2, pov_risk_2016, by = c('id' = 'nuts'))
head(pov_risk_2016_geo)
## Simple feature collection with 6 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 1347007 ymin: 4105701 xmax: 3847132 ymax: 6631069
## epsg (SRID):    3857
## proj4string:    +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs
## # A tibble: 6 x 6
##   id    na      year risk_pct risk_pct_bins                        geometry
##   <chr> <chr>  <int>    <dbl> <fct>                      <MULTIPOLYGON [m]>
## 1 CY00  Κύπρος    NA     NA   <NA>          (((3592720 4172836, 3594320 41…
## 2 CZ01  Praha   2016     10.1 (10,20]       (((1588619 6463231, 1592220 64…
## 3 CZ02  Střed…  2016     10.3 (10,20]       (((1686224 6537392, 1685024 65…
## 4 CZ03  Jihoz…  2016     10   (0,10]        (((1492615 6461670, 1536217 64…
## 5 CZ04  Sever…  2016     19.5 (10,20]       (((1612621 6534270, 1600220 65…
## 6 CZ05  Sever…  2016     11.7 (10,20]       (((1729426 6582279, 1758628 65…

Plotting the combined data

(eu_pov_plt <- ggplot() + geom_sf(aes(fill = risk_pct_bins),
                                  data = pov_risk_2016_geo, size = 0.1) +
  scale_fill_brewer(palette = 'OrRd', na.value = "grey90", guide =
                        guide_legend(title = 'Pct. of people at\nrisk of poverty')) +
  labs(title = 'Poverty in the EU 2016',
       subtitle = 'Percentage of people at risk of poverty or social exclusion',
       caption = 'source: Eurostat') +
  coord_sf(datum = NA) +  # disable lines ("graticule") in the background
  theme_void())

Plotting the combined data

Exercises 4 and 5

Choose one of the following two tasks (or make both if you like) and have a look at the handout for further hints:

4. A choropleth map of social indicators in Berlin

  • load the CSV file bln_plr_sozind_data.csv from folder 4_berlin_sozind
  • load the spatial dataset bln_plr.geojson containing the Berlin "Planungsraum" regions
  • join both datasets so that you obtain a combined spatial dataset with the social indicators for their respective regions
  • make a simple choropleth map with a social indicator of your choice
  • make choropleth map of a variable with discrete bins from a social indicator

5. A choropleth map for unemployment rates in the EU

  • load the CSV file tgs00010_unempl_nuts2.csv from folder 5_eu_unempl
  • create a subset with only the data from 2016 in it and create a variable with discrete bins for unempl_pct
  • load the spatial dataset nutsrg_2_2016_epsg3857_20M.json containing the EU NUTS level-2 regions
  • join both datasets so that you obtain a combined spatial dataset with the unemployment rates from tgs00010_unempl_nuts2.csv for their respective regions in nutsrg_2_2016_epsg3857_20M.json
  • make a choropleth map for the total unemployment rate (sex == 'T')
  • optional: make small multiples (via "facetting") by variable sex, i.e. displaying choropleth maps for the unemployment rate of women and men next to each other

Visualizing spatio-temporal data

If you also have a time-dimension in your data that you want to show, there are several options:

  • using small multiples via ggplot facetting
  • using animations via gganimate
  • making an interactive map (e.g. with a "time slider")

Visualizing spatio-temporal data

Small multiples: Dataset with time-related variable; plotted with facets (e.g. facet_wrap())

Visualizing spatio-temporal data

Animations: Use gganimate for time-lapse plot animations.

Visualizing spatio-temporal data

Geocoding and reverse geocoding via Google Maps API

What is geocoding?

Geocoding is the process of finding the geographic coordinates (longitude, latitude) for a specific query term (a full address, a postal code, a city name etc.).

  • Reichpietschufer 50, 10785 Berlin13.36509, 52.50640
  • cute cat cafe new york-73.896916, 40.706934

Reverse geocoding tries to map a set of geographic coordinates to a place name / address.

  • 13.36509, 52.50640Reichpietschufer 50, 10785 Berlin

Getting access

The Google Maps API is a web service that allows programmatic geocoding or reverse geocoding (among other things). Put simply, this means:

  • you send a list of queries (e.g. addresses) to a Google server
  • the server responds with a list of geo-coordinates

As of June 2018, the Maps API is part of Google's "Could Platform". This requires you to have:

  1. A Google account (surprise!).
  2. Start a Google Cloud Platform (GCP) free trial (valid for one year).
  3. Register for billing (they want your credit card).
    They promise not to charge it after the end of the free trial…

Inside GCP, you can go to APIs & Services > Credentials to get your API key.

Using ggmap for geocoding

You need to install the package ggmap, at least of version 2.7.

library(ggmap)

# provide the Google Cloud API key here:
register_google(key = google_cloud_api_key)

places <- c('Berlin', 'Leipzig', '10317, Deutschland',
            'Reichpietschufer 50, 10785 Berlin')
place_coords <- geocode(places) %>% mutate(place = places)
place_coords
## # A tibble: 4 x 3
##     lon   lat place                            
##   <dbl> <dbl> <chr>                            
## 1  13.4  52.5 Berlin                           
## 2  12.4  51.3 Leipzig                          
## 3  13.5  52.5 10317, Deutschland               
## 4  13.4  52.5 Reichpietschufer 50, 10785 Berlin

Reverse geocoding

Take the WZB's geo-coordinates and see if we can find the address to it:

# first longitude, then latitude
revgeocode(c(13.36509, 52.50640))
## [1] "Reichpietschufer 50, 10785 Berlin, Germany"

Social media posts or digital photos often contain geo-coordinates that you may want to "reverse geocode".

Making your maps publication-ready

Map projections

Can you unroll a cylinder or a cone to a sheet without tearing or stretching?

Yes, easily:

cylinder cone
source: rhino3.de

Map projections

Spheres or ellipsoids cannot be unrolled or unfolded without cutting or stretching. They are "non-developable surfaces".

Coordinate reference systems (again) and projections

Map projections try to do the impossible: project points on a sphere¹ to a plane.

  • we already worked with WGS84 coordinates, which are locations on a sphere¹ defined as degrees
    • -73.94° is the longitude: about 74° West from the meridian
    • 40.6° is the latitude: about 41° North from the equator
  • points from spherical/ellipsoidal coordinate system are converted to a "flat surface" cartesian coordinate system via a map projection
  • a coordinate reference system (CRS) can define a specific map projection

¹ WGS84 actually models the Earth as ellipsoid

Map projections

All map projections distort: area, shape, direction, distance and other properties can be different than on the actual sphere.

There are numerous projections, each with different properties and trade-offs between distortions.

Example projections

Mercator projection

  • very popular
  • strong distortions when far away from equator (Scandinavia, Antarctica, etc.)

Example projections

Example projections

Goode homolosine projection

  • equal-area projection
  • no distortion around equator
  • looks a bit like the peeled orange

Why should I even care?

  • spatial datasets usually specify the CRS → you should be able to set or transform it

  • you may want to use a different projection, because:
    • a different projection gives less distortion in the area of your interest (many historical projections are Euro-centric)
    • inequal area projections give too much visual weight on certain regions (i.e. Northern and Southern regions in Mercator projection)

  • compatibility between datasets matters (e.g. country shapes in specific CRS, but geo-coded points as WGS84 coordinates)

Why should I even care?

  • ETRS89 projection is especially for (central) Europe: less distorted here
  • look at the axis: units are meters, not degrees!

Finding out the CRS

The CRS of a spatial dataset is usually documented as EPSG / SRID number. epsg.io provides information about all common CRS.
When you load a spatial dataset you can see its CRS / projection:

worldmap_data <- st_as_sf(map('world', plot = FALSE, fill = TRUE))
head(worldmap_data, n=2)
## Simple feature collection with 2 features and 1 field
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 19.28066 ymin: 29.39194 xmax: 74.89131 ymax: 42.64795
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##                         geometry          ID
## 1 MULTIPOLYGON (((74.89131 37... Afghanistan
## 2 MULTIPOLYGON (((20.06396 42...     Albania

Alternatively, use st_crs():

st_crs(worldmap_data)
## Coordinate Reference System:
##   EPSG: 4326 
##   proj4string: "+proj=longlat +datum=WGS84 +no_defs"

Setting a CRS

Often, the CRS is not set in the spatial dataset, so you have to do it manually:

bln_plan <- read_sf('data/Planungsraum_EPSG_25833.shp')
head(bln_plan, n=2)
## Simple feature collection with 2 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 386668.2 ymin: 5818275 xmax: 389894.1 ymax: 5820432
## epsg (SRID):    NA
## proj4string:    NA
## # A tibble: 2 x 3
##   SCHLUESSEL PLR_NAME                                              geometry
##   <chr>      <chr>                                           <MULTIPOLYGON>
## 1 01011101   "St\xfclerstr." (((387256.6 5818552, 387323.1 5818572, 387418…
## 2 01011102   "Gro\xdfer Tie… (((386767.5 5819393, 386768.3 5819389, 386769…

The documentation (and the file name) say it's EPSG 25833, a CRS known as ETRS89:

st_crs(bln_plan) <- 25833

Setting a CRS

Note that by default, ggplot displays lines ("graticule") and axes according to WGS84 coordinates, although the actual data uses a different CRS:

ggplot() + geom_sf(data = bln_plan)

Setting a CRS

Setting the datum to the same CRS in coord_sf shows axis with the same units as in the data:

ggplot() + geom_sf(data = bln_plan) + coord_sf(datum = 25833)

Transforming the CRS

Suppose we wanted to make some markers on the Berlin map. We obtained longitude/latitude coordinates for them, e.g. for the WZB:

wzb_coord <- data.frame(lng = 13.365097, lat = 52.506459)
ggplot() + geom_sf(data = bln_plan) +
    geom_point(aes(x = lng, y = lat), data = wzb_coord, color = 'red')

Transforming the CRS

Oops! What happened here?

The spatial data for Berlin has a different CRS, it doesn't use WGS84 longitude/latitude coordinates. We have to convert the WZB coordinates to the same CRS (EPSG 25833 aka ETRS89) via st_transform():

wzb_wgs84 <- st_sfc(st_point(c(wzb_coord$lng, wzb_coord$lat)), crs = 4326)
wzb_etrs89 <- as.data.frame(st_coordinates(st_transform(wzb_wgs84, crs = 25833)))
ggplot() + geom_sf(data = bln_plan) +
    geom_point(aes(x = X, y = Y), data = wzb_etrs89, color = 'red')

Setting the display window

You may only want to show a portion of your spatial data, i.e. "zoom in" to a certain area on the map.

We use spatial data for all countries from the rnaturalearth package:

worldmap <- ne_countries(type = 'map_units', returnclass = 'sf')

Using a subset of the data

One way to "zoom in" is to form a subset of your data, e.g:

europe <- worldmap[worldmap$continent == 'Europe',]
ggplot() + geom_sf(data = europe)

However, this has some disadvantages:

  • you probably wouldn't want to show whole Russia
  • nearby countries are not shown (e.g. Turkey, Northern Africa)

Limiting the display window

We can also use the whole dataset, but set limits on the display window via coord_sf() using the current CRS unit (here: degrees longitude/latitude):

  • xlim are the bounds for the x-axis
  • ylim are the bounds for the y-axis

Using the default Mercator-like projection:

ggplot() + geom_sf(data = worldmap) +
    coord_sf(xlim = c(-20, 45), ylim = c(30, 73))

Cutting the shapes

We can also "cut" the shapes to fit into the specified display window using st_crop:

To better see the effect, I'm showing the result in ETRS89 projection after the shapes have been cut:

europe <- st_crop(worldmap, xmin = -20, xmax = 45, ymin = 30, ymax = 73)
ggplot() + geom_sf(data = europe) + coord_sf(crs = 25833)

Finding centroids of regions

Sometimes you need to find the center point (centroid) of certain regions. Example: you want to annotate a map with labels for certain countries. You can't directly label a country, because it is represented as a polygon but you need a discrete point for labelling. You can find the centroid for a shape with st_centroid.

  • st_centroid doesn't work correctly with longitude/latitude → need to convert to a CRS that uses meters, such as ETRS89
  • we select 5 countries
  • we calculate the centroid points from the countries' geometry and get their coordinates (they are also in ETRS89)
world_etrs89 <- st_transform(worldmap, crs = 25833)
countries <- world_etrs89[world_etrs89$iso_a2 %in% c('SK', 'FR', 'TR', 'PL', 'CZ'),]
country_centroids <- st_coordinates(st_centroid(countries$geometry))
country_centroids
##           X       Y
## 1  797648.2 5784802
## 2 2264329.9 4535907
## 3 -462849.1 5225447
## 4  830884.1 5407641
## 5  524732.7 5514198

Labelling regions

We can use those coordinates for labelling the countries. First we combine the country names and their centroid coordinates:

country_labels <- cbind(select(countries, name), country_centroids)
country_labels
## Simple feature collection with 5 features and 3 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -951869 ymin: 4101650 xmax: 3171811 ymax: 6081423
## epsg (SRID):    25833
## proj4string:    +proj=utm +zone=33 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
##         name         X       Y                       geometry
## 114   Poland  797648.2 5784802 MULTIPOLYGON (((1056668 600...
## 125   Turkey 2264329.9 4535907 MULTIPOLYGON (((3171811 455...
## 144   France -462849.1 5225447 MULTIPOLYGON (((-138250.4 5...
## 156 Slovakia  830884.1 5407641 MULTIPOLYGON (((1051642 546...
## 157  Czechia  524732.7 5514198 MULTIPOLYGON (((501189.9 56...

Labelling regions

Now we can use geom_sf_label to display the labels. We pass the country_labels dataset and map its name variable to the label.
Note how I define a display window in WGS84 longitudes/latitudes and convert it to ETRS89, as this is the CRS of the spatial datasets that are plotted.

disp_window <- st_sfc(st_point(c(-12, 28)), st_point(c(60, 64)), crs = 4326)
disp_window_etrs89 <- st_transform(disp_window, crs = 25833)
disp_window_etrs89_coord <- st_coordinates(disp_window_etrs89)
ggplot() + geom_sf(data = world_etrs89) +
    geom_sf_label(aes(label = name), data = country_labels) +
    coord_sf(xlim = disp_window_etrs89_coord[,'X'],
             ylim = disp_window_etrs89_coord[,'Y'], datum = NA) +
    theme_void()

Labelling regions without occlusion

Labels close to each other will cause occlusion. The package ggrepel contains a function geom_label_repel that tries to place labels without occlusion. We map the x, y and label with the respective columns from the country_labels dataset:

library(ggrepel)
ggplot() + geom_sf(data = world_etrs89) +
    geom_label_repel(aes(x = X, y = Y, label = name),
                     data = country_labels, min.segment.length = 0) +
    coord_sf(xlim = disp_window_etrs89_coord[,'X'],
             ylim = disp_window_etrs89_coord[,'Y'], datum = NA) +
    theme_void()

Exercise 6

Take one of your previous plots and improve it by:

  • choosing a different projection
  • choosing a different display window (i.e. "zooming in" to a certain region)
  • selecting certain regions (e.g. those with extreme values) and displaying labels for them

Further reading