class: center, middle, inverse, title-slide # The Spatial Ecosystem in R ## 🌎 🗺
a brief overview of new and useful packages ### Angela Li
@CivicAngela
Center for Spatial Data Science, UChicago
Slides available at
http://bit.ly/rspatial-ecosystem
### 2018-05-08 --- # Goals of this talk: ## 1. Showcase new spatial packages in R (`sf`!) ## 2. Demonstrate how to do tidy spatial analysis ## 3. Discuss future ways to build out the #rspatial ecosystem --- class: inverse, center, middle # My thesis gave me an excuse to try cutting-edge spatial packages --- # A map from my thesis  --- # And the code used to produce it ```r # Load packages library(tidyverse) library(sf) library(tmap) # Read in data sales <- read_csv("output/sales-tidy.csv") tracts <- st_read("data/orig/shapefiles/detroit_tracts.shp") tracts <- rename(tracts, tract = GEOID) # Join csv to shapefile sales <- sales %>% right_join(tracts, ., by = "tract") # Make map med_sales_map <- tm_shape(sales, unit = "mi") + tm_fill("med_price", palette = "Blues", breaks = quantile(a$med_price), title = "Median Sales Price") + tm_facets("after_hhf") + tm_shape(tracts) + tm_borders() + tm_compass(fontsize = 0.6, color.dark = "dark grey") + tm_scale_bar(color.dark = "dark grey") save_tmap(med_sales_map, "doc/figs/med_sales_map.png") ``` --- class: center, middle # Note the use of `sf` (What I'll be discussing today)  --- class: inverse, center, middle # The `sf` package revolutionizes how R stores spatial data --- class: inverse, center, middle # (`sf` stands for "simple features") -- ### It's based off of PostGIS! --- # Previously, R had `sp` spatial classes ```r str(elect80) # Data is from spData package - 1980 elections by county ``` ``` ## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots ## ..@ data :'data.frame': 3107 obs. of 5 variables: ## .. ..$ FIPS : chr [1:3107] "01001" "01003" "01005" "01007" ... ## .. ..$ pc_turnout : num [1:3107] 0.516 0.522 0.539 0.528 0.496 ... ## .. ..$ pc_college : num [1:3107] 0.484 0.506 0.38 0.336 0.389 ... ## .. ..$ pc_homeownership: num [1:3107] 0.38 0.392 0.363 0.369 0.402 ... ## .. ..$ pc_income : num [1:3107] 8.69 8.61 6.74 7.17 7.42 ... ## ..@ coords.nrs : int [1:2] 2 3 ## ..@ coords : num [1:3107, 1:2] -86.6 -87.8 -85.4 -87.1 -86.6 ... ## .. ..- attr(*, "dimnames")=List of 2 ## .. .. ..$ : NULL ## .. .. ..$ : chr [1:2] "long" "lat" ## ..@ bbox : num [1:2, 1:2] -124.2 25.1 -67.6 48.8 ## .. ..- attr(*, "dimnames")=List of 2 ## .. .. ..$ : chr [1:2] "long" "lat" ## .. .. ..$ : chr [1:2] "min" "max" ## ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot ## .. .. ..@ projargs: chr "+proj=longlat +datum=NAD27" ``` --- # Now, R has TIDY `sf` spatial classes ```r str(elect80_sf) ``` ``` ## Classes 'sf' and 'data.frame': 3107 obs. of 6 variables: ## $ FIPS : chr "01001" "01003" "01005" "01007" ... ## $ pc_turnout : num 0.516 0.522 0.539 0.528 0.496 ... ## $ pc_college : num 0.484 0.506 0.38 0.336 0.389 ... ## $ pc_homeownership: num 0.38 0.392 0.363 0.369 0.402 ... ## $ pc_income : num 8.69 8.61 6.74 7.17 7.42 ... ## $ geometry :sfc_POINT of length 3107; first list element: Classes 'XY', 'POINT', 'sfg' num [1:2] -86.6 32.5 ## - attr(*, "sf_column")= chr "geometry" ## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA ## ..- attr(*, "names")= chr "FIPS" "pc_turnout" "pc_college" "pc_homeownership" ... ``` --- # The difference? ```r class(elect80) ``` ``` ## [1] "SpatialPointsDataFrame" ## attr(,"package") ## [1] "sp" ``` ```r class(elect80_sf) ``` ``` ## [1] "sf" "data.frame" ``` --- # `sf` objects are `data.frame`s! ```r head(elect80_sf) ``` ``` ## Simple feature collection with 6 features and 5 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: -87.75474 ymin: 30.65488 xmax: -85.38899 ymax: 33.97974 ## epsg (SRID): 4267 ## proj4string: +proj=longlat +ellps=clrk66 +nadgrids=@conus,@alaska,@ntv2_0.gsb,@ntv1_can.dat +no_defs ## FIPS pc_turnout pc_college pc_homeownership pc_income ## 1 01001 0.5160463 0.4835339 0.3795130 8.688450 ## 2 01003 0.5215976 0.5055374 0.3915707 8.613157 ## 3 01005 0.5394989 0.3795852 0.3625629 6.742448 ## 4 01007 0.5277830 0.3360240 0.3692510 7.170828 ## 5 01009 0.4964495 0.3891533 0.4021849 7.415568 ## 6 01011 0.7974254 0.3414910 0.3546471 5.937332 ## geometry ## 1 POINT (-86.64147 32.54221) ## 2 POINT (-87.75474 30.65488) ## 3 POINT (-85.38899 31.86307) ## 4 POINT (-87.12686 32.99694) ## 5 POINT (-86.56621 33.97974) ## 6 POINT (-85.71779 32.10177) ``` --- # Which means... You can perform analysis on the `sf` object itself! ```r library(dplyr) # Which counties had over 50% turnout? turnout_over_half <- filter(elect80_sf, pc_turnout > 0.5) plot(st_geometry(turnout_over_half)) ``` <!-- --> --- ```r # Which counties had under 50% turnout? turnout_under_half <- filter(elect80_sf, pc_turnout < 0.5) plot(st_geometry(turnout_under_half)) ``` <!-- --> --- # Read/write is simple ```r # Read in shapefile of Chicago health data (from GeoDa site) chi <- st_read("data/ComArea_ACS14_f.shp") ``` ``` ## Reading layer `ComArea_ACS14_f' from data source `/Users/angela/Desktop/R-Projects/Teaching/rspatial-ecosystem/data/ComArea_ACS14_f.shp' using driver `ESRI Shapefile' ## Simple feature collection with 77 features and 86 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -87.94011 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +ellps=WGS84 +no_defs ``` -- ```r # Calculate new attribute, population density chi2 <- mutate(chi, Pop2014Density = Pop2014/shape_area) ``` -- ```r # Write new shapefile st_write(chi2, "data/chicago_high_pop.shp", delete_layer = T) ``` ``` ## Warning in abbreviate_shapefile_names(obj): Field names abbreviated for ## ESRI Shapefile driver ``` ``` ## Deleting layer `chicago_high_pop' using driver `ESRI Shapefile' ## Writing layer `chicago_high_pop' to data source `/Users/angela/Desktop/R-Projects/Teaching/rspatial-ecosystem/data/chicago_high_pop.shp' using driver `ESRI Shapefile' ## features: 77 ## fields: 87 ## geometry type: Multi Polygon ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 110196097.138999999 ## of field shape_r of feature 14 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 103595001.174999997 ## of field shape_r of feature 16 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 109099414.688999996 ## of field shape_r of feature 18 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 100057566.700000003 ## of field shape_r of feature 22 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 100480876.502000004 ## of field shape_r of feature 23 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 127562904.597000003 ## of field shape_r of feature 24 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 199254203.426999986 ## of field shape_r of feature 25 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 158492466.55399999 ## of field shape_r of feature 28 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 127998297.866999999 ## of field shape_r of feature 31 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 121959105.469999999 ## of field shape_r of feature 35 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 134313706.729999989 ## of field shape_r of feature 44 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 303797059.660000026 ## of field shape_r of feature 47 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 145965741.437000006 ## of field shape_r of feature 51 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 117890778.429000005 ## of field shape_r of feature 52 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 134636963.254000008 ## of field shape_r of feature 58 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 135460337.208000004 ## of field shape_r of feature 68 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 105065353.601999998 ## of field shape_r of feature 69 not successfully written. Possibly due to ## too larger number with respect to field width ``` ``` ## Warning in CPL_write_ogr(obj, dsn, layer, driver, ## as.character(dataset_options), : GDAL Message 1: Value 371835607.686999977 ## of field shape_r of feature 74 not successfully written. Possibly due to ## too larger number with respect to field width ``` --- # There are also a bunch of wonderful commands wrapped into `sf`... - `st_transform()`, `st_crs()` - `st_centroid()`, `st_bbox()`, `st_convex_hull()` - `st_area()`, `st_length()`, `st_distance()` - `st_buffer()`, `st_intersection()`, `st_contains()`, `st_union()` - `st_join()` And more! The [function reference in the documentation](https://r-spatial.github.io/sf/reference/index.html) has it all. --- # Packages that support `sf`- tmap ```r library(tmap) tm_shape(world) + tm_polygons("pop") + tm_style_cobalt() ``` <!-- --> --- # Packages that support `sf` - leaflet ```r library(leaflet) leaflet(world) %>% addPolygons(weight = 1, fillColor = ~colorQuantile("YlOrRd", pop)(pop), highlightOptions = highlightOptions(color = "white", weight = 2, bringToFront = TRUE), popup = ~as.character(name_long)) ```
--- # Packages that (sort of) support `sf`- ggplot2 ```r library(ggplot2) ggplot() + geom_sf(data = world) ``` <!-- --> --- class: center, middle # But many packages still require `sp` objects... ```r elect80_sp <- as(elect80_sf, "Spatial") ``` --- # A few other key packages to check out - `tidycensus`: all census and ACS data from past several years, delivered directly into R - `tigris`: TIGERline shapefiles (boundaries, etc) - `osmdata`: OpenStreetMap data, delivered directly into R - `spData`: sample spatial datasets to play around with in all sorts of formats - `raster`: many functions for working with raster data - `opencage`: geocoding - `shiny`: the workhorse for R interactive webapps --- # Resources to get you started - Jakub Nowosad's [great slides on GIS with R](https://cdn.rawgit.com/Nowosad/gis_with_r_how_to_start/aea08f46/gis_with_r_start.html), off of which these slides are largely based - [Geocomputation with R](https://geocompr.robinlovelace.net/), an open-source spatial analysis textbook currently under development, and the best resource for this new `sf` stuff - [sf documentation](https://r-spatial.github.io/sf/) by Edzer Pebesma - A great [Datacamp intro course](https://www.datacamp.com/courses/spatial-analysis-in-r-with-sf-and-raster) on sf by Zev Ross (the first part is free) --- class: inverse, center, middle # What needs to be done to improve the #rspatial package ecosystem? --- # A few ideas... - More support for *tidy* spatial analysis - Better documentation on existing spatial packages - More tutorials and resources for all #rspatial packages - **Clear entry point to spatial analysis for beginners** --  --- class: inverse, center, middle # Which leads to my next project... --- # Building out a overview/clearinghouse of R for spatial analysis  --- class: inverse # Goal is to have a beginner-friendly website  --- class: center, middle # What would you find useful on this website? What types of resources would you like to see on this site? What would help you do your research better? **Contact me at @[CivicAngela](https://twitter.com/CivicAngela)**! --- class: center, middle # Thanks! Slides created via the R package [xaringan](https://github.com/yihui/xaringan) by [Yihui Xie](https://twitter.com/xieyihui?lang=en). Slides available at <font style="text-transform: lowercase;"><http://bit.ly/rspatial-ecosystem></font>. <br> Source code for these slides can be found on [my Github](https://github.com/angela-li/rspatial-ecosystem).