Resources

Josiah Parry | Exploratory Spatial Data Analysis in the tidyverse | RStudio (2022)

R has come quite a long way in enabling spatial analysis over the past few years. Packages such as sf have made spatial analysis and mapping easier for many. However, adoption of R for spatial statistics and econometrics has been limited. Many spatial analysts, researchers, and practitioners lean on Python libraries such as pysal. In this talk I briefly discuss my journey through spatial analysis and introduce a new package, sfdep, which provides a tidy interface to spatial statistics and, notably, exploratory spatial data analysis. sfdep is an interface to the spdep package and also implements other common exploratory spatial statistics. Talk materials are available at https://github.com/rstudio/rstudio-conf/blob/master/2022/josiahparry/rstudio__conf(2022L)%20-%20Josiah%20Parry.pdf

Session: Lightning Talks

Oct 24, 2022
5 min

Transcript

This transcript was generated automatically and may contain errors.

Today I'm talking about Exploratory Spatial Data Analysis in the tidyverse, so to start let's talk about Exploratory Data Analysis. I'm sure most of you are familiar with it, but good to recap. So, Exploratory Data Analysis, or EDA, is an approach-slash-philosophy for data analysis that employs a variety of techniques, mostly graphical, to maximize insight into a dataset, discover underlying structure, extract variables, and detect outliers and anomalies.

One paper, for example, described EDA as an attempt to discover potentially explicable patterns. EDA, as we might know, has an emphasis on data visualization and the use of descriptive statistics, and it really emphasizes discovering outliers in your dataset. Here are two wonderful graphics from TidyTuesday. This is how I'm guessing most of us are familiar with EDA itself.

Exploratory Spatial Data Analysis

Exploratory Spatial Data Analysis takes the same concepts and applies them to spatial data. To understand that, we have to start with the first law of geography, which states that everything is related to everything else, but near things are more related than distant things. So, in essence, ESDA extends exploratory data analysis to spatial relationships. It asks questions like, are things randomly distributed in space? Are there spatial outliers? Are close things more similar than things that are further away?

So, where EDA compares a part to the whole, ESDA compares a part to its neighbouring parts. So, in ESDA, or Exploratory Spatial Data Analysis, we evaluate a location relative to its neighbouring locations, not the whole or group, right?
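
The part-to-whole versus part-to-neighbours distinction can be sketched in a few lines of base R. The values and the neighbour list below are hypothetical, made up purely for illustration: five locations along a line, each adjacent to the one before and after it.

```r
# Hypothetical values at five locations along a line (illustration only)
x <- c(10, 12, 11, 30, 29)

# Hypothetical neighbour list: each element holds the positions of the
# locations adjacent to that location
nb <- list(2L, c(1L, 3L), c(2L, 4L), c(3L, 5L), 4L)

# EDA view: compare each part to the whole
dev_from_whole <- x - mean(x)

# ESDA view: compare each part to the mean of its neighbours
nb_mean <- vapply(nb, function(i) mean(x[i]), numeric(1))
dev_from_nb <- x - nb_mean

data.frame(x, dev_from_whole, nb_mean, dev_from_nb)
```

In this toy example the last location is well above the overall mean but within one unit of its only neighbour, so it looks unusual against the whole and ordinary against its neighbourhood.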

From Python to R: the sfdep package

So, I learned Exploratory Spatial Data Analysis in grad school, and I learned it through Python and libraries like GeoPandas, Shapely, and PySAL, but it was Python, and I really wanted to use R. So, around 2021, I wanted to get back into this, and I discovered the R package spdep, which is a package for spatial dependence, weighting, and statistics. So, some things about spdep. It was first released in 2002, and it was initially designed for sp objects, which was the first way to interact with spatial data in R, and I'm not sure if any of you have used it. Interesting. And it covers most of spatial econometrics, so statistics, weights, neighbours, and things like that.

But in 2002, I was six, and I essentially grew up with the tidyverse. So, when I use spdep, I have a perceived paradigm gap, where the way that spdep functions is different from the way I think about R. So, naturally, when I couldn't make a neighbour list in a mutate function call, I created an issue, and unfortunately, that issue went nowhere. So, my response was to create a tidy interface called sfdep. So, sfdep is an extension to spdep for spatial dependence. And some principles are that I always want to use sf objects for geometry, because that's what I'm familiar with, and most people might be. Always return dplyr-friendly objects, so lists, data frames, factors, and additionally, functionality shouldn't be dependent upon dplyr for those who don't want dplyr. And then, additionally, all the functionality that I add is implemented using spdep objects for backwards compatibility.

Working with neighbours and spatial weights

And then, so, fundamentally, with exploratory spatial data analysis, we need to understand our neighbours. And to do that, we'll use the st_contiguity function, which takes an sfc class object, or the geometry column of an sf object, and returns a list identifying your neighbours. Here's an example of that using dplyr-friendly code. And the result is a list, where each element is an integer vector containing the row positions of that location's neighbours.
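
A minimal sketch of what such a call can look like, assuming the sf, sfdep, and dplyr packages are installed. The guerry dataset that ships with sfdep is used here as an assumption for illustration; the object name guerry_nb is hypothetical.

```r
library(sf)      # sf classes and methods for the guerry object
library(sfdep)
library(dplyr)

# guerry ships with sfdep; it is an sf object of French departments
# (chosen here for illustration only)
guerry_nb <- guerry %>%
  mutate(nb = st_contiguity(geometry))

# nb is a list column: one integer vector of neighbour row positions per row
guerry_nb$nb[[1]]

# The same function also works without dplyr, on the geometry directly
nb <- st_contiguity(st_geometry(guerry))
```

The second call reflects the principle that the functionality should not depend on dplyr: the function only needs the geometry.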

And we're not just limited to identifying contiguity for polygons; we can also find k-nearest neighbours, distance bands, and so on and so forth. And additionally, if you want to use spatial weights, we just pass the neighbour object into the spatial weights function, and we get row-standardised weights by default. So, here's an example of doing all that in one call with mutate.
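
As a hedged sketch of neighbours and weights in a single mutate call, again assuming sf, sfdep, and dplyr are installed and using the guerry dataset from sfdep for illustration. The centroid step and the choice of k = 5 are assumptions made here, not something stated in the talk.

```r
library(sf)
library(sfdep)
library(dplyr)

# Contiguity neighbours plus spatial weights in one mutate() call
guerry_wt <- guerry %>%
  mutate(
    nb = st_contiguity(geometry),
    wt = st_weights(nb)   # style = "W" (row-standardised) is the default
  )

# Under row standardisation each location's weights sum to 1
range(vapply(guerry_wt$wt, sum, numeric(1)))

# Other neighbour definitions, computed here on polygon centroids
pts <- st_centroid(st_geometry(guerry))
nb_knn  <- st_knn(pts, k = 5)    # five nearest neighbours (k is arbitrary)
nb_band <- st_dist_band(pts)     # distance band with the default upper bound
```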

Again, we're not just limited to that. Let's compare the syntax. So, here's a map comparing clusters of crime in 1830s France. Here's the code for using spdep. There's no pipe operator and lots of assignments. Here's the same result using sfdep. One assignment and two pipe operators. So, here's the same map. Different code, same result.
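
The contrast in syntax can be sketched as follows. This assumes the cluster map is based on a local Moran's I statistic and that the variable is crime_pers from the guerry dataset in sfdep; both are assumptions for illustration, and the object names are hypothetical. It requires sf, spdep, sfdep, dplyr, and tidyr.

```r
library(sf)
library(spdep)
library(sfdep)
library(dplyr)
library(tidyr)

# spdep style: several intermediate assignments, no pipe
nb <- poly2nb(guerry)                       # contiguity neighbours
lw <- nb2listw(nb, style = "W")             # row-standardised weights
lisa_spdep <- localmoran(guerry$crime_pers, lw)

# sfdep style: one assignment, two pipes
lisa_sfdep <- guerry %>%
  mutate(
    nb   = st_contiguity(geometry),
    wt   = st_weights(nb),
    lisa = local_moran(crime_pers, nb, wt)  # returns a data frame column
  ) %>%
  unnest(lisa)                              # spread the results into columns
```

The spdep result is a matrix that still has to be joined back to the data, whereas the sfdep result stays attached to the sf object and is ready to map.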

So, sfdep doesn't just act as an interface to spdep, it extends it as well. So, we can do local join counts, there's the first open source implementation of colocation quotients, and it also implements Esri's emerging hot spot analysis. It also integrates with sfnetworks. So, what's next? User validation, ensure parity with Python, and then make a hex logo one day. So, give it a star and give it a go.