Gabriel Morrison - Open Source Software in Action: Expanding the Spatial Equity Data Tool
The Urban Institute’s Spatial Equity Data Tool enables users to upload their own data and quickly assess whether place-based programs and resources, such as libraries or Wi-Fi hotspots, are equitably distributed across neighborhoods and demographic groups. Our (forthcoming) API and R package also enable users to seamlessly incorporate equity analytics into existing workflows and exciting new tools. In this talk, I will share how we've expanded access to the tool using multi-language software. I'll discuss our updates to the Python-based tool and API, the R package wrapping the API, and the Quarto-based documentation. I will also share how our partners in the City of Los Angeles have used the API and RShiny to build a custom budget equity tool.
Talk by Gabriel Morrison
Slides: https://urbanorg.box.com/s/vecuwuwhtj1zkha09nznq9p0qxdttl9r
GitHub Repo: https://github.com/UrbanInstitute/ui-equity-tool
Transcript
This transcript was generated automatically and may contain errors.
Hi, everybody. Thank you all so much for being here, especially after lunch. Thanks for being here on time. I'm excited to be presenting to you all today.
So in 2018, I was a student at the University of Chicago, which this is a picture of, and the UChicago Hospital opened, or actually reopened, their trauma center after decades of it being closed. So there was a lot of excitement and hype about this. I heard about it in the emails UChicago sent me, and you'd read about it in the news, both at the university and across Chicago as a whole. So there was a lot of excitement, and I was curious about this and wanted to learn more.
So one thing I did is figure out what a trauma center was, or what that meant. I found out that a trauma center is a part of a hospital, often in an ER, and it's equipped to provide surgery for traumatic injuries, so things like car crashes, falls or burns, or potentially things like stab or gunshot wounds. So another logical question I had was, where are trauma centers in Chicago, and where were they before the opening of the UChicago trauma center?
So this is a map showing that distribution before the opening. One thing you may notice is that on the South Side of Chicago, there's this dearth of trauma centers, which is potentially a concern. And as a consequence of the legacy and continued history of segregation, discrimination and racism that the South Side of Chicago faces, there was a high need for trauma centers. You can see that here in the pattern of gunshot wounds and motor vehicle crashes.
So this data comes from this really great article from Crain's Chicago Business. If you want to learn more, I really recommend it. But going through this process and really thinking about this opened my eyes. It was the first time I thought seriously about space and access and what that meant in a city. And it really opened my eyes to this form of serious spatial thinking.
And so as a consequence of this experience early on in my time at the university, I decided to study geography. So shout out to any other geography majors. And then I ultimately took that thinking to the Urban Institute, where I work on the Spatial Equity Data Tool. And this is the tool I want to talk to you all about today.
Introducing the Spatial Equity Data Tool
So kind of why at a high level, did we create the spatial equity data tool? So we've noticed in the past few years that there's been an increased demand across governments, across nonprofits and other organizations to reliably assess and measure spatial equity. And so to respond to that, my team developed the spatial equity tool and then I have since built on it for the last two years.
So at a high level, what does the tool do? This is the workflow we think about here. You can upload to the tool any point vector spatial data. So what does that mean? It means anything that you can plot as a series of individual points on a map. That's the type of data the tool takes. So you upload it to the tool, the tool does a lot of work, and then it returns two sets of scores. It returns these demographic disparity scores and geographic disparity scores.
So what are those scores? This is an example screenshot from our Spatial Equity Data Tool for demographic disparity scores. For the next two slides, we have examples from this New Orleans 311 call request data from 2014 to 2018. 311 is a service that many cities provide. As you may know, it sounds kind of like 911, but you would call 911 for an emergency. 311 is a service you could call for non-emergencies. So say your streetlight has gone out or you have a pothole that you need filled. 311 may be the way you could reach out to your city to get that service.
OK, so that's what 311 is. What is this demographic disparity score? What the demographic disparity score does is it looks across the geography as a whole and it makes a comparison. So we say, hey, looking at, in this case, the census tracts where these points, these 311 call requests, come from, how do the demographics of those areas compare to the city as a whole?
So how to interpret it? I'm sorry it's small, but the top line is white, non-Latinx residents. And what this says is that the census tracts where most of the data come from have disproportionately high numbers of white residents compared to the city as a whole. Conversely, you can see from the bottom line, which says Black, non-Latinx residents, that in the areas where 311 call requests came from in this time period, Black residents were disproportionately underrepresented, at least in this data. So that's how to read these demographic disparity scores. Again, zooming out, it's like: across the geography, what demographic groups are over- or underrepresented in their access to the point resource data that's uploaded?
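To make the comparison concrete, here is a minimal sketch in Python. It assumes, as a simplification that the talk does not confirm, that the score is the data-weighted share of a group minus that group's citywide share. The tract figures and the function name are invented for illustration; the tool's documentation defines the actual calculation.

```python
# Hypothetical tracts: (uploaded points in tract, total residents, residents in the group)
tracts = [
    (60, 1000, 700),
    (30, 2000, 600),
    (10, 1000, 100),
]

def demographic_disparity(tracts):
    """Group share where the data points are, minus the group's citywide share.

    Assumed formula for illustration only.
    """
    total_points = sum(points for points, _, _ in tracts)
    total_pop = sum(pop for _, pop, _ in tracts)
    total_group = sum(group for _, _, group in tracts)

    # Weight each tract's group share by that tract's share of the uploaded points.
    data_weighted_share = sum(
        (points / total_points) * (group / pop) for points, pop, group in tracts
    )
    citywide_share = total_group / total_pop
    return data_weighted_share - citywide_share

score = demographic_disparity(tracts)
# A positive score means the group is overrepresented in the areas the points come from.
print(f"{score:+.3f}")
```

Under this reading, a score near zero would mean the places the data come from look demographically like the city as a whole.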
OK, so that's the first set of scores. The second set of scores is the geographic disparity score. This looks at something different. Whereas the demographic disparity score says, hey, let's look across the geography as a whole, the geographic disparity score says, I want to look at a specific subgeography. For the city scale, the subgeography, again, is a census tract. And we do simple arithmetic, a simple difference in proportions here. We look at the proportion of the uploaded data points that fall in a census tract and the proportion of the population that lives there. If the proportion of data points is greater than the proportion of the population, that area receives a positive disparity score and it's colored in blue. And we might think of that area as being overrepresented by those points.
Conversely, if the proportion of data is less than the proportion of the population, we would say that that area gets a negative disparity score. So those are the yellow and orange areas. And we might think that that area is underrepresented by the data.
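The difference in proportions described above can be sketched in a few lines of Python. The tract IDs and counts are made up, and the function name is hypothetical; this shows the arithmetic as described in the talk, not the tool's actual code.

```python
def geographic_disparity(points_by_tract, population_by_tract):
    """Per-tract share of data points minus per-tract share of population."""
    total_points = sum(points_by_tract.values())
    total_pop = sum(population_by_tract.values())
    return {
        tract: points_by_tract.get(tract, 0) / total_points
        - population_by_tract[tract] / total_pop
        for tract in population_by_tract
    }

# Hypothetical inputs: counts of uploaded points and residents per census tract.
points = {"A": 60, "B": 30, "C": 10}
population = {"A": 1000, "B": 2000, "C": 1000}

scores = geographic_disparity(points, population)
for tract, value in scores.items():
    label = "overrepresented" if value > 0 else "underrepresented"
    print(tract, f"{value:+.2f}", label)
```

Because both sets of proportions sum to one, the scores across all tracts sum to zero: any overrepresented tract is balanced by underrepresented ones.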
OK, so those are geographic disparity scores. Now, you may ask, why do we like this? We think it brings a couple of key benefits. The first is it makes performing standardized equity analyses easier. Right. So you don't have to decide what it means to have spatial equity. We've made definitions ourselves. You can upload your data and we just run the calculations. We think it's easy for you to compare across time, where you could upload multiple years of data. You can compare across geographies, so we handle any geographies across the U.S. And it's fast, it's free and it's reproducible. So those are all open source standards that we are excited to follow.
OK, so what are some use cases? How might you use this tool? We think one key use case is this idea of measuring equity in the allocation of place-based programs. So suppose you're interested in putting out a resource, and the question is, where would you want that resource? I don't know, maybe where are you trying to put your new trauma center? But there are these examples too: playgrounds, bus shelters, libraries, anything like that. You could use those charts that we made and make a decision based on those. You also could use it to examine the representativeness of program participants. So suppose you have a program, you know where the participants live, and you're targeting that program at a broader population. You could use as a point resource the location of those program participants and then compare against that population. And then lastly, with similar logic, we think you could identify areas for future investment with the tool.
How you interact with the tool
OK, so those are use cases. Now you might be asking, I hope you're asking, how do you interact with the tool? The paradigm from the inception of the tool to March of 2024 was this. You, the user, would go to our website, and if you search Spatial Equity Data Tool, it'll be the first link. I also have a QR code at the end. You go to our website, you drag and drop a CSV file, then on the back end we have some AWS infrastructure that runs, and then we return the results to the website. The website visualizes those results and you can view them and download them.
As of March 2024, we have launched an application programming interface, an API, so we're super excited about that. And now, in addition, you can use our API to get those same calculations.
Identifying the right tool for the job
OK, so with that background, I want to introduce the second phase of my talk and present this thesis to you, which is that we think identifying the right tool for the job can lead to unexpected gains. Throughout the development process, the Spatial Equity Data Tool team and I have faced a number of technical problems. We've thought hard about them and found technical solutions to them. And we think because we found solutions that work well, we've actually received some unexpected gains as a consequence of using those solutions. So I'm really excited to share three examples of the problems we faced, the solution that we decided upon and why we decided upon it, and then these exciting, potentially unexpected gains, at least for us.
OK, so here we go. Problem number one: where do we run our code? This is actually a problem that my colleagues faced before I joined the project. The starting point is that we have some Python scripts, and the Python calculates the metrics that we're showing. But there's this question of where you run this code. And these are the requirements we're facing. We need something scalable. Right? Hopefully, after this presentation, everybody here is like, I've got to go use this tool, so we have high demand. We also want it to be fast. This is targeted towards people who want to upload their data and get the results. It's not like you want to email somebody and have them do the work. So we want something close to real time. And actually, our script is quite fast, so the computing environment should be able to keep up with that as well. And we also want something cheap, for obvious reasons.
OK, so the solution we have here is AWS Lambda. Lambda is a service offered by Amazon Web Services that provides serverless compute. It runs the code or the function you write automatically, and it manages the compute resources behind the scenes. So that's really helpful. You don't have to think about servers or anything like that. It's fast. It automatically scales. Right. So as demand comes in, it scales up and runs more. And that's really helpful for this demand issue. And then it's cheap. You are only charged for what you use. And actually, the rate is quite affordable for us. So that's really helpful.
One thing you may know about AWS Lambda, one potential limitation for some use cases, is that it only runs for 15 minutes. After 15 minutes, it stops running. But this isn't a problem for us because our code is very fast. It takes under 10 seconds.
OK, so that was our problem. Here's our solution. What is our unexpected gain? As we were expanding the Spatial Equity Data Tool to support the API, we had to expand the logic of the code. We were looking for ways where, hey, if you run your old code, it still works, but if we have these new inputs that look a little bit different or have a different structure, we need some logic to handle those. And the solution that we found is a second AWS service called Step Functions. Step Functions is a service that allows you to create a series of tasks, or steps, and then orchestrate them. And really importantly, Lambda functions can be those steps. The part of Step Functions that we really appreciate is the control flow logic. Depending on the output of one function, you can do one thing or do another thing. So it's just like an if-else branch, but at a different scale. That was really helpful for us to handle requests either from the web interface or from the API. And our unexpected benefit here, at least for our team, when we learned about and then implemented the solution, is that there's really elegant integration with Lambda functions. It's really easy for us to incorporate the software we already had. We don't need to make that many changes to the Lambda function, and we can just build on it in an easy way as we add new functionality.
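For readers who have not written one, a Lambda function in Python is an ordinary function that the service calls with an `event` (the request payload) and a `context` object. The sketch below is an assumption-laden illustration, not the tool's code: the payload keys and the calculation inside are hypothetical, and only the `lambda_handler(event, context)` entry-point shape follows the usual Lambda convention.

```python
import json

def lambda_handler(event, context):
    """Entry point that AWS Lambda invokes once per request."""
    # Hypothetical payload shape: counts of points and residents keyed by tract ID.
    points = event["points_by_tract"]
    population = event["population_by_tract"]

    total_points = sum(points.values())
    total_pop = sum(population.values())

    # Difference in proportions per tract (share of points minus share of population).
    scores = {
        tract: points.get(tract, 0) / total_points - population[tract] / total_pop
        for tract in population
    }
    return {"statusCode": 200, "body": json.dumps(scores)}

# The handler is plain Python, so it can be exercised locally without AWS.
sample_event = {
    "points_by_tract": {"A": 3, "B": 1},
    "population_by_tract": {"A": 100, "B": 100},
}
response = lambda_handler(sample_event, None)
print(response["statusCode"], response["body"])
```

Since the function holds no state between calls, the service can run as many copies in parallel as demand requires, which is the scaling behavior described above.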
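The if-else branching described here corresponds to a Choice state in a Step Functions state machine definition. The following is a hedged sketch of what such a definition might look like, built as a Python dictionary and printed as JSON. The state names, the `source` field, and the ARNs are placeholders invented for illustration; the talk does not describe the team's actual state machine.

```python
import json

# Placeholder ARNs: REGION, ACCOUNT_ID, and the function names are hypothetical.
PREPARE_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:prepare-api-input"
ANALYZE_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:run-equity-analysis"

definition = {
    "Comment": "Route web and API requests to the same analysis function",
    "StartAt": "CheckSource",
    "States": {
        # Choice state: branch on a field of the input, like an if-else.
        "CheckSource": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.source", "StringEquals": "api", "Next": "PrepareApiInput"}
            ],
            "Default": "RunEquityAnalysis",
        },
        # Extra step that only API requests pass through.
        "PrepareApiInput": {"Type": "Task", "Resource": PREPARE_ARN, "Next": "RunEquityAnalysis"},
        # The existing Lambda function, reused unchanged by both paths.
        "RunEquityAnalysis": {"Type": "Task", "Resource": ANALYZE_ARN, "End": True},
    },
}

def first_task(definition, payload):
    """Locally mimic the Choice state: name of the first Task state a payload reaches."""
    state = definition["States"][definition["StartAt"]]
    for rule in state["Choices"]:
        field = rule["Variable"][2:]  # strip the leading "$." from the path
        if payload.get(field) == rule["StringEquals"]:
            return rule["Next"]
    return state["Default"]

print(json.dumps(definition, indent=2))
print(first_task(definition, {"source": "api"}))
print(first_task(definition, {"source": "web"}))
```

The design point is that the original Lambda function stays a single step; new request types get their own steps in front of it rather than new branches inside it.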
Documenting the API with Quarto
So not to worry, this is an open source conference, so the next solutions will be open source and Posit themed. Now we have this question: how do we document our API? And actually, the answer may have been revealed yesterday, but nonetheless, we're doing it anyway. So the requirements here are, we want to handle code, right? We have an API. We want to have code showing how to interact with the API. Also, APIs are hard to capture or to show. Right. So we want a home page for the API that we can direct people to. And then also, we had really excellent documentation, but in the form of a PDF, and we wanted to find a more updated home for that, one that's a little bit easier to interact with on the web.
OK, so perhaps unsurprisingly, our solution here is a Quarto book. So, right, it handles code: you can do code integration in Quarto and HTML. It serves as a home page: we just have a chapter in our book that's about the API. And then, of course, it handles text really nicely.
So we have additional gains, too. Right. As you may know, you can do chapter or section references. So you can say, hey, in chapter such and such, and then just link to a different chapter in the Quarto book. You can also do that with different elements within chapters. There's also nice equation handling. Right. So you can use the LaTeX equation syntax, and that works really well. And we have some documentation that uses that. And then lastly, I think it has a nice hierarchy. So we have a table of contents both for the book and for the page.
So it looks like this. This is a chapter on our Spatial Equity Data Tool algorithm. So it's the step-by-step process of what the tool does. And it's nice. You can click on the chapters on the side. You can also see we have these underlined chapter references, chapter six, for example. So you can click there and go there. And then within the document, you can go to different subheads. So we think that's an easier way for folks to navigate, especially on the web.
Supporting power users with an R package
So that was problem, solution, unexpected gain number two. Lastly, we have this question: how do we support power user analysts? We think we have, or we hope we have, users of the tool. We also use a lot of R at the Urban Institute. So we had this question: how can we make it easy for users to engage with the API, and to do so in the R programming language?
So the answer is an R package, pronounced "setter." We have wrapped each API endpoint, of which there are three, in an R function. So you can engage with it in a nice, user-friendly way with intuitive defaults. And then we actually have a wrapper function around those functions. So you really can call the Spatial Equity Data Tool and then receive the results, all with one function. So we're really excited about that as well.
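The wrapping pattern described here, one function per endpoint plus a convenience function that chains them, can be sketched as follows. This is written in Python rather than R to match the other examples, and everything in it is hypothetical: the talk says there are three endpoints but does not name them, so the upload, status, and results roles below are guesses, and an in-memory stand-in replaces the real API so the sketch runs offline.

```python
class FakeEquityApi:
    """In-memory stand-in for the real API (hypothetical endpoint roles)."""

    def __init__(self):
        self._jobs = {}

    def upload(self, rows):
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = rows
        return job_id

    def status(self, job_id):
        return "done" if job_id in self._jobs else "unknown"

    def results(self, job_id):
        return {"n_points": len(self._jobs[job_id])}


# One thin function per endpoint, each with a small, predictable interface.
def upload_data(api, rows):
    return api.upload(rows)

def get_status(api, job_id):
    return api.status(job_id)

def get_results(api, job_id):
    return api.results(job_id)


def run_equity_tool(api, rows):
    """Convenience wrapper: upload, check status, and fetch results in one call."""
    job_id = upload_data(api, rows)
    if get_status(api, job_id) != "done":
        raise RuntimeError(f"job {job_id} did not finish")
    return get_results(api, job_id)


# Hypothetical point data: one (longitude, latitude) pair per row.
rows = [(-90.07, 29.95), (-90.10, 29.97)]
result = run_equity_tool(FakeEquityApi(), rows)
print(result)
```

Keeping the per-endpoint functions separate lets advanced users call a single step on its own, while most users only need the one-call wrapper.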
OK, so what's our unexpected gain here? So the answer is we're kind of living this open source software dream. We released this in March and shortly thereafter, we got some pull requests opened by a really skilled R programmer from a planning office. So that was really exciting for a couple of reasons. The first of which was that just like some of his suggestions made the package itself better. And then the second thing is that he actually wrote some functionality for the tool to engage with ArcGIS feature layers.
So zooming out a little bit, my background as a data scientist is in open source software like R and Python. So I don't have a lot of experience with ArcGIS. However, it's really a key workhorse piece of software that folks in planning offices use. So we're really excited about this because we're integrating the Spatial Equity Data Tool with software that planning offices, who we hope use our tool, use very often. So it's this seamless integration. And this really encapsulated unexpected gains. This was really out of the blue. Just one day we got the alert that the pull request was open. So we were really excited about that.
Key takeaways
OK, so what should you take away from this talk? First things first, if you're a person who has the ability to determine where new resources are allocated across different geographies, we would encourage you to use this tool. And if you're not that person, but you know people who are, we would encourage you to share our tool. Again, it's free, it's open source, and you can use it right now. The second thing is we encourage you to think deeply about what the right tool is for your job. Hopefully when you do that, as I've tried to show today, you too may experience unexpected gains in your work. So thank you very much. And I'm excited to hear your questions.
Q&A
All right, we'll leave this slide up for people who want to take a photo. There will be a Slido link for the folks in the audience who want to ask more questions. But the first question is, have you looked into other geographic boundaries like police and fire zones or school districts?
Yeah, so that's a really good question. One thing I didn't specifically mention is that the tool has four geographic scales that it works on. So it works on the city, the county, the state and the country. If you're working at any of those geographic scales, the tool is ready to go. We haven't worked at these other geographies. And the other one we hear that's very related is, hey, what if we're a metro area? It's like a group of three or four counties. So we haven't done that. I would say that is the next most exciting thing we're hoping to develop. So hopefully stay tuned, and we might have a better answer for you in the next year or two.
All right, next question. Who is your ideal customer for this data, and how hard or easy is it to make changes and decisions with this data?
So I think our ideal customer could be really anybody. And I wouldn't even say customer, right? They aren't paying to use this tool. But for our user base, we're hopeful we could see a lot of people in city or municipal governments who are deciding where to use the tool. So if you're approving funding requests for different places, or you're allocating capital investment, things like that, we think that could be a really exciting use case.
All right, the last question. You wrote your documentation as a Quarto book, but were there any decisions as to why you chose a Quarto book versus a Quarto website?
Yeah, that's a good question. I think it was more vibes-based. It's like, this is a book. It felt like, you know, we see these different chapters, and I don't know, it felt more appropriate for the documentation. Maybe we should have thought harder about that, but it seemed right at the time.
All right. So that's all the time we have. Let's thank Gabe one more time.