Jeff Leek | Data Science Training in Communities with Limited Technology Resources | Posit (2022)
I'm so excited to be back with the data science community at rstudio::conf! Data science is a field that touches nearly every part of our modern lives - from the shows we watch, to our jobs, to the economy, to sports and entertainment. Moreover, modern technologies like cloud computing and access to increasing volumes of data have opened the door to incredible new opportunities for those with data skills. R-Ladies, Bioconductor, Data Carpentry, and RStudio have also done an incredible job of creating a supportive community for these new data scientists to flourish. But not everyone gets equal access to the training that can pull them into this field. In this talk I will describe a collaborative effort to build DataTrail - a data science program created in partnership with local non-profits in Baltimore that seeks to spread awareness, knowledge, and opportunities into historically under-resourced and under-served communities. I will highlight the incredible work of the partners who made this training program possible, show examples of the achievements of DataTrail participants, discuss successes and challenges with the program, and talk about the rewards of being a part of community-based data science education. Finally, I will highlight opportunities to be a part of our DataTrail community or build a new DataTrail in yours. Session: Keynote
Transcript
This transcript was generated automatically and may contain errors.
Good afternoon everyone. I'm very excited to announce our last and final keynote, Jeff Leek. We are particularly lucky to get Jeff Leek as a keynote speaker right now because he's in the middle of a cross-country move to start a new position as VP and Chief Data Officer at the Fred Hutchinson Institute for Cancer Research.
So Jeff is famous for very many things, but two in particular that I wanted to talk about. The first is, along with Roger Peng and Brian Caffo, Jeff taught really like the first data science MOOC, which introduced R to literally millions of people, which is pretty amazing. The other thing I love about Jeff is that him and I disagree on a few things, and one of the things in particular, several years ago he wrote a blog post, Why I Don't Use ggplot2. But what I love about Jeff's writing is he articulates his position so clearly, like even as I was reading that article, I was like nodding along, yeah, yeah, that makes sense. So please join me in welcoming Jeff Leek.
Thanks very much, and thanks Hadley, I know you've been saving that up for a minute. I will say that I have become a ggplot2 convert since then, so yes, yes. I'm not even playing to the audience there. Yeah, so I'm very excited to be here today to talk to you about DataTrail, which is an effort that we've been running in Baltimore for the last four or five years to build inclusive community-based data science education, and I hope that you'll be excited to go back into your communities and also build programs and connect to your communities.
One thing I wanted to start off with, though, is I wanted to just say I'm so excited to be in a room with all of you. I've seen tons of people that I haven't seen for two and a half years. The very last conference I went to in January 2020, before this one, was rstudio::conf in San Francisco. I then got on an airplane, flew back to Baltimore, and the world ended. I know that all of us over the last two and a half years have gone through many, many things. They're both individual and collective, and I just want to express the joy that I feel to be back together with people and back with the R community, and especially this amazing community that the band formerly called RStudio has put together.
And so I'm really excited about that, and I just wanted to express my gratitude both for the invitation to speak here today and to the community that's been built, all of you, for coming out and sticking around for the last talk. Before I go any further, I also wanted to make sure I thanked the people that did all the real important work. This is the DataTrail core team. Ashley Johnson, Simone Sawyer, Ed Sabatino, Davon Person, and Ashley and Davon are actually here today, and if they could stand up and wave, they're somewhere in the audience. Anything good you hear is definitely because of them. Anything silly or misguided is due to me.
Jeff's personal journey and the opportunity gap
One of the things I wanted to talk a little bit about is my own journey to where I am here today. As Hadley mentioned, I'm in the middle of a move to take a big, exciting new job right near my family, which I'm really excited about, but I didn't start there as none of us did. I started off as just a nerdy kid from a very small town in Idaho. This is an embarrassing picture that usually other people show in a presentation, but how did that happen? How did I go from nerdy kid in Idaho to where I am today?
And I imagine we all had a similar kind of journey, right? We had to know that this was a field which some are just discovering now, some discovered a long time ago. We had to have access to the equipment and the training and the connections to be able to be in a room here today. But more importantly than that, I had to have help. This is not something that I did on my own. I had to be lifted up at various points in my career, from my undergrad advisor, Jim Powell, to my grad advisor, John Storey, to Giovanni Parmigiani, who recruited me to Baltimore by saying, do you want to live near your wife, to Rafa Irizarry, who was my main faculty mentor in my early years, and many, many other people who have worked with me.
Just like me, I'm sure you feel the same way, that there are people that have helped you out in your life. And so I wanted to just take a minute, because we're all here together, to turn to your neighbor and maybe tell them about somebody that helped you along your way. We're going to just take a minute here, and I'd like you to tell somebody about one of your mentors, somebody who helped you out when you had a moment of need.
All right. I know that this conversation could last a long time, and I hope you'll continue that conversation later. And I hope it reminded you of somebody that helped you along the way, because the DataTrail program we're building is about helping other people.
And so one of the key things that we realize, that I think everybody recognizes, is that talent is equally distributed. So these are some young people from our second cohort of DataTrail scholars in East Baltimore. These are folks that grew up around there and are incredibly talented young people. And there's an important thing that even though the talent is equally distributed, opportunity isn't always. So this is data from the Economic Opportunity Atlas, and it says that if you were born and grew up in this neighborhood, the median family income when you get to age 34 is $18,000 a year. So that whole neighborhood, even though it's right next to a multi-billion dollar medical institution, doesn't have access to the same career opportunities that, say, I did when I grew up out in Idaho.
And it's not just Baltimore where this is true. If you look at the Economic Opportunity Atlas, where I'm going in Seattle, Pittsburgh, Chicago, Tampa Bay, New York City, no matter where you look, there are these opportunity deserts. And so how do we create opportunity in these opportunity deserts? How do we actively go out and promote opportunities for folks that grew up there? And it's not just economic. It's not just an economic calculation. It's also a health calculation. I was in a school of public health. I'm moving to a cancer center. I care a lot about health.
This is a plot. It doesn't look like real data, but it is real data. This is a plot of income percentile on the X axis, and on the Y axis is life expectancy. You've got one line for female, one line for male. And you can see that there's like near-perfect correlation between how much money your parents made and your life expectancy on average in the population, which is kind of incredible. Education is still the best treatment we have to cure the problem of lack of opportunity.
This is a plot that shows how much your parents made on the X axis, and on the Y axis is how much you make after going to different types of schools. Each color represents a different type of school. And the lines are relatively flat, which means if you get into one of those schools, you have a very good chance of moving up economically. But the challenge is you can't always get in.
So this is a similar plot. This is colleges. Each dot is a college. On the X axis is how likely you are to get into that college on a scale from 0 to 100 based on your parents' income. And on the Y axis is how likely you are to be in the top quartile of income after completing college there. So places like Johns Hopkins, where I used to be at, have very low access. If your parents didn't have a lot of money, it's unlikely you're going to get into that college. But if you go there, you're going to have a very good chance of having a high economic return.
And so you can actually calculate this for every college. You can say multiply the probability you get in if you're from the lowest quintile of income times the probability you'll be in the highest income once you get out. And that can be what we call a mobility rate. This is a calculation developed by some economists. And so the best generators of mobility in the United States are typically large public institutions, community colleges, technical colleges, that produce the highest level of economic mobility in our country. And they're driven by access.
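The mobility rate described above is a product of two proportions, so it is easy to compute by hand or in code. What follows is only a minimal sketch in Python: the function name `mobility_rate`, the two college labels, and every number are hypothetical and do not come from the talk or from any real dataset.

```python
def mobility_rate(access: float, success: float) -> float:
    """Return access * success, the mobility rate as described in the talk.

    access:  chance of attending the college if your parents are in the
             lowest income quintile (a proportion between 0 and 1)
    success: chance of reaching the highest income group after attending
             (a proportion between 0 and 1)
    """
    for value in (access, success):
        if not 0.0 <= value <= 1.0:
            raise ValueError("access and success must be between 0 and 1")
    return access * success


# Hypothetical (access, success) pairs, invented for illustration only.
colleges = {
    "selective_private": (0.03, 0.60),
    "large_public": (0.25, 0.20),
}

# Rank the colleges from highest to lowest mobility rate.
ranked = sorted(
    colleges,
    key=lambda name: mobility_rate(*colleges[name]),
    reverse=True,
)

for name in ranked:
    print(f"{name}: {mobility_rate(*colleges[name]):.3f}")
```

With these made-up numbers the high-access school ranks first even though its success rate is lower, which is the sense in which mobility is driven by access.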
Data science as a path to opportunity
And so we're all here because we love data science. We love R. We love Python. We love all the other languages. And it's a high growth and wage job category. And so how do we get more people to have access to this? As Hadley mentioned, we've been teaching massive online open courses about data science for a long time. And they've now enrolled more than 10 million people across the world. And these programs can lead to big income gains.
So here I'm showing a plot. On the X axis is the cost of the program. And I've got graduate school, undergraduate, associate's degree. And on the Y axis is the increase in income post program. And for one of our data science sequences, you get nearly the same amount of income boost as you would for a college or advanced degree, even for a much lower cost. So that seems pretty exciting. And originally we were thinking, okay, this is how we generate this mobility at scale.
But as not just us, but many, many people have noted, Coursera classes, boot camps, those sorts of things, they tend to benefit the already well educated. Most of the people taking these programs actually already have bachelor's degrees, master's degrees, professional degrees, and are career transitioning. So we're not really generating access for the people that the mobility rate applies to. And so what we try to do is, can we create opportunities for the folks that aren't taking the massive online open courses, who haven't been introduced to R via these scalable programs?
And lest I say that we're the only people thinking about this, there are a ton of amazing, I've been to talks today from Openscapes to, you know, met amazing people from various parts of the world. We got R-Ladies, we got Data Carpentry, the Bioconductor team, so many people that are also trying to do this, to try to lift people up through the power of data science education and data science opportunities. But the reality is there are significant barriers to entry. I showed this graph a minute ago for myself, but it applies to anyone who wants to get into data science. You have to know about data science, you have to have the income security to participate, you have to have access to a computer, programs, instruction, connections, the right jobs, it's a lot. So how do we take these barriers down one by one?
The DataTrail program: removing barriers one by one
So first you have to know about data science. And so the way that we've addressed that problem with our program is we actually go out into the community and we partner with community-based education associations like Historic East Baltimore Community Action Coalition and HeartSmiles, who train young people to get their high school diplomas, who train them to get technical training, and we teach them about the opportunities that are there in data science. We partner with them to recruit students for our program.
Then how do we address the computer? So this is the thing that maybe kicked off all of this for me. I love Chromebooks, and you can get a Chromebook for $450. We've seen the power of this with education over the last couple of years with the pandemic. And it turns out that you can be a data scientist on a Chromebook. And this is where, again, RStudio and Posit have been a major player in what we've been able to do by creating RStudio Cloud. We were one of the first programs to jump onto that platform so that you can do all of your data science work directly on the cloud. It's such an incredible tool, and I'm so grateful that we've been able to use that to help train people. And so you can be a data scientist on a Chromebook. You can use Google Slides and Google Sheets, and you can use RStudio, and you can teach people how to do it. And they don't need a MacBook Pro to be a data scientist.
There's also barriers to entry. You know, if you need to buy a MacBook Pro, it costs you $3,000. If your family only made $18,000 last year, you can't participate in a program like this. So with funding from Johns Hopkins originally, later from the Abell Foundation, we actually pay the students to complete the courses. So as they finish our massive online open courses, they earn money so that they can participate in it like a part-time job.
They also need access to the right kind of instruction and the appropriate programs. So here's where R really shines. Why would we use R? I get asked this question all the time. We built our entire training program around R. I know that there's also Python people here, and we love Python too, but we built our whole program around R, and I get asked a lot why we did that. And the reason why is we wanted to minimize the time to magic.
So with just a few lines of code, R Markdown with just a few lines of code can put a website up on the web that's interactive that you can show off to your friends and to your family. With things like the bigrquery package and dbplyr, you can connect to massive databases and analyze huge collections of data with just a few lines of code. And with things like Shiny, flexdashboard, shinydashboard, those, you can also write just a few lines of code here. It's like 34 lines of code that creates an interactive NHANES website. So with minimizing time to magic, we can train people very quickly to do incredible things.
And then we can build a program around it. We can build a community around it that helps support people as they go through and they learn how to do data science. We do this on Basecamp. You can also do it on Slack or any other community building platform. But there has to be both an online and an in-person component.
So then the last part is you have to think about, okay, say we have a training program. So we've trained them to do it. We've built it in R. Then they need access to connections. They need to be able to actually get a job. And so here we can, again, leverage the R community, the people here in the audience today, to help make connections between the young people that we're training and the broader community and opportunities that are out there.
And this ecosystem really works. We've had 80% success rate in our Johns Hopkins pilot that I'm going to tell you a little bit about and completing the training. Just to give you some idea, massive online open courses in general, the completion rate hovers somewhere between 2% and 5%. So 80% completion rate for this massive online open sequence is an incredible testament to the will and intelligence of the people that we're training in the community.
Getting graduates into jobs
So after they finish the training program, though, we want to put them into opportunities that can help change their lives. And so the reality is there are significant barriers to entry to these jobs, but we've partnered with programs that support internships that represent a soft landing place. We teach people data science in 16 weeks. That is not a lot of time to learn data science and learn how to be a data scientist. So we partner with programs like Urban Alliance, and we have a couple of people here, Chaz and LaShonda from Urban Alliance. And they're helping us build internship programs that support the long component of training that's developing the soft skills, developing the approach to work that will let them be in these careers long term.
So I'm going to tell you a little bit now about our pilot that we ran in Baltimore. This is what the whole pilot has felt like. We started this off with just an idea and absolutely no program. And so at every turn, it's felt like we're barely getting the train tracks together in time to be able to continue to run the program.
So this is how it started. In February 2018, we started developing content for our data science training program. We met with HEBCAC, Historic East Baltimore Community Action Coalition, in April of 2018, and we told them we were developing a program and we were excited about working with them. They got very excited and they said, can we start tomorrow? Remember, we had started developing our program in February. It was April. And so they said, okay, we'll give you a little time. We'll start in May.
We had to develop this much content between February and May. So we're talking 12 courses that cover everything from how to use computers, to how do you organize projects, to the tidyverse, to written and oral communication, to how to get a job. 12 courses in five months. So this was me. I was a little stressed. I was worried that we wouldn't be able to do it.
But fortunately, we had some really incredible help. So at the time, there are two people I particularly want to call out, Shannon Ellis and Aboozar Hadavand, who were postdocs in my group. One was an economist working on studying the outcomes of these massive online open courses. And one was a geneticist. But they pitched in and they said, we can build this whole program in that much time. So they assembled this collaborative team. And on GitHub, using R Markdown, we built this entire program in that much time.
We did have some challenges along the way, though. As we were building the program, the courses, like getting and cleaning data, were being updated because new packages would come out. So this is a plot on the x-axis is time. Thousands of R scripts after the start of a program. And on the y-axis is the usage of different packages. There's one that kind of stands out. And that's because right in the middle of when we're developing these courses, dplyr comes out, rendering half of our program totally valueless. And we have to start from scratch. Thanks, Hadley.
So what we had to do is regenerate all of our materials. And we now pay very careful attention to what the RStudio and Posit folks are doing on their GitHub repos as we're developing courses.
We built this program. And so now we were ready to start training people. So we thought, well, let's start May 1st. How long does the program last? They asked me. We'd never run it before. In fact, we had just built the program. I don't know how long the program lasts. Let's call it 12 weeks. So I told them we were going to do it in 12 weeks. Shannon, Aboozar, and I said, OK, we're going to finish it by August 31st. Despite us still having to fix up courses, all the bugs, our two pilot candidates finished in about 14 weeks. 16 weeks, sorry.
So that was pretty incredible. We took two young people from HEBCAC. They'd never heard of data science before. We got them through an entire 12-course MOOC sequence. They both finished. It was awesome.
Now they'd finished the data science training program. And we realized we got to get them a job. And so there's a bit of a challenge that we ran into that you may have run into sometimes in your search for jobs. This is the roles and responsibilities you may experience that you need to do once you get on the job. And this might be the job experience they're asking for.
We saw that a lot. We saw that there were jobs that we knew that our graduates could perform but were not necessarily advertised in the right way. Moreover, we ran into the Fair Labor Standards Act. And I learned more about the legal requirements around employment than I would care to repeat. But the basic idea is that because they're not computer scientists, they actually write text and they write prose and things like that, they don't qualify to be exempt employees unless they have a bachelor's degree. So we need them to be hourly data scientist jobs, which it turns out a few companies have, but many don't.
So we're this close. We train these young people. They're brilliant. They finish the program. But we can't quite get them into the finish line. So what are we going to do about this? How are we going to actually get them to the point where they can be employed and they can move to a different place in their life?
So I did what anyone would do. I called a friend of mine, an old family friend who was a consultant for IBM. And I said, let's start a consulting company. And so he and I started a consulting company. And we would do data science consulting work for companies all around the country. We would do them for Fortune 500 companies, for small companies, small mom and pop shops. And all of the data science work that was being done was being done by graduates of our program that we were hiring from the training program. This worked great. Graduates did an amazing job. We were able to do it. But we were a bootstrapped, bespoke data science consulting company. And ultimately, that's not the way that this is going to work going forward.
Building sustainable infrastructure
So the next part of our story is we brought on Ashley Johnson, who's here today as a core member of our team and is the program manager for DataTrail. And she managed, through a lot of hard work, blood, sweat, and tears, to get our HR systems at Johns Hopkins, which is a massive organization where HR does not move quickly, to create a new job code so that we could hire people at Johns Hopkins specifically with the credentials from our program. This is an incredible testament to Ashley's patience and brilliance getting this through the system. And it allowed us to hire Davon Person, who's a graduate of our program, to be the main TA for all of our future training cohorts.
This kind of work has to be done by advocates within an organization. This isn't data science work. This is the hard blocking and tackling that needs to be done by each and every one of us if we want to build new opportunities where there weren't opportunities before. Build new bridges.
We have this phrase that we always say when we're building this program. Hard things are hard to do. When you want to make a big change in the way the world works, it's not going to be easy. It's going to be hard. And that means it's going to be hard to do, but it's worth doing even though it is hard.
So I think I'm going to take another brief pause, and I'm going to have all of you think about some time you accomplished something. Something that was really hard, that you're really proud that you did, even though it took a lot of work and it was kind of painful. Tell your neighbor about it. We'll give you another couple of minutes, and then we'll come back.
All right. We'll jump back into it. All right. I'm stoked about this. Everybody was very excited to tell each other about their accomplishments. That's great. I'm excited about it. Hopefully, I'll get to hear some of them later from some of you.
So again, hard things are hard, but they're worth doing, as you all just were telling each other about something that you were really proud of, but it took a lot of work to do. So it's exciting to be a part of those things. It's exciting and compelling to be part of something that feels like it's really important, even if it takes a lot of work.
Impact of the program
I'm going to tell you a little bit about the impact of this program and what it's meant to the people in the community and to us. So first, I wanted to mention Davon, who's here today. Davon is our curriculum lead for DataTrail. He's a graduate of the program, an incredible guy. I knew Davon was going to be the right guy for our program when we had a dinner at our house for all the DataTrail trainees. He came to my house, and he spent the whole dinner teaching my, at the time, eight-year-old son how to play chess. Unfortunately, that's bad for me, because now I get beat all the time at chess. But it was a great testament to the incredible talent that he has for helping people learn how to do things.
Another incredible part of this program is that one of our graduates, while working at Problem Forward, worked on an HBO documentary called The Slow Hustle, if any of you have seen it. If not, you should definitely go out and see it. They did a bunch of analysis of the historic redlining in Baltimore's community that prevented people from buying houses in different communities within Baltimore. It's quite a famous problem around the world, but in Baltimore, it's particularly transparent how big a problem it was. And so they were credited, actually, in the movie credits for their analysis of the data that showed this redlining. It was really an unbelievable effort that was made by a DataTrail graduate.
And so this is an example of a quote. We collect quotes from all the people that come from the program. And so it says, it gave me a new direction and taught me you can be different, go a different route than your parents, not always hanging on to what society tells you. You can follow your own protocol. So it just gave me a sense like a new world or a new field and a new life that I got to see.
It's such an incredible thing to be a mentor. It's such an incredible thing to be a person that makes a connection to a new field, to a new opportunity. And it's been incredibly rewarding to see that happen, not just for us, but watch it scale as Davon makes connections for people, as Candace makes connections for people, as Ashley makes connections for people.
Lessons learned
But we did learn some lessons along the way. I mentioned that we were a train that was constantly trying to take things off the track. This was almost all due to the fact that I'm a biostatistician by training. I don't know the first thing about building a community organization and had to learn over and over and over again things that people who work in the field have known for a long time. We learn those by listening, by paying attention, by spending time in those communities, but they're important and hard lessons.
One of the most important things we learned is that I'm here at a data science conference, at an R conference, but data science is actually the easy part of all of this. The young people that go through our program are bright. They're good at computers. Pretty soon they're teaching us about data science. That's not the hard part. The hard part is everything around it. All the stuff that comes up in somebody's life, you know, housing falling through, not being able to get transportation, not having the access to the resources, not having access to the connections of people that can hook you up with a job or an opportunity. Those are the things that really take work to make something like this happen.
The other thing that we learned is that building trust takes time. You know, we're working in a community, we're working, you know, we come from Johns Hopkins at the time, now from the Hutch, you know, and we're making these connections to community-based organizations who've been working in those communities for a long time. But we're new, you know, we're not part of that community yet. And it takes time and it takes effort and it takes being there all the time consistently to be able to build that trust so that you can actually build a relationship that will last and will create people that move on to the next step.
You know, it's hard to tell somebody you just met what's really going on. One of the things that's always the biggest challenge in our program is getting people to ask questions. It's intimidating to ask questions when you're starting something new. It's a scary thing to admit you don't know something. And so it's an important part of the program to build trust, to build those connections so that people can comfortably ask you what's wrong, tell you what's wrong, ask you for help. It's one of the things I appreciate to an incredible degree about the community here is that we're open to questions, that when people email you or tweet at you or send you a GitHub request, you answer those questions politely and kindly and with compassion. It means the world to people who are just starting out.
And we also need to be able to scale employment and internships. Our original idea was basically born out of panic. I hadn't really thought through how we're going to get people jobs at the end of this training program we started. That's my fault. We needed something a little better than just starting a random consulting company. And so we need to be able to scale those employment opportunities and those internship opportunities. And this is where partnerships with organizations that know what they're doing is critical. So we've been working with the Urban Alliance who has a whole program and a whole system for managing internships for young people who come from communities like the one we're trying to serve. And so we're partnering with them to be able to run these internships in a long-term and sustainable way.
We also need advocates. So one of the things that we've realized as we're going through this program is the thing that Problem Forward did, it wasn't that it was a data science consulting company. It wasn't that it was related to the training program. It was that we could advocate on behalf of young people who come out of these communities, who are talented individuals, who have credentials that maybe look a little different, who maybe don't have all of the background that other people might have when they're first starting out, but that are really talented and could be spectacular if just given a chance. And so within all organizations, we need those advocates. And those advocates have to be people like you, people like me, people who have opportunity already and can reach out and try to support somebody else as they go through that same journey that we all went through, that I had to go through when I came from Idaho to get to here.
And the other thing we need is mutually intensive learning. It's been an incredible experience, I think, for both sides, both people that are coming out of the training program and the mentors and teachers and hiring partners to learn about each other, to learn about the things we do well. You know, I don't know the first thing about Instagram and TikTok and all that kind of stuff, but I'm learning. I'm learning. That's how we're getting into new communities. And I'm learning that from the people that are in the training program. So I think that there's a way that we can learn from each other. It's not just teaching data science in one direction. It's also learning in the other direction as well. And that mutually intensive learning can really develop a relationship that lasts a long time.
The future of DataTrail
So our DataTrail model has three big components now. There's colleges and universities or training partners in general. They provide tutors and educational support. Then there's hiring partners or mentors. They provide the mentors and opportunities for the people once they finish the training program. And then we have nonprofit and community-based partners who do the hard work of supporting the whole person. Because when you're doing a training program and you're coming from a community that doesn't have a lot of resources, small things can cause big problems if you don't have the support you need to get through the training program. And so the three groups working together can really provide that infrastructure, that underlying infrastructure that gives somebody the opportunity to come from a community without a lot of opportunity into a place where they can be a leader in the field.
So this is how our program works now. From January to June, we're recruiting young people. We're getting into schools. We're providing content online. We're trying to get people excited about data science. And then from June to September, we're doing our training program. We're actually running the DataTrail courses. They're going through, they're getting paid as they complete the courses, and they're finishing the program. And we're also recruiting mentors. We're recruiting the people that are ultimately going to hire and mentor and manage the young people in their first opportunity. Then from September to June, they do an internship. This is supported by Urban Alliance. And this is a 10-hour-a-week internship where they get paid. It's virtual, and they get mentorship a couple of hours a week from a mentor at a hiring company or a partner.
This is how the program works now. And so there's an opportunity for you to get involved with us at DataTrail if you're interested. There's two different ways. One is the more heavy-duty way, which is to start your own DataTrail in your own community. And the other way is to hire or mentor DataTrail graduates from our program.
So I'm going to start with starting your own DataTrail program. We talked a little bit about doing hard things. And we did a hard thing to set this thing up, and we've paved a lot of the ground for you. So if you're interested in starting your own DataTrail community, you can reach out to Ashley Johnson, to myself, and we can provide you with a lot of the background and materials you need to build this program in your community. We can provide startup documents, relationship agreements. We can give you information on how to get funded. We provide free courses so you can access them and train people in data science. And then we provide access to the DataTrail community so you can talk to the people who are building it in Baltimore and around the country.
I'm very excited because the first DataTrail franchise has started already. So Madhu and Melanie contacted us. They're at Mount Sinai Health System in New York City, and they were really excited about DataTrail. And the two of them set up a DataTrail franchise that's running completely independently in New York and is now on its second cohort of graduates, placing people in the medical health system, placing them at jobs around New York City. It's amazing to see how just a couple of committed people can have such an incredible impact on the lives of young people in their community, and I'm so impressed with the work that Madhu and Melanie are doing.
So the other way you can get involved is to hire or mentor. So we're working with Urban Alliance, as I mentioned, and there's two ways that you can do this. You could hire DataTrail graduates into your company if you have internships that you want to support. It's part-time internships. They're remote, paid internships, and they can be supported by Urban Alliance and DataTrail. In other words, we support them on the technical side, and Urban Alliance supports them on the soft skills side. I'm incredibly excited to announce that Posit has committed to host two of these interns this fall from our DataTrail cohort, which is an amazing opportunity, and thank you so much.
The other cool opportunity, and this I'm also wildly excited about, is that you can mentor DataTrail graduates as an open source developer. So we could place people directly into companies, but there's also an incredible opportunity for our DataTrail grads to make an impact on the open source and open science community. And so the idea here is we get philanthropic and corporate funding to pay for the internship, and then we will recruit mentors from the open source community to work on their projects. And so Posit, again, has incredibly committed to fund one of these this fall. If you're an open source developer that's looking for somebody to work with you on your project and you're looking for a talented young person, you should come and talk to us after the event. And then the Abell Foundation has also agreed to fund one more this fall from our training program. And that means our entire training cohort is going to get funded this year, and we're excited about building this over time.
Paying it forward
So if you have an idea about a program, or if you want to either mentor a student this fall or start thinking about next fall, get in touch. I've got the email addresses up here of Chaz Ackley, who's our Urban Alliance partner, the director of the Baltimore office, and Ashley Johnson, who's the director of the DataTrail program in Baltimore. And you can reach out to either of them or to me if you're excited about mentoring a DataTrail grad.
I'd like to finish, though, talking about not just DataTrail, because this is a cool opportunity and we would love to work with all of you, but you can also do it in any other way that you feel like. I think just bringing it back to what I said at the beginning of the talk, this is an incredible opportunity to be together, to build community together, and I think that what Posit and RStudio have done is just an amazing way to make us all feel part of the same community.
One of the things that we should do is help each other, whether that's through DataTrail or through Openscapes or R-Ladies or Data Carpentry or any other program that you're excited about. And there's this quote that I love. I emailed my undergrad advisor when I first got my very first faculty job and I told him I was so excited about it. I got to live near my wife. I got to live in Baltimore. I got to do my dream job. And he wrote me back that he was really glad to hear. And he said, mentorship is a debt you can't pay off. I had some mentors who helped me when I was young and I can never pay it off to them. I can only pay it forward to other people. And so he said, thank you for helping me pay off my debt.
Mentorship is a debt you can't pay off. I can only pay it forward to other people.
And I feel like all of us in this room owe a debt of gratitude to all of the people that have helped us be here today. And whichever way you choose to express that, I hope you'll think about ways that you can pay forward your debt.
There's almost nothing more rewarding than this. It's been an incredible journey. And I think that the people here today, the DataTrail folks, the Urban Alliance folks, the HeartSmiles folks, the HEBCAC folks, they're all here to say that one of the most rewarding things you can do is create opportunities for young people. With that, I'll thank you. And at the conclusion of the talk, after this is all over, I'm going to be hanging out up here in front with Ashley, with Chas, with Davon, with Candice. If you're excited about mentoring or talking about DataTrail, please come and hang out and talk and we'd love to get to talk to you. Thank you very much for your attention.
Now we engage in light banter for one to two minutes until people ask questions and have a chance to upvote them. Light banter about ggplot2? Yeah, sure. I'm glad I got my digs in before the talk. Yeah, that was a good investment. I feel like that's the only reason I got the invitation was so Hadley got to say that.
I was actually reading through, like, there's been a few. You also wrote one on tidy data that wasn't really, like, against tidy data. I was not against tidy data. No, that was, like, very much, like, these are some use cases where it's, like, not great for, and yeah, it was right. Oh, man. I'm getting all the grapes. Somebody save me with the question, please.
Are the mentoring opportunities only available in Baltimore? No, these are remote online mentorship opportunities. So if you are an open source developer and are excited about mentoring a young person, it's going to get paid for by the Abell Foundation. It's going to get paid for by Posit. So you just get an awesome mentee to work with you for a year. So if you're an open source developer anywhere, please reach out.
And the next question, how have the program graduates been doing long-term? Yeah, they're all over the place. They're doing everything from data science jobs, but the kind of cool thing about it is, you know, people like Devon are working with us. We've got other young people that are working in data science jobs. One has converted to Python, which is great, but, you know, I won't say that too loudly. But then there's also some people that just took the training and then went in a different direction in what they cared about. I think one of the cool things about the program is we're catching young people at a really formative time of their life, so 17 to 23 or so. And I'm sure you all remember, you didn't exactly know what you wanted to do when you were 17. And so data science could be cool for a while and they learn some stuff from us and then they maybe go apply it, you know, the skills they learned somewhere else. And so we have some people that end up in data science careers, some people that go back to school, some people that end up, like, doing something totally different but kind of leveraging the training. But we keep in touch with all of them. And so, you know, I'm always texting with people that have graduated from the program and it's cool to see that community build over time.
And what are some of the challenges you faced in terms of both kind of, like, buy-in and just, like, raising awareness of the program? Yeah. So I think that I tried to highlight a couple of the key issues, but, like, there's been a whole collection of issues in doing this. Everything from, you know, HR problems to legal problems to bureaucratic problems to, like, building data science program problems. And I think the core message, though, is that the thing that helps most is to think about the target at the end. So our goal is to help young people get opportunities. Sometimes that's by building a data science program, but sometimes it's by Ashley and I going to a million HR meetings to try to create a new job code. And so I think thinking about that target, like, what you're trying to create at the end gives you a lot of leeway to do all the things you need to do and then being flexible about learning new things. I think that's been the hard thing for me, definitely. I didn't know the first thing about, you know, I'm a biostatistician. What do I know about any of that? But I'm learning over time, and I feel like having smart people I work with helps.
What about plans as you transition to the Hutch? Yeah, so we're getting ready to be able to run the program in Seattle, so I'm excited about that. And then it's running in New York. We have a couple of other ambitions about starting in a couple of other cities I can't announce quite yet, but that we're going to start soon. And so really now we're at the stage where we're excited about scaling it out. And so it's been running in Baltimore for a few years. It's successful in Baltimore, and I think it's the kind of program that can very easily run in a number of cities around the U.S. that I highlighted. And so we're really in this phase of trying to get connected to the right people who are excited about running those programs. Awesome. Thanks, Jeff. Really, that was, like, just amazing, uplifting keynote. So thank you so much.