Benjamin Braun | Publishing Customer Facing Products with RStudio Connect | RStudio (2022)
Benjy Braun, Chief Architect for 202 Group, shows why he and the 202 Group team decided to use RStudio Connect to build customer-facing applications and secure websites while focusing on data science and not having to worry about hiring for (or learning) a bunch of web tools like JavaScript and PHP. With well-organized git repos and R Markdown, you can build websites for customers with embedded Shiny apps and dynamic visualizations, then publish and deploy them using RStudio Connect. Using this method, you won't confine your analysis to static documents like PowerPoint and PDFs, and your customers will always see the latest, most up-to-date information. The tools are out there to do this, and it's easy to start and iterate with more features over time. Session: Data science in production
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Hi, my name is Benjy Braun, and I'm really excited to talk to you today about getting data science into your customers' hands quickly and on a budget using RStudio Connect. This is an unconventional use of this product, which is really geared towards internal apps and resources. But I want to show you how you can use this product for customer-facing applications as well as internal ones.
But before I get into that, I want to set the scene for how we started using RStudio Connect to publish customer-facing data science applications.
The summer of 2020
It was back in the summer of 2020. We all remember that time, though many would like to forget it. In addition to being in the middle of a lockdown and without childcare, my family and I had moved in with my in-laws to get some extra support. Moreover, my wife had gone full-time on her own new business back in January 2020, while I had joined a startup as the first technical employee in February 2020. Needless to say, it was a stressful time.
I have a feeling this is a picture many of you with kids have, or at least can relate to. If you don't have kids, just imagine everything you were doing back in summer 2020, but add a two-year-old and a four-year-old to the mix.
The problem: poor UI and stale data
As all this was going on, my company, 202 Group, was dealing with a problem. We had built a good, streamlined data pipeline. We had sourced the data we needed to generate actionable insights for our customers and built the methods to ingest and store that data. We had effective ways to analyze and model the data and customize visualizations for end customers. We were using open data and data science to expose risk in critical national security supply chains, and our customers were clamoring for more so they could take action.
But despite this tested process, we had a problem. We had a really poor UI/UX, if you could even call it that. We were delivering everything in PDFs, PowerPoint, and Excel or CSV files. This was a big problem for two reasons. The first is that once a PowerPoint, PDF, or spreadsheet left our inbox, it was immediately out of date. The customer could still get value out of it, but they relied on us emailing them updates, which were themselves out of date almost immediately. They needed on-demand, up-to-date data.
The second problem was that it wasn't an appealing way for the customer to consume the analysis. At first, you might not think that's such a big problem, but if your customer doesn't like the way they interact with your product, they won't use it. So if you actually want your analysis to be anything more than a science project, you need to make sure your customer likes the way they interact with it. And for us, as a small bootstrapped startup, our customer liking the product meant still being in business by the end of the year.
Requirements for a solution
The customer needed to be able to explore on their own in a user-friendly way and get the data into formats that they were comfortable with. It needed to be dynamic to fix the stale data problem and the static interface problem. But we had other requirements as well. We needed to get this out the door quickly, in the next few weeks, not months, and we didn't have very much money to do it.
Last, but certainly not least, we needed something where we could continue to focus on the data science portion of our company. The thing that was differentiating us, the value we were actually delivering to customers, was the data science findings, not the interface itself. Learning web development technologies like CSS, HTML, JavaScript, and PHP would not only be a distraction, it would actively take time away from developing the data science, which is where we were really driving value.
So based on all this, we developed a set of five criteria we would use to evaluate the proposed solutions to this problem of getting up-to-date data science products into our customers' hands. It had to be cheap, like a few thousand dollars tops. It had to be easy to use. We needed to be able to focus on the data science and not spend a lot of time figuring out web technologies. It had to be easy to customize. Our customers were still figuring out what they wanted, and we needed to be able to adapt and make changes quickly and easily. The user had to log in to an interface. The customer needed to be able to access up-to-date info when they needed it. Emailing static reports had to end. And last, we had to retain our IP. Getting locked in with a vendor didn't seem like a good idea, and we wanted to maintain flexibility in the future if we started doing something new.
Evaluating the options
So we settled on three options. We considered third-party BI tools like Tableau or Power BI, hiring a front-end developer, and last, RStudio Connect.
Some of my teammates were pushing for the first option, third-party BI tools. I was against it for a few reasons. The startup costs are actually pretty expensive, and cost was a big driver for us at that point. But there were two other, perhaps more important reasons. One is that learning to use one of these tools was really a poor use of our time. We already knew how to use rmarkdown and Shiny. Why learn how to use one of these tools when we could be developing data science? If we were going to learn these tools from scratch, we might as well learn JavaScript ourselves. These tools might market themselves as easy to use, but in practice, I don't think it's really that easy, especially if you want to produce something appealing.
The other thing is that we'd get locked into the vendor, and switching costs would be high. Anything we developed would be specific to the vendor's product, and we'd lose flexibility if we wanted to do something else. So third-party BI tools had two out of five. I was ready to move on.
What about hiring for this need? The problem here was that it wasn't going to be cheap to get someone good, and while it might be easier than doing it ourselves, it would be hard to source someone good. We also wanted to stay focused on our value, which was in the data science, and hiring outside of that would also have been a distraction. If we were going to hire, we'd want to hire another data scientist or engineer. So better than the third-party tools, but still only three out of five.
Choosing RStudio Connect
So this left us with RStudio Connect. I was only familiar with this product because I had goofed around enough in the RStudio IDE and clicked on this button here, the publish button. This was really helpful when I published a blog using RPubs. So the idea of a button in the RStudio IDE to publish rmarkdown documents and share them securely was really appealing. Then when I got in touch with RStudio about using Connect, I found out about the small business discount. This made it far and away cheaper than the other options we considered. And since we got to keep using rmarkdown, we didn't have vendor lock-in, and we retained all our own IP.
So checks across the board. It checked all the boxes, which is why we chose it going in. But the reason we continued to use it, and the reason I'm such an evangelist for it now, is how simple it made things. We were able to change, update, and push content quickly. We could respond to customers both in terms of access and data within minutes. And as time went on, we were able to iteratively learn more web and front-end technologies and incorporate them into the end product. This included custom CSS and JavaScript to give our products a unique look and feel and a better user experience with only minimal front-end needs.
So we've gotten to the point, based on customer needs and where we are as a company, that we now deliver our product in a custom JavaScript and PHP front end with a dedicated front-end team. And now we use RStudio Connect for its more conventional purpose of hosting internal products. But what's awesome is that everything we developed was in rmarkdown. So those rendered HTML docs and graphics could be used by our front-end developers.
So I want you to think about our story and how it might work for you. If you're a small business or even an individual, and you're not sure how to get your data science into your customers' hands, think of RStudio Connect. It's cheap, easy to use, makes it easy to update your products, lets your customers log in to an interface, and lets you retain all your IP without vendor lock-in.
Please feel free to reach out to talk about this or anything else. Thank you so much for your time and for coming to hear my talk.