
Ivonne Carrillo Dominguez | R Shiny - From Conception to the Cloud | Posit (2022)

I will share how we published an R Shiny application to AWS, the decisions we made, and what we learned in the process. One challenge we faced was figuring out how we could develop collaboratively. We needed to define our development workflow, including version control, dependency management, and quality assurance. Then, we needed to define the deployment method. RStudio is great for development, but it may hide many of the aspects that break the application. We used CI/CD workflows as much as possible to ensure our code was robust before pushing the changes to production. Lastly, our infrastructure team designed a replicable framework, so we are ready to deploy new R Shiny applications quickly and focus on data analysis. Session: Data science in production

Oct 24, 2022
14 min


Transcript

This transcript was generated automatically and may contain errors.

Have you ever felt that when you are in your own house, cooking for yourself, that you feel too comfortable? So comfortable that you might end up with a sad sandwich? Well, this happens to me too. But, when my husband joins the party, it's totally taco night.

Of course, when you have more help in the kitchen, the result is much better. Sadly, this is not always the case, because working as a team can be challenging too. Just imagine a group of friends trying to cook together: they are used to different things, they are used to using different tools, and if they are not all on the same page, it could be a disaster.

Well, this often happens when you put data professionals and infrastructure engineers to work together. But if they speak the same language, they are set to build successful products.

In this talk, I'm going to tell you or discuss with you some of the best practices that you could follow when building Shiny applications. And some of the challenges that you may face when working with interdisciplinary teams when you want to deploy Shiny applications to the cloud.

I am Ivonne Carrillo Dominguez, data engineering manager at Vixel. At Vixel, we have several teams, and I've had the opportunity to work in two of them: data science and analytics, and cloud computing. These two teams often collaborate when building data products. And the story that I'm going to tell you is about this team's very first Shiny application.

I've broken down this story into three parts: product conception, development time, and going to the cloud. In each of those parts, I'm going to tell you some of the challenges that we faced and how we overcame them. And spoiler alert, we should have gone to the cloud from the beginning.

Product conception

Let's start with the product conception. The first question that we asked was how we were going to collaborate, given that we came from doing data analysis individually. Well, now we needed to code together. You may know the answer, so I'm just going to say it. We used Git.

Some of the team members were learning about Git, like how to push and pull code. Well, luckily, we had this plugin in RStudio that eases the learning curve, because it can be tricky sometimes. And why use Git? Well, you want to be able to share your code, integrate everyone's contribution, and keep different versions of your code. This is on your local machine, but also on the Git platform. And you can use any Git platform, like GitLab, or GitHub, or Bitbucket.

In our case, we asked the infrastructure team which was the better tool, because what we wanted to do was just focus on the development and not worry too much about the hosting. We wanted some pipeline, some automation where we were able to push our code, and then it was going to be deployed to the cloud. So they chose GitLab because of the continuous integration and continuous deployment features. So in this way, they were able to use the tool that they felt more comfortable with while we were using the tool that we were comfortable with in RStudio.

Then we started using Golem. Golem is a Shiny framework to build production-level, robust Shiny applications. At the beginning, I wasn't very sold on using this tool because it adds a lot of R packages and scaffolding code that I wasn't sure we needed. But then I really liked it.

It sets up a very basic web application that you can build on top of. And one of the things that it adds, for example, is a set of scripts that guide you step by step so you can start building your Shiny app. For example, it will help you configure your application name, your unit testing, your favicon, and then it will take you to the next step. So it is a very nice tool, and I highly recommend using Golem or a framework like this one, especially if this is your first Shiny application.

Dependency management

This is a concept that can be hard to understand if you're not used to it. But it is very useful. So we were facing the issue that maybe my code was working on my local machine, but it wasn't working in another team member's environment, and that's because they didn't have the R package that was needed. Or maybe they had it, but it was an older version that didn't support the function that I was using. So for that, you want to use a dependency management tool. And there are several options.

You might as well just use an R script that will call your list of dependencies, or there are more sophisticated options like packrat or renv. Whatever option you choose, try to do it from the beginning so you won't have to do some reverse engineering, remembering all the tools that you have installed and then adding them to the dependency management tool. If you do it from the beginning, you won't have that problem.
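As a minimal sketch of the simplest option mentioned here, an R script that lists the dependencies, the following uses only base R. The package names in `required_packages` and the helper names `missing_packages` and `install_missing` are assumptions made for illustration; they do not come from the talk.

```r
# Hypothetical dependency list; replace it with the packages the app really uses.
required_packages <- c("shiny", "golem", "DBI", "RSQLite")

# Return the entries of `required` that are not present in `installed`.
# By default, `installed` is the set of packages in the current library paths.
missing_packages <- function(required,
                             installed = rownames(utils::installed.packages())) {
  setdiff(required, installed)
}

# Install whatever is missing. Nothing is installed until this function is called.
install_missing <- function(required = required_packages) {
  to_install <- missing_packages(required)
  if (length(to_install) > 0) {
    utils::install.packages(to_install)
  }
  invisible(to_install)
}
```

Each team member would call `install_missing()` once after pulling the code. In a non-interactive session a CRAN mirror may need to be set first through `options(repos = ...)`. This sketch does not pin versions, which is one reason to prefer a tool such as renv.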

Also, it is okay to explore and innovate with the packages that are out there, but for your application, just add the tools that you really need for your application to run. So when it is in the cloud, you are going to install just the dependencies that you need. So just keep it simple. Every package that you add brings potential bugs or issues and is one more thing to maintain. So it's a good idea to keep it simple.

Development time

At this point, we were ready for the development time. So everything is already set up. So we were ready to build our product.

So how were we going to distribute the workload? Well, Golem helps you create these Shiny modules. You can think of them as mini Shiny applications that you are going to call from the main script. In our case, we had five dashboards, so we created one Shiny module per dashboard and we distributed them among the team members. The way that you are going to separate the work is going to depend on your use case, your application, or your team members.
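To make the idea of a module as a mini Shiny application concrete, here is a small sketch, assuming the shiny package is installed. The `sales` module, its contents, and the `run_app` wrapper are hypothetical; the `mod_<name>_ui` and `mod_<name>_server` naming follows a pattern commonly seen in Golem projects, not the speaker's actual code.

```r
# UI half of a hypothetical "sales" dashboard module.
# NS(id) namespaces the output IDs so several modules can coexist in one app.
mod_sales_ui <- function(id) {
  ns <- shiny::NS(id)
  shiny::tagList(
    shiny::h2("Sales"),
    shiny::plotOutput(ns("chart"))
  )
}

# Server half of the same module; the chart is a placeholder
# drawn from the built-in `cars` dataset.
mod_sales_server <- function(id) {
  shiny::moduleServer(id, function(input, output, session) {
    output$chart <- shiny::renderPlot({
      plot(cars)
    })
  })
}

# The main application only wires the modules together.
app_ui <- function() {
  shiny::fluidPage(
    mod_sales_ui("sales")
  )
}

app_server <- function(input, output, session) {
  mod_sales_server("sales")
}

# Build the app object; nothing runs until this function is called.
run_app <- function() {
  shiny::shinyApp(ui = app_ui(), server = app_server)
}
```

With one such pair of functions per dashboard, each team member can work on a separate file and the main script stays short.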

Integrating the backend. So this is how we did it. We collected the data and created some R data frames. Then we loaded them into an SQLite database, which is a file. And then we set up the database connection. We created all the queries that we needed and then connected those results to our dashboard charts.
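The steps described above (data frame, SQLite file, connection, query) could look roughly like this, assuming the DBI and RSQLite packages are installed. The `visits` table, its columns, and the function names are invented for the example and are not from the talk.

```r
# Hypothetical sample data standing in for the collected data frames.
sample_visits <- data.frame(
  day = c("2022-01-01", "2022-01-01", "2022-01-02"),
  n = c(3L, 4L, 5L),
  stringsAsFactors = FALSE
)

# Load a data frame into a file-based SQLite database.
build_sample_db <- function(db_path, visits) {
  con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbWriteTable(con, "visits", visits, overwrite = TRUE)
  invisible(db_path)
}

# Run the query whose result would feed a dashboard chart.
daily_totals <- function(db_path) {
  con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbGetQuery(
    con,
    "SELECT day, SUM(n) AS total FROM visits GROUP BY day ORDER BY day"
  )
}
```

A call such as `build_sample_db("app.sqlite", sample_visits)` followed by `daily_totals("app.sqlite")` returns one row per day. Because the queries go through DBI, moving to another database later mostly means changing the `dbConnect()` call.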

But that wasn't real data. It was just sample data. So we needed to really be collecting data on a daily basis. We needed to build some data pipelines, load that data into Amazon Redshift, and then do it all over again. So what is the recommendation here? Well, connect to your database from the beginning. Even if that database does not contain all the production data, it can be sample data. But it is very important that you start building your product thinking that it's going to be in the cloud and that it's going to be in production.

Keeping credentials secure

Keep it secure. So at the beginning, we were... This is very bad, I know. But we were hard-coding passwords, AWS keys, and sensitive information. So what happened? Every time that we were pushing database credentials, we needed to go to that database that was already compromised and change the credentials.

But what if we had pushed some sensitive information that we cannot really change? Well, you need to go to Git and remove that from your code. But Git, one of the main features is that you are going to keep several versions of your code. So even, let's say that we are talking about the password, even if the latest version does not contain that password, you can just go back to the previous commit and the password is going to be there. So you have to go all the way back to the Git history where that password is present and remove all those commits from the Git history. And by doing that, you are going to remove some production code. So it's going to be a mess.

So how to solve that? Well, use environment variables. You can use this .Renviron file that's going to be kept on your local machine. That's going to have these key-value pairs. And you are going to call the keys in your code like so. And not only is it going to be secure, but it's going to be ready for the cloud as well. Because what if you have a database per environment? Those credentials are going to be different. It doesn't matter because your code is already prepared. It's only using the keys. The values are different in every environment. So the only thing that you need to tell your infrastructure engineers is which environment variables you need for your application to run. And they are going to set them up with something similar to that.
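A minimal sketch of reading credentials from environment variables with base R follows. The variable names `DB_HOST`, `DB_USER`, and `DB_PASSWORD` and the helpers `get_required_env` and `db_config` are assumptions for illustration, not the names used in the talk.

```r
# Example .Renviron contents, kept out of Git (hypothetical names and values):
#   DB_HOST=localhost
#   DB_USER=app_user
#   DB_PASSWORD=change-me

# Read a required environment variable and fail loudly if it is unset or empty.
get_required_env <- function(key) {
  value <- Sys.getenv(key, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", key, "' is not set", call. = FALSE)
  }
  value
}

# Collect the database settings; the code only ever mentions the keys.
db_config <- function() {
  list(
    host = get_required_env("DB_HOST"),
    user = get_required_env("DB_USER"),
    password = get_required_env("DB_PASSWORD")
  )
}
```

Failing early when a key is missing makes a misconfigured environment show up at startup instead of at the first query. Listing `.Renviron` in `.gitignore` keeps the values out of the repository.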

Going to the cloud

Now, going to the cloud. This is the moment of truth. Because what you want at this point is just to push your code and hope it runs. You want to build, release, and run. Well, it is rare that this happens the first time. And for us, it didn't happen like the first ten times.

So how did we solve that? What we did is that every time we were doing a commit, we were validating that commit. GitLab has these nice pipelines where you can add some steps to validate the code that is being pushed. So, for example, what you can do as the first step is install the dependencies and package your application. And I think this was the selling point for me about Golem. Because I wasn't really sure about using it. And when I saw that Golem needs to package your application and then run it, it was weird to me. Because I was thinking of a package like dplyr or ggplot2, which are out there for you to use. And I didn't want my application to be out there. But don't think of it like that.

When you package your application, what will happen is that it will run all your code. So that means that if you forgot to add a dependency, it's going to fail. If you forgot to remove some line of code that didn't have to be there, it's going to fail. But if it can be packaged, then it means that every line of code is a working line. So that's a very good step for quality assurance.

And then run your unit tests. You can validate the presence of a title, a legend, or a button, just to know that your dashboard is up and running. Also, have a development cloud environment. This is an environment that is meant to be broken. So it's for you to break. And if it is not broken, then it means that you can push it to production. This is especially important when you have already gone live. And you can also have a middle environment, like a staging or validation environment, which is going to be an exact replica of your production environment. And you can run some end-to-end tests there, just to be sure that all the functionality is there.
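As a rough illustration of the kind of unit test mentioned above, checking that a title and a button are present, the sketch below assumes the shiny and testthat packages are installed. The `dashboard_ui` function and its contents are hypothetical stand-ins for a real dashboard.

```r
library(testthat)

# Hypothetical UI with a title and a refresh button.
dashboard_ui <- function() {
  shiny::tagList(
    shiny::h1("Sales dashboard"),
    shiny::actionButton("refresh", "Refresh")
  )
}

# Render the UI to HTML text and look for the elements we expect.
test_that("dashboard UI exposes a title and a refresh button", {
  html <- as.character(dashboard_ui())
  expect_true(grepl("Sales dashboard", html, fixed = TRUE))
  expect_true(grepl('id="refresh"', html, fixed = TRUE))
})
```

A check like this only confirms that the elements are rendered. It says nothing about whether the data behind them is correct, which is where the end-to-end tests in a staging environment come in.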

So those are some of the best practices that you can follow. And of course, there are more. I will say that one of the biggest ones was the communication between the infrastructure engineers and the data team. So I really recommend you to get familiar with all these concepts. And I promise you that that communication will get easier.

Remember, this group of friends that are in somebody else's kitchen, it might feel weird at the beginning. But once they agree on the terms that they are going to be collaborating, how they are going to collaborate, the work gets easier. And they can start working in harmony and having fun. And most importantly, have great results. Thank you.