Sydeaka Watson | A Robust Framework for Automated Shiny App Testing | RStudio (2022)
For production-grade Shiny applications, regression testing ensures that the application maintains its core functionality as new features are added to the app. With the help of various R and Python tools that programmatically interact with the UI and examine UI outputs, regression test logic can be represented programmatically and can run as often as needed. This gives the development team an opportunity to catch and fix bugs before they are pushed to production. In this talk, I will introduce a framework for automated testing of Shiny applications both (1) during the development phase and (2) after the app is deployed. I will share a demo Shiny app along with relevant shinytest2 and Selenium code. Session: I like big apps: Shiny apps that scale
Transcript
This transcript was generated automatically and may contain errors.
Hi, my name is Sydeaka Watson, and so that's me. I work as a data scientist at Eli Lilly, the pharmaceutical company. I'm on the development team for a Shiny application that is used at Lilly. So essentially my job is to wear this hat, the app developer hat.
This isn't just a normal Shiny app. This is a complex production-grade app. We get a new version of the app released every month with new features and bug fixes. And a lot of people in my organization visit this app every day, so it's really important that the app is functioning properly all the time.
A story about manual testing pain
So with that in mind, I'd like to begin with a story about something that happened to me back in May 2021. So I was in my home office working on a new feature for the app, but of course I was multitasking because I was also keeping an eye on the activity in our team chat window. So one of the users was saying she was having some issues with the app. It wasn't populating the drop-down menu with the values that she was expecting. So the values were in the app before, and then now all of a sudden she didn't see them anymore.
So one of the other developers identified the issue and determined that we would have to quickly release a new version of the app that resolves the issue with the drop-down menu. And that's when I panicked because I realized that meant two things were about to happen.
First, it meant that I would have to put on another hat, my code reviewer hat. So as a code reviewer, I'm responsible for reviewing GitHub pull requests, have to read the code to make sure it adheres to our coding standards, and then I have to run the app in a local environment to make sure that all the old features work with the new features. I have to check the radio buttons, the drop-down menus, the contents of the tables. I have to make sure that the database API and the model API connections are still working as they used to as we added these new features. And so if it passes those tests, then we deploy the app to the test server.
But then that means I have to put on yet another hat. This time I have to put on my app tester hat. So as a member of the app testing team, I was one of several people that had to manually test the deployed version of the app. So I would have to open the app in a web browser and then run a lot of the same tests as before, check the radio buttons, check the drop-down menus, check the API connections, database connections, all those other things that I did before in the code review. And so now the only difference is that this is on the deployed version of the app as opposed to the pre-deployment version.
Now, testing is very important. It allows us to catch bugs so that we can fix them before the new version of the app is released to production. However, as one of the people responsible for actually running the tests, I can say that this was an extremely tedious, painful experience for me. It was very stressful.
I also felt undervalued because I'm thinking, OK, I have a PhD in statistics, and yet my job now is to click this radio button or check this drop-down menu. So I felt like this wasn't a good use of my time. Another issue is that I and the other app testers had other priorities. So we weren't full-time app testers. We could only run the test when we had time to squeeze testing in.
And managing all of those schedules across the different testers meant that it could take a long time for us to get through those tests and move to the next stage of our software release. So that was increasing the amount of time it took us to deploy the app. Because testing was so time-intensive, we had to apply these tests sparingly. We couldn't afford to rerun them as new features were coming in, or to cover multiple scenarios; realistically, we could only run the full suite once per release, if that.
And then lastly, we found that different testers were running the tests with slightly different conditions. So there was some subjective assessment with whether or not the test passed or failed. So one tester could say it passed, and another person running the same test would say that it failed. So that was a problem.
Discovering automation tools
So because these releases were such a heavy burden for me, I started looking for a way to make this process more efficient for me and for our team. I was already familiar with Selenium. I knew it was a great tool for automating interactions with the web browser. And I figured this might help automatically check whether the UI elements were still working on the deployed version of the app.
I was somewhat familiar with the rvest package and that it was great for web scraping. So I figured this might help to programmatically extract the contents of the web page. And I could run some sanity checks against it to make sure everything's rendering properly so Selenium and rvest could work together on the deployed version of the app.
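To make the rvest idea concrete, here is a minimal sketch of that kind of sanity check. The element ID and the expected choices are hypothetical; in practice the HTML would be fetched from the deployed app's URL rather than from a literal string.

```r
library(rvest)

# In practice the page would come from the deployed app, e.g.
# page <- read_html("https://example.com/myapp")  # placeholder URL
# Here we parse a static snippet so the check runs offline.
page <- read_html('
  <select id="flavor">
    <option>Vanilla</option>
    <option>Chocolate</option>
  </select>')

# Extract the dropdown choices from the rendered HTML.
choices <- page |>
  html_elements("#flavor option") |>
  html_text()

# Sanity check: the values users expect are still rendering.
stopifnot(all(c("Vanilla", "Chocolate") %in% choices))
```

Pointing `read_html()` at the deployed app's URL and running checks like this after each release is exactly the kind of programmatic extraction described above.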
My colleague Eric Nantz recommended that I use the shinytest package to test the code that other developers were submitting in their pull requests. He figured that it might help to automate some of the code reviews. And then, of course, later I found out that there was a newer version of the shinytest package called shinytest2, which somebody will speak about later in this session. And finally, the shinytest documentation recommended that we also use the testthat package.
And so now I had all these tools. There was a bit of a learning curve, but over the next three months I was able to use those tools to create an automated testing pipeline.
Impact on the software release workflow
Let's see how the software release workflow changed after introducing these tools. So in the code review stage in blue, we see the manual pre-deployment testing during code reviews was replaced with the shinytest2 script that automates these same tasks. And now the code reviewer hat sits on the head of the bot that runs the test script.
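A pre-deployment test of that kind might look like the following shinytest2 sketch. The app directory, input IDs, and snapshot name here are assumptions for illustration, not the talk's actual code.

```r
library(shinytest2)

test_that("dropdown keeps its expected behavior", {
  # Launch the app from its directory in a headless browser.
  app <- AppDriver$new(app_dir = ".", name = "dropdown-check")

  # Interact with the UI the way a code reviewer would by hand.
  app$set_inputs(flavor = "Chocolate")   # hypothetical input ID
  app$click("add_ingredient")            # hypothetical button ID

  # Snapshot the input/output values; on later runs this is
  # compared against the stored baseline JSON.
  app$expect_values()
  app$stop()
})
```

The first run records baseline snapshots; every subsequent run replays the same clicks and fails loudly if the UI values drift, which is what lets the bot wear the code reviewer hat.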
And then in orange, in the post-deployment testing stage, the manual regression tests I was running are replaced with a suite of Selenium test scripts. And then I get to remove that heavy app tester hat from my head and place it on the bot, who is happy to run those test scripts. At least it doesn't complain, even if it isn't happy about it.
So those two changes had a significant impact on our software release workflow. Our code reviewers and app testers had less work to do. They weren't feeling stressed out with each new release. We didn't have to worry about testers balancing their testing duties with other high-priority work, so we ended up shortening our time to deployment from five days to five hours, which was pretty huge for our testing team.
We could apply our testing criteria as often as needed, no matter how many releases we had during the month or how many feature pull requests we merged. And because the test logic was contained in the test script, we could apply that logic consistently from one test run to the next, which removed the variation in test results that we had been seeing.
The automated testing framework
So I've talked about this before on the previous slides, but here's where I get to formally introduce the automated testing pipeline that I'm proposing. It contains four key principles. First, I believe you should automate the tests. You should actually have some sort of programmatic representation of your test logic.
We could easily fill an hour, or maybe a day, talking about this: things like how do you recover safely from errors? How do you appropriately log the test results? How do you maintain a repository of all your test artifacts, maybe screenshots or files that were downloaded in the course of running the app? That's what I mean by automating the tests: all of that.
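The error recovery and logging pieces can start as simply as a small wrapper like this sketch; the function and file names are illustrative, not from the talk.

```r
# Run one test step, catch failures instead of aborting the whole
# suite, and append the outcome to a log the team can review later.
run_step <- function(step_name, step_fn, log_file = "test_log.csv") {
  result <- tryCatch(
    { step_fn(); "PASS" },
    error = function(e) paste("FAIL:", conditionMessage(e))
  )
  cat(sprintf("%s,%s,%s\n", format(Sys.time()), step_name, result),
      file = log_file, append = TRUE)
  result
}

# A passing step returns "PASS"; a failing step is logged and
# returns "FAIL: <message>" instead of stopping the run.
run_step("dropdown has choices", function() {
  stopifnot(length(c("Vanilla", "Chocolate")) > 0)
})
```

Screenshots and downloaded files can be written to a timestamped artifacts directory by the same wrapper, which gives you the test-artifact repository mentioned above.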
Next, it's important to have one or more development servers that are completely separate from your production server. That allows us to see how the updated app works without breaking the production version that end users are interacting with. The next two principles have to do with when to run the tests. First, I recommend running pre-deployment tests so that you can integrate new features safely on your development server without crashing the version of the app that end users are using. And finally, I recommend running post-deployment checks, also on the development server; this is where we can most closely mimic what end users will experience when they open the app in a web browser.
So again, pre-deployment and post-deployment: that's really important. One question that comes up is whether automation makes sense for a particular use case or web app. To answer it, I usually ask a couple of follow-up questions. The first: is the test logic consistent from one run to the next? If you can write it down as step one, step two, step three, and not have to worry about it changing from one run to the next, that's one consideration.
The second question is whether the test logic can be completely scripted, so that you can get the result without any subjective judgment or a human in the loop making that decision. There are other considerations, obviously: do you have the infrastructure to support automation tools like shinytest2 or Selenium? Do you have the time? There's a learning curve and a time investment in writing the test scripts. But in general, if you answered yes to both of those questions, I would say your app is a prime candidate for automation.
Working example: the ice cream app
So I prepared a working example that you can play around with on your own. It's a simple app that allows you to create an ice cream recipe using various ingredients like peanuts and raisins and chocolate syrup. And for the adventurous, if you want to add some chicken nuggets to your ice cream, no judgment here, but we allow you to do that. So the GitHub repo linked at the bottom has all the code needed to run the app. It also has the shinytest2 and the Selenium code that will allow you to just kind of play around with this in your own environment. So I definitely encourage you to check it out.
I created a couple of screencasts that demonstrate how this works. First, I'll show you the pre-deployment testing with shinytest2. If you're running the shinytest2 script in the RStudio IDE, you'll see a Run Test button visible at the top of the script. And if you click that, you'll see some activity over here in the Build tab that shows the progress of the test run.
This test script automatically opens the local version of the app in a web browser window. I didn't do that; it did it on its own in Google Chrome. You'll notice that it's selecting various checkboxes, entering text in the text field, and showing how the app changes in response. All of those things happened automatically. When I go back to the RStudio IDE, I see that the test results folder now contains a few files that were saved during the test run. If I click on one of those, I've got screenshots showing what the app looked like at various points in the test run. There's also a JSON file that shows the values of various UI elements as they were at the time the snapshot was created.
If you stick around for the next talk in this session, you'll actually get a chance to see the shinytest2 package author give a presentation about shinytest2. So I didn't spend a whole lot of time on those details in this presentation, but I highly recommend that you check it out.
And finally, I'll show you what the post-deployment version of the testing looks like with Selenium. As before, we click the Run Test button at the top, and we see the test run's progress in the Build tab in the upper right corner. Before, a web browser window would open automatically in Google Chrome. But this time it's a headless browser, so we don't actually see that browser. Instead, it just works under the hood and silently executes the test instructions that I provided.
And so we'll just see the mouse that I can't see anymore for some reason. Where are you, mouse? I see it on your side, but not my side for some reason. Give me mouse. OK, there we go. I'll pause that here.
So what you saw, and unfortunately I wasn't able to narrate it while I was looking for my mouse, was the script running programmatically. It captured some screenshots along the way as it progressed through the various test steps. And in the log area, it let you know what was going on in the app: at this point it was selecting the flavor, at another point it was entering the recipe name, and so on.
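A post-deployment script along these lines can be written with RSelenium, the R binding for Selenium. This is a sketch, not the talk's code: it assumes a Selenium server running on localhost port 4444, and the app URL and CSS selector are placeholders.

```r
library(RSelenium)

# Connect to a Selenium server (assumed running on port 4444)
# and drive a headless Chrome session.
remDr <- remoteDriver(
  remoteServerAddr = "localhost", port = 4444L,
  browserName = "chrome",
  extraCapabilities = list(
    "goog:chromeOptions" = list(args = list("--headless=new"))
  )
)
remDr$open()

# Visit the deployed app (placeholder URL) and exercise the UI.
remDr$navigate("http://localhost:3838/icecream")
flavor <- remDr$findElement(using = "css selector", "#flavor")
flavor$clickElement()

# Capture a screenshot artifact for the test log.
remDr$screenshot(file = "post_deploy_flavor.png")
remDr$close()
```

Because the browser is headless, a run like this produces only the log lines and screenshot artifacts, which matches what the screencast shows.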
So this was all very easy and straightforward thanks to the tools that some of the other developers have created. So I thank them for that.
Closing thoughts
So in closing, I'd like to re-emphasize that these tools make automated testing much more accessible to the general population of data scientists, especially people like me who don't have any formal training in software engineering or software testing. We're expected to know and do a lot across many different areas, and these tools make that much easier for us.
Another thing that's out of scope for this talk, but that I get asked about all the time when I present on automated testing, is how this fits into a CI/CD pipeline. It definitely does, and there are lots of tools you can use. For example, with GitHub Actions, when a person submits a pull request, you can programmatically run your tests on the pre-deployment version of the app, merge the pull request, deploy to the next test server, and then programmatically run the next set of scripts. So you can do a lot of these things very easily.
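As a sketch of that wiring, a GitHub Actions workflow for the pre-deployment step could look like the following. The workflow name, action versions, and test command are assumptions for illustration.

```yaml
# .github/workflows/test.yaml (illustrative)
name: shiny-app-tests
on: pull_request

jobs:
  pre-deployment-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: r-lib/actions/setup-r@v2
      - uses: r-lib/actions/setup-r-dependencies@v2
      - name: Run shinytest2 suite
        run: Rscript -e 'shinytest2::test_app(".")'
```

A second workflow triggered on merge could then deploy to the test server and kick off the post-deployment Selenium scripts.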
So I'd like to thank my colleagues at Eli Lilly and the Design Hub Data Insights, Information and Digital Solutions, and the Statistical Innovation Center teams for their help. And I'd also like to thank Wardell Piquette at Piquette Studios for his help with the graphic design in my presentation. And here's my contact information. Feel free to reach out if you'd like to discuss anything. And I thank you for your time. Thank you.