Ryan E Wade | The R integration in Power BI: Why R developers should be Interested | RStudio (2022)
Power BI is a multi-faceted business intelligence tool with built-in data ingestion, transformation, and visualization features. The learning curve for Power BI can be steep for many, but fortunately, R users can shorten the learning curve by leveraging the Tidyverse. In this talk, attendees will get introduced to how the Tidyverse can shorten the Power BI learning curve and enable them to perform tasks in Power BI that are not possible using its native functionality. Session: Business intelligence
image: thumbnail.jpg
Transcript#
This transcript was generated automatically and may contain errors.
Hello, my name is Ryan Wade. I am a senior cloud solution architect for Microsoft specializing in data and AI. I'm here to present to you a topic covering the R integration in Power BI and why you, as R developers, should be interested.
I am very passionate about both R and Power BI, so much so that I wrote a book about it. I understand that many R developers do not share the same passion about Power BI that I do. Many of you all may be saying, why should I even be interested in the R integration in Power BI? The reason why is because the Power BI community needs you.
What this talk is not about
So before I get into the meat of this talk, I want you all to know what this talk is not about. This talk is not about me trying to convince you to replace your Shiny apps with Power BI. You are able to do things in Shiny from our visual standpoint that you can't do in Power BI, and I'm not trying to convince you to make that switch.
I'm not trying to convince you to refactor the notebooks that you develop in Jupyter Notebooks, R Markdown, or Quarto, and replace that with Power BI. Those two different tools, or those tools, have a different purpose than what Power BI has. So I would not try to convince you to make that change. And also, I'm not going to tell you to start doing your R development in Power BI and ditch RStudio. I am not trying to convince you to do that. Microsoft is not trying to convince you to do that either. Microsoft gives you the option to do your development in RStudio when you're writing code that is purposed for Power BI. So that is what the talk is not about.
Addressing the disconnect
What the talk is about is to address the disconnect between some R developers and Power BI developers. Both sides have their own perception of what the other tool is, and often that perception is not correct. So we're going to address that. I'm also going to try to introduce Power BI to some R developers that is not familiar with the tool. You've probably heard of it, but you've probably never really seen it in action, so I'm going to try to introduce it to you and give you a high-level understanding of what Power BI is.
And then I'll give you some examples of integration so that you can say, wow, Ryan was able to do these things in Power BI. Based off of how I see he's using the functionality, maybe I can do these other things that Ryan hasn't thought of.
And then lastly, I will show you how to get started. So first, let's address some of the disconnect. There are a lot of unknown unknowns that R developers have about Power BI, and that Power BI developers have about R. From an R standpoint, when R developers look at Power BI, they just see another GUI-based data visualization tool. Power BI is much more than that. And in this talk, I will show you the reason behind that.
And then on the Power BI side, when Power BI developers look at R, they see a domain-specific programming language that they think is just for data scientists and statisticians. They don't realize that a lot of the things that Power BI attempts to do have been done in R community for some time now, since the 90s. I think R was developed in the early 90s, like 93. So the R community has been doing things from a data visualization standpoint that Power BI, that the Power BI community had just recently started doing.
Introducing Power BI
Power BI is a data visualization tool. In order to do data visualization, you need data. And the data model that Power BI prefer is a data model known as the star schema data model. The star schema data model is made up of two types of tables, a dimension table and a fat table. The fat table you see here in the center, and the fat table stores measures. You can think of a measure as something that measures and a business activity.
In this particular simple example, we see that the measures are price and quantity. And then the fat table is connected to the dimension table. And in the dimension tables, you have dimensions grouped by category. So you see in this particular example, we have a store dimension, a employee dimension, and so on and so forth. In those different categories, you have dimensions that you can group your measures and your fat table by.
Another thing I want to point out about Power BI from a data standpoint is how the data is stored in Power BI. Power BI uses an engine called a VertiPaq engine. And the takeaway I want you to get from that is that the VertiPaq engine, one, stores the data in memory using what is called column store indexing. And the takeaway you should get is that storing data in that fashion does two things. It makes performance, you know, some complex calculation relatively faster than what it will perform if you are using another tool. And in addition to that, the column store indexing increases the memory footprint of your data by a lot. In some cases, up to a tenth of the size of the original data set.
So the second thing I want to point out is the built-in data wrangler tool that's in Power BI. And this built-in data wrangler tool, if you are using Microsoft for sale, this tool is built into that as well. This is like a gem that many people don't know about. This tool is called Power Query. And this tool, using a GUI, allows you to do many types of data wrangling tasks with low code. Super easy. A person who has no coding experience can do a lot of complex data wrangling that we in our community will probably have to write a dplyr script to do with no code. And in the instances where you need to use code, it has a programming language underneath the hood called M that you can write some code to do some things that you can't do throughout the GUI. Super powerful.
Calculated measures and DAX
The feature that I like the best about Power BI are the calculated measures. What I like about calculated measures is the way they operate inside of Power BI. They are what I call context-based, meaning that they calculate based off of the context that you put them in.
So in this particular situation, we have a simple matrix where we're displaying three different measures, and those measures are the cells amount calculated measures, and those are the cells amount calculated measure, the comparison cells amount calculated measure, and adjusted comparison cells amount calculated measure. One thing I want to point out is that we look at the highlighted cell. In that cell, the comparison cells amount calculated measure is calculated in the context of the AdventureWork brand and also the context of the comparison date filter that you see in the slide filter on the upper right portion of the screen.
Notice I didn't mention the filter that was being applied by the other date slicer, and I only mentioned the slicer on the right. That is because the programming language that Power BI uses, a programming language called DATS, Data Analysis Expression Programming Language, it allows you to incorporate some pretty sophisticated business logic inside of it, and in this particular situation, you can see that we were able to remove the filter from one of the slices, and we only kept the filter that was applied in the slices in the upper right. Super powerful feature. You will be surprised at the type of business logic that you can encapsulate inside of a DATS expression using Power BI.
You will be surprised at the type of business logic that you can encapsulate inside of a DATS expression using Power BI.
Why R developers should care
Then, of course, it's a data visualization tool. Just like any other data visualization tool, whether you're talking about Power BI, Qlik, Tableau, I know we have a speaker talking about that particular tool. You can build dashboards and get as RC as you want. I'm not good at that aspect. I do everything pretty bland, but if you're a very colorful person, you can do that, or you can pretty much do any type of visual you want in Power BI just like you can with any other visualization tool.
Now, up to this point, you may be saying, Ryan, you just described a pretty awesome tool. Why do we even need the R integration inside of Power BI? How do we even benefit from this? Well, this is my opinion, this little graph right here. This is my opinionated view about how I think Power BI is. This is my assessment of Power BI using these four different categories. In these four different categories, you see in most of the categories, I think Power BI is a very awesome tool other than the data science modeling piece. It does a good job of data visualization, data wrangling for most situations. Then from a context-based calculation that I showed you earlier, that feature right there in Power BI is awesome.
Now, the data science modeling side of things is very limited. You normally would need the premium capacity version of Power BI to be able to leverage Azure Machine Learning services. You can do some data modeling there and then connect to Power BI from there. Or you can use some pre-built data models in the form of Azure Cognitive Services. So, from that standpoint, still, it's kind of clunky. I don't really recommend it.
But when you add R to the situation, you'll see that from a data viz, there'll be 80 to 90% of the time, you'll have a visual that's already baked into Power BI or that you can get from the Microsoft Store to import into Power BI. But there'll be those edge cases where if you are used to coding in ggplot, you can create that visual easily in ggplot, but you can't do it in Microsoft. That is what that orange is representing. Then on the data wrangling side, there are situations where you may need a regular expression to do some type of stream manipulation. Or you may need to do something that is pretty clunky to do in Power BI, but you can easily do it in dplyr. And those situations is represented by the orange there.
And then on the data modeling side, if you're doing something in data science where you want to bring in scored data into your Power BI data model, if you're using Power BI in a self-service fashion, you can take those R scripts that you have that will ingest in data scored, and you can take that scored data and easily adjust into Power BI. And that's why I added that little orange for the data science modeling piece.
Examples of R integration
So now let's talk about some examples. The first example is a very simple data science machine learning example using a data set that if you were learning any type of data science, you're probably familiar with this data set, the Boston housing data set. And it's a very simple illustration of how you can take a model that was saved in the RDS file on disk, consume that model, and then consume the data that you need for the scoring. And just like you do in any other situation with R, you can use that model and the new data that you want to score, score it, and then present that data to Power BI as a data set.
If you had a more complex workflow, you all that went to the tidy workshop here. Or if you're familiar with tidy models, you can use tidy models if you have a more sophisticated workflow that you need to do prior to scoring the data. Once you score that data, the caveat is that that data set needs to be exposed as a data frame. If it's exposed as a data frame, then you can bring it into Power BI. If you have it in any other data structure list, Power BI would not be able to read it.
Another aspect that I like when using R, the R integration in Power BI, is if I need to read data from a data API. In this particular example, I'm reading data from the U.S. Census. And I can easily consume data from a data API using R. From a Power Query standpoint, they have some integration, but it's very concrete, and it's not as performant as using R. So, in this situation, I'm looking at, I'm integrating the Census API, bringing in data, and exposing that data to Power BI. And you'll see that it will be exposed to Power BI as a data set. That can be extremely valuable when you're using outside data to bring more value to the data that you have in-house.
Now, the thing I like the best is the data visualization piece in terms of the R integration. In this example, this visual is near to me. This is a visual where I was, it was the first visual I built when this, when the R integration became available in Power BI. And in this situation, you'll see that we are able to do things with R that we're spoiled, but the Power BI community don't even know of some of these capabilities. One of those capabilities is the ability to have annotations inside of your graph.
Now, in this particular graph, if you were to add up all the different classes, you'll see that those, the numbers don't add to the total number of countries, the 235 countries. And I'm able to add information in my title about why that's the case. For each individual violin plot that's overlaid with a boss plot, I'm able to give information about that, the number of countries that was in that particular class, and the trade balance for that particular class. Those type of annotations in the Power BI community, they're not aware of. And that's, those are one of the things that make it super powerful.
We are able to do things with R that we're spoiled, but the Power BI community don't even know of some of these capabilities.
Another thing is, if you want to choose another year, you'll see that I can easily pick a button, and then that filter will be propagated to my R visual. And if I wanted to look at this and I say, okay, looking over to the left-hand side, and I see that the Caribbeans, for instance, we have a positive trade experience with the Caribbeans. If you look at the left-hand side, I can drill down further into that information and get detailed information using the Power BI functionality.
So, this R visual, and we're on the R side, we're used to be able to be this expressive with our visuals, but not on the Power BI side. And you as R developers should reach out to some of your people in the Power BI community and, you know, teach them how the R integration can be super beneficial to them. I gave you some examples here, but there are many more examples that once you start digging into it, you'll be able to find out more examples and your unique use cases you can use also to be creative with using that R integration inside of Power BI.
That's the end of this presentation. If you want, you can go to this GitHub site, and I have a little GitHub repo that gives you some high-level information about leveraging the R integration inside of Power BI. I appreciate you all coming to the talk and listening to me tell you about this R integration, and I hope you go out and spread the love of the R community to other Power BI developers and show them why we love the great programming language of R. Thank you.