Transcript

This transcript was generated automatically and may contain errors.

So, I am really thrilled to introduce our next keynote speakers, Martin Wattenberg and Fernanda Viegas, because I believe Martin and Fernanda are two of the top visualization researchers in the world today. I have been a follower of their work for over 10 years, and chances are you have seen some of their work before. In fact, if you've ever looked up the definition of a word on Google and seen the little line chart, I believe that is Martin's most widely seen visualization.

One of my personal favorites is the Wind Map, a really beautiful visualization of dynamic wind currents over the continental US. If you Google "wind map," you'll find it; it's a fantastic, fantastic visualization. But recently, Fernanda and Martin have been working on using visualization to display really complex ideas and to unpack many of the tools of modern machine learning and artificial intelligence. So, please join me in welcoming Fernanda and Martin.

So, yeah, thank you so much. I want to talk a little bit about what we're doing now, because it combines visualization, which has been our bread and butter for a while, with AI. We co-lead, with Jess Holbrook, a group at Google called PAIR, which stands for People + AI Research.

Before saying anything more about it, I want to quickly put up pictures of the people on the team, because I want to emphasize that everything we're talking about today was very much a team effort, and these people deserve credit.

The sort of mission of PAIR is to make machine learning and artificial intelligence productive, enjoyable, and fair. We do that in various ways. One of them, I'm happy to say, is releasing open source tools; we've released a fair number, and we'll talk about some of them today. We also create educational materials and academic publications, and do a lot of public work as well.

Now, you might ask, why are we talking about human-AI interaction at this conference? There's sort of a snarky answer you could give, which is that some might say AI is just fancy statistics, and maybe there's some truth to that. But there is a more serious answer, which is that you, as people who analyze data, have a lot of power, and you know that. But you might have even more power than you realize: as AI becomes more and more pervasive, it's turning out that data analysis is very much the key to getting AI right.

Visualizing training data with Facets

So we're going to start at the beginning. We have a few projects we'd like to share with you, and the first is about simply visualizing training data. I'm sure you all know this: a big part of what goes into these massive machine learning systems is the data, and you need to be super careful and mindful about that data.

And yet, when we talk with engineers, software engineers, about some of the problems that come up in these systems, their first instinct tends to be to debug their program. And so our favorite motto is debug your data first, not your program. And this is something of a learning curve. And I think you all are way ahead of the engineers, but one of the realities is that we need tools for dealing with the huge amounts of data that these people are using.

And so we're still working on that. One example is a tool we created for visualizing training data, and I'm going to demo it with one of the "Hello World" datasets of machine learning, called CIFAR-10. All it is is a bunch of little square images of 10 different classes of things. Each image has just one main object or entity inside of it, and the entire dataset has been human-labeled into these 10 classes.

So you have deer, you have automobile, you have truck, and so forth. Another thing that's important about this dataset is that, because it is one of the Hello World datasets of machine learning, it is used everywhere. It is used as a benchmark; it's been used by thousands of people all over the world, and it's used in publications as well.
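
If you want to poke at CIFAR-10 yourself before reaching for a visualization tool, here is a minimal sketch in Python (assuming TensorFlow/Keras is installed; the loader below ships with Keras):

```python
import numpy as np
import tensorflow as tf

# CIFAR-10: 32x32 RGB images, each human-labeled with one of 10 classes.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

print(x_train.shape)  # (50000, 32, 32, 3)

# Count how many training images carry each human-assigned label.
for label, count in zip(*np.unique(y_train, return_counts=True)):
    print(class_names[label], count)
```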

So this is our little visualization, and it's called Facets. And all I'm doing here is visualizing the images I have just by the different classes. So as far as data visualization goes, it's extremely straightforward, right? I can zoom in, I can look at images very closely if I want, see how pixelated they are, and so forth. Nothing super special.

You can see the labels of the classes, and if I click on any one of the images, it brings up a little card at the very top with the metadata about that image. So far, nothing special. But even this is something engineers working on machine learning systems rarely get: they don't usually get to see their training data.

And I'm sure you all realize why you'd want to do something like this: it could be that half of your data set is just empty, just blank images. Or, even worse, that just one of your classes is blank and nothing else. So you want to be able to very quickly play with your data and look at it.

Once we have this, we can play little games, right? I can look at the same visualization, but now distribute it by hue. And I can see that there are different bulges in different classes: airplane at the very top and ship towards the bottom are the classes with the most blue images. That kind of makes sense; these are photos taken against the sky or against the water. And all of my animal classes in the middle, bird, cat, deer, tend to hang out more in the earthy tones.

Now I can do other things. I can say: show me a confusion matrix. A confusion matrix is basically me trying to understand how in sync humans and my machine learning system really are. Everything down the side are the labels that humans manually gave these images, and everything across the top are the labels my system gives as it tries to classify them.

So the good news for me is that the diagonal is by far the most populated set of cells, because that's where humans and machine agree. That's great. But then I can also say: OK, now let's filter out all the correct guesses; I'm going to filter out that diagonal. And immediately I have all the mistakes my system is making. And I can see an interesting pattern here: my most populated cells, over here and over here, are at the intersection of cats and dogs. My system is kind of confused between cats and dogs, right?
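
In code, the same exercise might look something like this minimal sketch, reusing class_names from above and assuming y_true and y_pred are hypothetical integer label arrays for the human labels and the model's guesses:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Rows are human labels, columns are the model's predicted labels.
cm = confusion_matrix(y_true, y_pred)

# Filter out the diagonal (the correct guesses) to leave only mistakes.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)

# The largest remaining cell is the model's favorite confusion,
# e.g. cat vs. dog in CIFAR-10.
i, j = np.unravel_index(np.argmax(off_diag), off_diag.shape)
print(f"most common confusion: labeled {class_names[i]}, predicted {class_names[j]}")
```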

So it could be that at this point I go back and give my system more examples of cats and dogs so we can retrain and hopefully get better at these classes. This could be one of the things I do.

Another thing I can do with this visualization is look at the softmax labels. So towards the end of my network, I want to understand not only how these images are being classified by my system, but I also want to understand how sure my system is, how certain my system is of any of those classifications it makes.

So the way to read this is: the more an image is to the right, the more certain my system is that it is indeed a dog or a cat or an airplane, and those are correct classifications. Conversely, the more to the left an image is, the more certain my system is that it is not a cat or not an airplane, and those are incorrect classifications.
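
Hunting for these confidently wrong examples is easy to sketch in code, assuming a trained Keras-style classifier model and images x with human labels y (all hypothetical names):

```python
import numpy as np

probs = model.predict(x)  # softmax output, shape (n, 10)

# Each image's probability under its *human* label.
true_class_prob = probs[np.arange(len(y)), y.ravel()]

# Lowest values first: images whose human label the model is most sure is wrong.
# This is how a frog mislabeled as "cat" floats to the surface.
suspicious = np.argsort(true_class_prob)[:25]
print(suspicious)  # indices worth eyeballing in a viewer like Facets
```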

So one of the things we became interested in here: we were like, oh wow, check it out, cats is still very populated on that side. There's a bunch of images my system is very, very sure are not cats. So we zoomed in and started looking: what do these cats look like, that my system is so certain they aren't cats? And again, remember that this is a benchmark dataset used by lots of people all over the world.

We started looking at this and we were like: there is totally a thing there that's not a cat. Can anyone spot it? It's on the second row from the bottom. Over here: my system is very sure that's a frog, and so am I. And yet it has been labeled by humans as a cat.

So what is our little tale telling us here? The same tale as usual: you have to go back to your data and look at it. There are going to be mistakes, and if you can't look at your data in an easy way, those mistakes are going to keep showing up over and over again.

So this was a tool that we started using internally, and then we decided it was such a simple, straightforward, genuinely useful tool that we would open source it. So Facets is an open source tool available to anyone. And yes, I know you can do this in R, and I'm sure a lot of you are doing it already, right? Slicing and dicing, simple stuff. But there are a lot of people who don't use R, and so having something on the web is also useful.

Fairness in machine learning

The next thing we want to talk about is this notion that as soon as you start feeding these massive data sets to machine learning systems, you are going to be talking about bias and skews in your data set. And as soon as you start deploying these systems in the world, you are going to be faced with fairness questions. These are very complex issues, right? It's not like there's a purely technical solution for them. But there is a technical piece to trying to solve some of these things.

So Martin and I were interested in whether we could use very simple data visualization to start illustrating some of the trade-offs that you have to make when you start thinking about fairness in machine learning. As I introduce this visualization, which is very simple, we're going to play a little game: we're going to imagine that we are a bank, and we're deciding who we're going to give loans to and who we're not.

So each one of these dots is a person who comes to our bank asking for a loan. They have a credit score; all of these things are imaginary. The credit score goes from 0 to 100, and each dot's location reflects its credit score. Each of the light colored dots would default on our loan; all of the dark colored dots would pay us back. In other words, we want to give loans to the dark colored dots, and we do not want to give loans to the light colored dots.

Many people come to our bank. We set a threshold: below that threshold, we don't give loans; above it, we do. So far, so good. Except that in reality, life is not that simple, right? No matter where I put my threshold, chances are I'm going to be denying loans to some people who would pay me back, and giving loans to some people who would default.

And so this is still a simplification of the real world, but it's a distribution that starts to look a little more realistic. And you can see that no matter where you put your threshold, you're going to make incorrect guesses, right? Then the next question becomes: you may have different populations of people coming to your bank and asking for loans, and you may know that the distributions are different for these different populations. So, now that you have a blue population and an orange population with different distributions, how are you going to decide what's fair?

So let me just show you the little demo slash visualization we created; this accompanies, by the way, a paper on machine learning fairness. Basically, we created this little simulation here, which is kind of what you had been seeing before, and it's interactive: I can play with it, and you can see what happens. As I change the threshold here at the top, at the bottom you can see the percentage of correct and incorrect guesses I make, OK?

You can also see these pie charts at the bottom. The one on the right stands for the positive rate of the loans I give: of all the people who come to my bank, what percentage do I give loans to? The one on the left is the true positive rate: of all the people who come to my bank and would pay me back, what percentage do I give loans to?
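
Those two quantities are simple to pin down in code. Here is a minimal sketch, assuming score is each applicant's credit score and repays is True for applicants who would pay the loan back (both hypothetical arrays):

```python
import numpy as np

def loan_stats(score, repays, threshold):
    approved = score >= threshold
    positive_rate = approved.mean()               # share of all applicants given loans
    true_positive_rate = approved[repays].mean()  # share of would-be repayers given loans
    return positive_rate, true_positive_rate
```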

So I can play with these things independently. And the other thing I care about, because I'm a bank, is making a profit at the end of the day. So how do I do all of that and still be fair? We put some presets here on the left. If you don't want to be fair at all, click on maximum profit. There it is: that's where your thresholds should be, given the distributions.

If you do want to be fair, maybe what you want is to be group unaware. So I clicked on group unaware, and you can see that what this means is I have the same threshold for both populations. I don't care if you're blue, I don't care if you're orange: you're going to be treated the same, because that's the fair thing to do.

Except that when we start looking at the true positive rates here, I can see that of all the blue people who would pay me back, I give loans to 81% of them, versus only 60% of the orange people. Is that fair? Well, maybe the next thing you could try is something called demographic parity. Demographic parity optimizes for the same positive rate: of all the blue people and all the orange people who come to my bank, I try to give loans to the same percentage.

Again, what this means is that my thresholds are quite different for the different groups. Is that fair? Maybe; maybe not. Then the last preset here is equal opportunity, which optimizes for the same true positive rate. Anyway, long story short, there are many different ways to think about being fair, and how do you translate those values into math for deciding in a scenario like this?
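
To make the trade-off concrete, here is a sketch of how those presets could be searched for, building on loan_stats above. The profit numbers (+1 per repaid loan, -3 per default) and the brute-force search are made-up assumptions for illustration, not the paper's model:

```python
import itertools
import numpy as np

def profit(score, repays, threshold):
    approved = score >= threshold
    return approved[repays].sum() - 3 * approved[~repays].sum()

def best_thresholds(blue, orange, criterion="max_profit"):
    # blue and orange are (score, repays) pairs for the two populations.
    best, best_pair = -np.inf, None
    for tb, to in itertools.product(range(101), range(101)):
        pr_b, tpr_b = loan_stats(*blue, tb)
        pr_o, tpr_o = loan_stats(*orange, to)
        # Enforce the chosen fairness rule; "max_profit" has no constraint.
        if criterion == "group_unaware" and tb != to:
            continue
        if criterion == "demographic_parity" and abs(pr_b - pr_o) > 0.01:
            continue
        if criterion == "equal_opportunity" and abs(tpr_b - tpr_o) > 0.01:
            continue
        total = profit(*blue, tb) + profit(*orange, to)
        if total > best:  # among allowed threshold pairs, still maximize profit
            best, best_pair = total, (tb, to)
    return best_pair
```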

The point of doing this was that Martin and I were interested in whether, with a very simple abstract scenario like this and a simple visualization that people could just play with, they could start to develop an intuition for the trade-offs that happen when you're actually trying to decide this in a system. And it was interesting to see that this simple visualization went kind of viral on the web, and we started seeing people have these really interesting discussions: oh wow, I always thought that demographic parity was the way to go, but now I see that equal opportunity is maybe a more interesting way to think about fairness.

And just to be clear, there are many, many more metrics of fairness that have since come up in the literature, and I think that's not going to stop. Another part of the conversation we saw online that is, I think, very important is people realizing: oh, you can't have all of these at once. There is no perfect solution; it's always a trade-off, and you're going to have to choose.

And so we started getting emails. And so for instance, one of the emails we got was from a criminal justice department in the US saying, we saw this visualization. Do you think we could create something like this for our own department so that we start to understand what the trade-offs are that we're making with the systems we're using? Obviously, we're not a bank. But this would be a very powerful way to have people in the department engage with these questions, right?

The What-If Tool

And so, based on this and on the reaction we got, we decided to go from that toy simulation to building an actual tool that lets you do some of this playing around with your model, testing it against different scenarios. This is a tool we created called the What-If Tool. It's a code-free probing tool for machine learning systems that allows you to ask things like: what if I want demographic parity across the different slices of my data? What thresholds should I use in my model?

It allows you to do things like click on a data point and ask questions. For instance, going back to the loan scenario: this person did not get a loan. Who is the nearest person to this one who did get a loan, and what is the delta between them? So, a little notion of a counterfactual, if you will, without the causality; that's a separate conversation. But it's this notion of starting to understand what's happening with these systems and how they behave: asking a bunch of alternative scenarios, trying a bunch of edge cases, and also playing with notions of fairness, of how different slices of your data set are doing vis-a-vis one another.
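
That nearest-counterfactual idea reduces to a simple search. Here is a minimal sketch under simple assumptions (plain feature vectors, L1 distance; all names below are hypothetical):

```python
import numpy as np

def nearest_counterfactual(x_query, X, predictions, wanted_label):
    # Candidates: every data point the model gave the outcome we're asking about.
    candidates = np.where(predictions == wanted_label)[0]
    dists = np.abs(X[candidates] - x_query).sum(axis=1)  # L1 distance
    best = candidates[np.argmin(dists)]
    return best, X[best] - x_query  # nearest index, and the feature delta

# e.g.: who is the closest approved applicant to rejected applicant 42?
# idx, delta = nearest_counterfactual(X[42], X, preds, wanted_label=1)
```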

This tool is open source, so it is available to anyone. It has also been integrated into Google's Cloud AI, so that anyone using that platform can play with these alternative scenarios and test things for themselves before they decide to launch something, or to debug situations. And I think one of the really important things here is that we're not expecting you to have a PhD in machine learning to ask these probing questions. Many more people should be engaging and experimenting with the way these systems behave.

So I think the story so far is that you're seeing how important data analysis is in all of this. And when you look at machine learning fairness, there's this realization, number one, that trying to make the world better, trying to do the right thing, involves statistics. And number two, if you are deploying models, this is something you now have to think about, or really should have been thinking about all along. I think that's an important change of mindset for many people, and again, statistics is a key part of it.

High-dimensional space and word embeddings

There's something else going on, though, which is that using data analysis, you can actually end up finding new ways to control AI systems. And to explain that, we're going to take a journey through high-dimensional space to understand this superpower. So let's talk about data in high dimensions. We're going to introduce a couple of things in a row here, and we'll start with a warm-up. For our warm-up, we're going to visualize another famous Hello World data set for machine learning (this is the MNIST data set): little 28 by 28 pixel images of handwritten digits, often used to test very simple machine learning models.

And what we're going to talk about is how you can think of each of these images as a vector in high-dimensional space. The trick, which is familiar, I'm sure, to some if not many of you, is that we look at the pixels and give them all values. In this case, we'll say black is 0, white is 1, and intermediate shades get some floating point number in between. Then we just list the values of all the pixels in a row, and that gives us a vector.

All right, so now you see it rotating. You are looking at 784-dimensional space here. What you are actually seeing, of course, is a two-dimensional projection of a three-dimensional view of this high-dimensional space, using principal component analysis to project it linearly. And that's definitely an interesting thing. But it's also kind of a big blob: as much as one can ooh and aah over it, it's a little bit hard to make sense of.
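
The pixels-to-vector trick and the linear projection fit in a few lines. A minimal sketch, assuming Keras for the digits data:

```python
import tensorflow as tf
from sklearn.decomposition import PCA

(digits, labels), _ = tf.keras.datasets.mnist.load_data()

# Scale pixels to [0, 1] and list them all in a row:
# one 784-dimensional vector (28 * 28 = 784) per image.
vectors = digits.reshape(len(digits), -1) / 255.0

# Project the 784-dimensional cloud down to 3 dimensions, linearly.
coords = PCA(n_components=3).fit_transform(vectors[:5000])
print(coords.shape)  # (5000, 3)
```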

So let's take another view. OK, now we're using a method called UMAP that is designed for high-dimensional data. Instead of a linear projection, it uses a non-linear projection that has many complicated properties, which we'll talk about in a moment. But the basic idea is that one thing it does tend to preserve is clustering.

And so one thing I can do, since this is our data set and we have labels, is color these digits by their label. That lets us easily see, for example, that in general all the 1's are in a certain area, all the 4's are in another area, and so forth. But we can also see it doesn't completely get everything right: it clearly thinks this 3 belongs with the 2's, this 8 looks like a 2, and so forth.
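
Swapping the linear projection for the non-linear one is a one-line change on top of the sketch above, assuming the umap-learn package is installed:

```python
import umap

# Non-linear projection; tends to preserve cluster structure.
embedding = umap.UMAP(n_components=2).fit_transform(vectors[:5000])

# Coloring `embedding` by `labels[:5000]` reproduces the view described here:
# well-separated digit clusters, plus the occasional mislabeled straggler.
```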

Already, I think this view can tell you a bunch of interesting things. One is just that these digits are actually pretty well separated; you can see why this is not the hardest machine learning task in the world, because there is a natural clustering structure that even a visualization method brings out pretty well. Another nice thing is that if you look within these clusters, you can start to see some interesting structure. For example, among these 1's, you'll notice they're tilted to the left on this side and tilted to the right on that side, and you really start to get a sense of what the geometry looks like in this high-dimensional space.

I will say that among the educational materials we've been putting out are step-by-step interactive essays about how to interpret these complicated nonlinear projections, because they're amazing, but weird. The weirdness can be your friend, but it can also sometimes be problematic, so you have to be aware of that.

OK, so another kind of high-dimensional object that turns out to be very, very useful is something called a word embedding. "Word embedding" is sort of weird terminology, but all it means is that you're mapping words to points in high-dimensional space, much the way we mapped those images. However, with images there was a pretty straightforward way of doing it, using pixels. With words it's actually sort of complicated, and I won't get into exactly how it is done.

The idea is usually to make sure that related words end up near each other, and it turns out you can do this. However, when people figured out ways to make that happen, they discovered a really interesting and surprising fact: not only can you get a situation where similar words are near each other, but almost as a byproduct, seemingly by accident, you get meaningful directions in space. The image you're seeing here is a figure from one very famous early paper, which showed that there is a direction in space that corresponds to the relationship of a capital to the country it is a capital of, which is really amazing.
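
In code, a meaningful direction is just vector arithmetic. A minimal sketch, where emb is a hypothetical dict from word to numpy vector (any trained embedding would do):

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# The capital-to-country relationship as a direction in space:
# the vector from "paris" to "france" roughly matches "tokyo" to "japan".
direction = emb["france"] - emb["paris"]
print(cosine(emb["tokyo"] + direction, emb["japan"]))  # near 1 if the direction holds
```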

OK, so let's take a look at what those word embeddings actually look like, and I'm going to show this in a couple of ways. This is just a raw PCA view; this is a UMAP view. And let's actually play with this a little bit. Let's explore, because I think it really gives you a sense that you're seeing the English language here. This is the geometry of English.

And what can we say about it? Well, there are obviously clumps, so let's go spelunking a little bit. What is this clump here? Oh look, a bunch of names, largely boys' names, it seems. And then we get some other names; it's hard to see, so I'll read them: Lloyd, Alex, Matt, Joel. What about this cluster here? Horns, percussion, guitar, quartet, bass: we found the music cluster over there. What about these? Oh look, we have some numbers; let's see if we can zoom in on them. 0, 1, 2, 3. We can actually see it, and if we zoomed in more, we could almost see a little bit of a linear structure to them.

We have some other things here. Let's see: infinity, angle, math, BF. Look, we found the tech world. So there really is everything in here, and it gives you a sense of just how much information is encoded here.

But now I want to show one more thing: let's do a custom projection. The way this works is that we can define an axis in terms of one word versus another word. Right now it's just a 2D linear projection, and both axes are random. But let's search for a word and look at some neighbors. Let's see, what's a good one? How about science?

Click on this, and we'll isolate those 100 points. Now we can see the neighbors of the word science in PCA. But let's do the custom thing. We can play a little game where we put man on the left and woman on the right. And suddenly, we see something really interesting: the x-axis, left to right, now orders words by whether the embedding thinks they're closer to the word man or to woman. And this is actually a little bit sad, because you can see that technical is the most "male" word in this neighborhood.
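
That custom axis is the same arithmetic as before: project each neighbor onto the direction between the two chosen words. A sketch with the same hypothetical emb, using words from this neighborhood:

```python
import numpy as np

axis = emb["woman"] - emb["man"]
axis = axis / np.linalg.norm(axis)  # unit vector: the man -> woman direction

neighbors = ["technical", "scientist", "religion", "author", "psychology"]
for w in sorted(neighbors, key=lambda w: emb[w] @ axis):
    # Negative projection: nearer "man"; positive: nearer "woman".
    print(f"{w:12s} {emb[w] @ axis:+.3f}")
```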

And meanwhile, on the female side, although we do have scientist, we see religion, author, stories, psychology. And this is actually a sobering moment. When people started to realize this about these embeddings, I think they realized how serious things had become: these embeddings had been trained on huge corpora of existing language, those corpora contained a bunch of biases, and the word embeddings were very efficiently encoding some of them.

In fact, we can show a couple of other examples. This is the neighborhood of engineer, which shows many of the same things. If you do the neighborhood of math, I'll read it out loud: you can see geometry and computational on the man side, and again psychology, art, library on the woman side. And it's not even just the big, famous biases like gender. Even with something like old versus new, not about people at all, you can see interesting, meaningful associations. If I put in book and look at its neighbors, on the old side I see things like poem and manuscript; on the new side, I see things like company and magic. You can start to see it's picking up on important associations, but in a way that is potentially problematic.

TCAV: interpreting neural networks

So we have this thing that's powerful. It requires a lot of analysis. But let's think about how we could use this idea of meaningful directions in other ways, now that we have it. So one way, it turns out, is to interpret what's going on inside of neural networks so that they're not, in fact, black boxes. And I'd like to very quickly fly through a method called TCAV, published in ICML 2018.

And the idea behind this method is to take this insight that, OK, if word embeddings have directions that correspond to things that are meaningful to humans, like the direction from capitals to their countries, maybe other types of embeddings have that same property, and we could use that to our advantage.

So I'll illustrate this by imagining a question you might ask of an image classifier. Let's say it looks at a picture of a zebra. And you wonder, why did it think that thing was a zebra? So there's sort of a bad answer to this question and a good answer. The bad answer is if it told you something like, oh, it was this particular pixel, or these sets of pixels. Those are not terms that a human can understand. A much better answer would be if it could say something like, oh, it was striped, and that helped.

Of course, it's very hard to figure out how to extract that answer. However, there is a way to do it. So let's follow what happens when a neural network recognizes an image. You put in an image; let's say it's an image of a zebra. It goes into your neural network, and what's shown here is a little cartoon of one famous network called Inception.

And the idea is that in this kind of network, the image gets transformed layer by layer, and you can think of each layer as performing a transformation of Euclidean space. So if you show it an image, like a zebra, then at each layer you can think of the state of that layer as a set of activations of the neurons, and those activations form a vector in high-dimensional space. So you can see how this starts to relate to what we were just looking at.

And then what's interesting is that if you put a whole bunch of images through, you get a bunch of points in this high-dimensional space, and you can start playing this game of meaningful directions. So here's how it works. Let's say there's a concept you're interested in, for example stripes, or the presence of stripes. You don't even have to define to the computer in language what stripes are; you can just show it a bunch of examples of striped images.

Then you put those striped images through your neural network. And one important thing: these are not images of zebras. This is just any stripes, anywhere; we've done this literally with striped ties, striped shirts, and abstract stripes. What you get is a cluster of points, in the space of activations at an intermediate layer, that represents stripes. Then you can also put a bunch of random other images through and train a classifier to distinguish the stripes from the non-stripes. And that gives you what's called a concept activation vector: a direction in space that represents that concept.
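
Here is a minimal sketch of building such a concept activation vector, assuming a hypothetical helper activations_of(images) that returns the chosen layer's activations, flattened, plus hypothetical striped_images and random_images collections:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Activations for concept examples (label 1) and random counterexamples (label 0).
acts = np.vstack([activations_of(striped_images),
                  activations_of(random_images)])
labels = np.concatenate([np.ones(len(striped_images)),
                         np.zeros(len(random_images))])

# A linear classifier separating the two clouds in activation space.
clf = LogisticRegression(max_iter=1000).fit(acts, labels)

cav = clf.coef_[0]               # the direction that represents "stripes"
cav = cav / np.linalg.norm(cav)  # normalized to a unit vector
```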

And then this is what's cool: you can start calculating with that vector. For example, you can take directional derivatives to understand the model's sensitivity to the concept. So computers like it. But humans like it too, because the answer comes in terms of a human-friendly concept like stripes. This turns out to be a very useful technique in general, and there's a lot more we could say. But what's really interesting is that you can go from just interpreting or understanding directly to control.
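
Continuing the sketch: the TCAV score for a class and a concept is the fraction of that class's examples whose class logit increases as the activations move along the concept direction. Here grad_logit_wrt_activations is a hypothetical autodiff helper returning the gradient of the zebra logit with respect to the chosen layer's flattened activations:

```python
import numpy as np

def tcav_score(zebra_images, cav):
    # Directional derivative of the zebra logit along the stripes direction.
    sensitivities = [grad_logit_wrt_activations(img) @ cav
                     for img in zebra_images]
    return np.mean(np.array(sensitivities) > 0)

# A score near 1 says: for almost every zebra image,
# "more stripes" pushes the network toward "more zebra".
```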

Applying TCAV to cancer pathology

Yeah, so imagine that: you start with this super high-dimensional space, and you find meaning in that space, meaning that is also meaningful to humans. What I want to show you next is an example of a project where we used that for health care.

So here's the scenario. We were working with pathologists trying to diagnose cancer. The way they work is that they will look at a slide from a patient, very much like the large image you see here, and then very, very closely analyze it, trying to understand: one, do I believe there is cancer here? And if so, what grade of cancer? This is what's called Gleason grading.

And even though there is an extensive body of work on optimizing algorithmic performance, on getting a system to look at an image like this and say yes, there is cancer, or no, there is not, much less attention had been paid to what kind of information and what kind of interactions doctors would actually want out of a dialogue with these systems. Instead of just being given a number and a probability, do they actually want to interact with and direct these systems? It turns out that, yes, they would like to.

And part of the reason they want this is that sometimes there are artifacts in these images, and they want to say: dismiss that, don't pay attention to that thing, it's not important. Or there are anatomical features they care deeply about, things like the density of glands, or whatever other concept. And think about it: those are high-level concepts, right? How do you tell the machine what you really care about, or how dense the glands are here?

Basically, what we did is create a set of tools that allowed pathologists to very quickly navigate these massive data sets. Because what they're trying to do, as they look at a slide, is get the system to bring them the nearest neighbors: of all the patients I've seen and diagnosed before, what is the closest case to the one I'm seeing now?

But that's not a one-shot thing; that's what's interesting. It's not just "automate that." So one of the tools simply lets people select a region and say: focus here, that's what I care about. Things like that have been done before. Or: once you've brought me a bunch of cases like the one I just highlighted, let me go deeper into one of those cases and bring me its nearest neighbors. In other words, let me navigate by example.

The thing that's new is that with things like the CAVs you just heard about, we can create much more sophisticated ways of navigating these systems in terms of medical concepts. Working with the pathologists, they told us that a medical concept they deeply care about is something called stroma. I'm not a doctor; I had no idea what this was, but for them it was super interesting. So we created a slider where they could say: show me cases like this, but with more stroma, or with less stroma. Or fused glands, another medical concept they care deeply about.

And because, again, inside the massive spaces of these ML systems there are meaningful directions, we were able to map these medical concepts onto those directions. That gave doctors a much more powerful way of interacting with the systems. And one of the really important things here is for the doctor to calibrate their trust: should I trust this, or should I not? Or even to learn how the system works: how does it behave, how does it handle edge cases?
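
Mechanically, a concept slider can be read as: nudge the query's embedding along the concept's CAV, then re-run the nearest-neighbor search. A sketch under the same assumptions as the earlier CAV code (our reading of the interaction, not the published implementation):

```python
import numpy as np

def slide_and_search(query_embedding, concept_cav, amount, case_embeddings):
    # amount > 0 asks for "more" of the concept, amount < 0 for "less".
    target = query_embedding + amount * concept_cav
    dists = np.linalg.norm(case_embeddings - target, axis=1)
    return np.argsort(dists)[:10]  # indices of the ten closest past cases
```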

And so we ran a user study, in this case with 12 pathologists, looking both at a traditional machine learning system that just gives you a prediction with a confidence level, and at this system, called SMILY, where you have all these different tools you can interact with. The results were very positive. Basically, all 12 pathologists preferred SMILY, and the level of trust was significantly higher. And even though you would think, "if you could only automate my work for me, I'd be done," they actually found a lot of use in going back and forth, and not only that, in steering the system in ways they cared about.

Data scientists as user experience designers

OK, so yeah, I think what you see from this is sort of the ability of statistical analysis