Transcript

This transcript was generated automatically and may contain errors.

So, I am really thrilled to introduce our next keynote speakers, Martin Wattenberg and Fernanda Viegas, because I believe Martin and Fernanda are two of the top visualization researchers in the world today. I have been a follower of their work for over 10 years, and chances are you have seen some of their work before. In fact, if you've ever looked up the definition of a word on Google and seen the little line chart, I believe that is Martin's most widely seen visualization.

One of my personal favorites is the Wind Map, a really beautiful visualization of dynamic wind currents over the continental US. If you Google "wind map," you'll find it; it's a fantastic, fantastic visualization. But recently, Fernanda and Martin have been working on using visualization to display really complex ideas and to unpack many of the tools of modern machine learning and artificial intelligence. So, please join me in welcoming Fernanda and Martin.

So, yeah, thank you so much. I want to talk a little bit about what we're doing now, because it combines visualization, which has been our bread and butter for a while, with AI. We co-lead, with Jess Holbrook, a group at Google called PAIR, which stands for People + AI Research.

Before saying anything more about it, I want to quickly put up pictures of the people on the team, because I want to emphasize that everything we're talking about today was very much a team effort, and these people deserve credit.

The sort of mission of PAIR is to make machine learning and artificial intelligence productive, enjoyable, and fair. We do that in various ways. One of them, I'm happy to say, is releasing open source tools; we've released a fair number, and we'll talk about some of them today. We also create educational materials and academic publications, and do a lot of public work as well.

Now, you might ask, why are we talking about human-AI interaction at this conference? There's sort of a snarky answer you could give, which is that some might say AI is just fancy statistics, and maybe there's some truth to that. But there is a more serious answer, which is that you, as people who analyze data, have a lot of power, and you know that. But you might have even more power than you realize: as AI becomes more and more pervasive, it's turning out that data analysis is very much the key to getting AI right.

Visualizing training data with Facets

So we're going to start at the beginning. We have a few projects we'd like to share with you, and the first is about simply visualizing training data. I'm sure you all know this: a big part of what goes into these massive machine learning systems is the data, and you need to be super careful and mindful about that data.

And yet, when we talk with engineers, software engineers, about some of the problems that come up in these systems, their first instinct tends to be to debug their program. And so our favorite motto is debug your data first, not your program. And this is something of a learning curve. And I think you all are way ahead of the engineers, but one of the realities is that we need tools for dealing with the huge amounts of data that these people are using.

And so we're still working on that. One example is a tool we created for visualizing training data, and I'm going to demo it with one of the "Hello World" datasets of machine learning, called CIFAR-10. All it is is a bunch of little square images of 10 different classes of things. Each image has just one main object or entity inside of it, and the entire dataset has been human-labeled into these 10 classes.

So you have deer, you have automobile, you have truck, and so forth. Another thing that's important about this dataset is that, because it is one of the Hello World datasets of machine learning, it is used everywhere. It is used as a benchmark; it's been used by thousands of people all over the world, and it's used in publications as well.
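
If you want to poke at CIFAR-10 yourself before reaching for a visualization tool, here is a minimal sketch in Python (assuming TensorFlow/Keras is installed; the loader below ships with Keras):

```python
import numpy as np
import tensorflow as tf

# CIFAR-10: 32x32 RGB images, each human-labeled with one of 10 classes.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

print(x_train.shape)  # (50000, 32, 32, 3)

# Count how many training images carry each human-assigned label.
for label, count in zip(*np.unique(y_train, return_counts=True)):
    print(class_names[label], count)
```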

So this is our little visualization, and it's called Facets. And all I'm doing here is visualizing the images I have just by the different classes. So as far as data visualization goes, it's extremely straightforward, right? I can zoom in, I can look at images very closely if I want, see how pixelated they are, and so forth. Nothing super special.

You can see the labels of the classes, and if I click on any one of the images, it brings up a little card at the very top with the metadata about that image. So far, nothing special. But even this is something engineers working on machine learning systems rarely get: they don't usually get to see their training data.

And I'm sure you all realize why you'd want to do something like this: it could be that half of your data set is just empty, just blank images. Or, even worse, that just one of your classes is blank and nothing else. So you want to be able to very quickly play with your data and look at it.

Once we have this, we can play little games, right? I can look at the same visualization, but now distribute it by hue. And I can see that there are different bulges in different classes: airplane at the very top and ship towards the bottom are the classes with the most blue images. That kind of makes sense; these are photos taken against the sky or against the water. And all of my animal classes in the middle, bird, cat, deer, tend to hang out more in the earthy tones.

Now I can do other things. I can say: show me a confusion matrix. A confusion matrix is basically me trying to understand how in sync humans and my machine learning system really are. Everything down the side are the labels that humans manually gave these images, and everything across the top are the labels my system gives as it tries to classify them.

So the good news for me is that the diagonal is by far the most populated set of cells, because that's where humans and machine agree. That's great. But then I can also say: OK, now let's filter out all the correct guesses; I'm going to filter out that diagonal. And immediately I have all the mistakes my system is making. And I can see an interesting pattern here: my most populated cells, over here and over here, are at the intersection of cats and dogs. My system is kind of confused between cats and dogs, right?
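
In code, the same exercise might look something like this minimal sketch, reusing class_names from above and assuming y_true and y_pred are hypothetical integer label arrays for the human labels and the model's guesses:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Rows are human labels, columns are the model's predicted labels.
cm = confusion_matrix(y_true, y_pred)

# Filter out the diagonal (the correct guesses) to leave only mistakes.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)

# The largest remaining cell is the model's favorite confusion,
# e.g. cat vs. dog in CIFAR-10.
i, j = np.unravel_index(np.argmax(off_diag), off_diag.shape)
print(f"most common confusion: labeled {class_names[i]}, predicted {class_names[j]}")
```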

So it could be that at this point I go back and give my system more examples of cats and dogs so we can retrain and hopefully get better at these classes. This could be one of the things I do.

Another thing I can do with this visualization is look at the softmax labels. So towards the end of my network, I want to understand not only how these images are being classified by my system, but I also want to understand how sure my system is, how certain my system is of any of those classifications it makes.

So the way to read this is: the more an image is to the right, the more certain my system is that it is indeed a dog or a cat or an airplane, and those are correct classifications. Conversely, the more to the left an image is, the more certain my system is that it is not a cat or not an airplane, and those are incorrect classifications.
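
Hunting for these confidently wrong examples is easy to sketch in code, assuming a trained Keras-style classifier model and images x with human labels y (all hypothetical names):

```python
import numpy as np

probs = model.predict(x)  # softmax output, shape (n, 10)

# Each image's probability under its *human* label.
true_class_prob = probs[np.arange(len(y)), y.ravel()]

# Lowest values first: images whose human label the model is most sure is wrong.
# This is how a frog mislabeled as "cat" floats to the surface.
suspicious = np.argsort(true_class_prob)[:25]
print(suspicious)  # indices worth eyeballing in a viewer like Facets
```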

So one of the things we became interested in here: we were like, oh wow, check it out, cats is still very populated on that side. There's a bunch of images my system is very, very sure are not cats. So we zoomed in and started looking: what do these cats look like, that my system is so certain they aren't cats? And again, remember that this is a benchmark dataset used by lots of people all over the world.

We started looking at this and we were like: there is totally a thing there that's not a cat. Can anyone spot it? It's on the second row from the bottom. Over here: my system is very sure that's a frog, and so am I. And yet it has been labeled by humans as a cat.

So what is our little tale telling us here? The same tale as usual: you have to go back to your data and look at it. There are going to be mistakes, and if you can't look at your data in an easy way, those mistakes are going to keep showing up over and over again.

So this was a tool that we started using internally, and then we decided it was such a simple, straightforward, genuinely useful tool that we would open source it. So Facets is an open source tool available to anyone. And yes, I know you can do this in R, and I'm sure a lot of you are doing it already, right? Slicing and dicing, simple stuff. But there are a lot of people who don't use R, and so having something on the web is also useful.

Fairness in machine learning

The next thing we want to talk about is this notion that as soon as you start feeding these massive data sets to machine learning systems, you are going to be talking about bias and skews in your data set. And as soon as you start deploying these systems in the world, you are going to be faced with fairness questions. These are very complex issues, right? It's not like there's a purely technical solution for them. But there is a technical piece to trying to solve some of these things.

So Martin and I were interested in whether we could use very simple data visualization to start illustrating some of the trade-offs that you have to make when you start thinking about fairness in machine learning. As I introduce this visualization, which is very simple, we're going to play a little game: we're going to imagine that we are a bank, and we're deciding who we're going to give loans to and who we're not.

So each one of these dots is a person who comes to our bank asking for a loan. They have a credit score; all of these things are imaginary. The credit score goes from 0 to 100, and each dot's location reflects its credit score. Each of the light colored dots would default on our loan; all of the dark colored dots would pay us back. In other words, we want to give loans to the dark colored dots, and we do not want to give loans to the light colored dots.

Many people come to our bank. We set a threshold: below that threshold, we don't give loans; above it, we do. So far, so good. Except that in reality, life is not that simple, right? No matter where I put my threshold, chances are I'm going to be denying loans to some people who would pay me back, and giving loans to some people who would default.

And so this is still a simplification of the real world, but it's a distribution that starts to look a little more realistic. And you can see that no matter where you put your threshold, you're going to make incorrect guesses, right? Then the next question becomes: you may have different populations of people coming to your bank and asking for loans, and you may know that the distributions are different for these different populations. So, now that you have a blue population and an orange population with different distributions, how are you going to decide what's fair?

So let me just show you the little demo slash visualization we created; this accompanies, by the way, a paper on machine learning fairness. Basically, we created this little simulation here, which is kind of what you had been seeing before, and it's interactive: I can play with it, and you can see what happens. As I change the threshold here at the top, at the bottom you can see the percentage of correct and incorrect guesses I make, OK?

You can also see these pie charts at the bottom. The one on the right stands for the positive rate of the loans I give: of all the people who come to my bank, what percentage do I give loans to? The one on the left is the true positive rate: of all the people who come to my bank and would pay me back, what percentage do I give loans to?
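
Those two quantities are simple to pin down in code. Here is a minimal sketch, assuming score is each applicant's credit score and repays is True for applicants who would pay the loan back (both hypothetical arrays):

```python
import numpy as np

def loan_stats(score, repays, threshold):
    approved = score >= threshold
    positive_rate = approved.mean()               # share of all applicants given loans
    true_positive_rate = approved[repays].mean()  # share of would-be repayers given loans
    return positive_rate, true_positive_rate
```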

So I can play with these things independently. And the other thing I care about, because I'm a bank, is making a profit at the end of the day. So how do I do all of that and still be fair? We put some presets here on the left. If you don't want to be fair at all, click on maximum profit. There it is: that's where your thresholds should be, given the distributions.

If you do want to be fair, maybe what you want is to be group unaware. So I clicked on group unaware, and you can see that what this means is I have the same threshold for both populations. I don't care if you're blue, I don't care if you're orange: you're going to be treated the same, because that's the fair thing to do.

Except that when we start looking at the true positive rates here, I can see that of all the blue people who would pay me back, I give loans to 81% of them, versus only 60% of the orange people. Is that fair? Well, maybe the next thing you could try is something called demographic parity. Demographic parity optimizes for the same positive rate: of all the blue people and all the orange people who come to my bank, I try to give loans to the same percentage.

Again, what this means is that my thresholds are quite different for the different groups. Is that fair? Maybe; maybe not. Then the last preset here is equal opportunity, which optimizes for the same true positive rate. Anyway, long story short, there are many different ways to think about being fair, and how do you translate those values into math for deciding in a scenario like this?
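
To make the trade-off concrete, here is a sketch of how those presets could be searched for, building on loan_stats above. The profit numbers (+1 per repaid loan, -3 per default) and the brute-force search are made-up assumptions for illustration, not the paper's model:

```python
import itertools
import numpy as np

def profit(score, repays, threshold):
    approved = score >= threshold
    return approved[repays].sum() - 3 * approved[~repays].sum()

def best_thresholds(blue, orange, criterion="max_profit"):
    # blue and orange are (score, repays) pairs for the two populations.
    best, best_pair = -np.inf, None
    for tb, to in itertools.product(range(101), range(101)):
        pr_b, tpr_b = loan_stats(*blue, tb)
        pr_o, tpr_o = loan_stats(*orange, to)
        # Enforce the chosen fairness rule; "max_profit" has no constraint.
        if criterion == "group_unaware" and tb != to:
            continue
        if criterion == "demographic_parity" and abs(pr_b - pr_o) > 0.01:
            continue
        if criterion == "equal_opportunity" and abs(tpr_b - tpr_o) > 0.01:
            continue
        total = profit(*blue, tb) + profit(*orange, to)
        if total > best:  # among allowed threshold pairs, still maximize profit
            best, best_pair = total, (tb, to)
    return best_pair
```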

The point of doing this was that Martin and I were interested in whether, with a very simple abstract scenario like this and a simple visualization that people could just play with, they could start to develop an intuition for the trade-offs that happen when you're actually trying to decide this in a system. And it was interesting to see that this simple visualization went kind of viral on the web, and we started seeing people have these really interesting discussions: oh wow, I always thought that demographic parity was the way to go, but now I see that equal opportunity is maybe a more interesting way to think about fairness.

And just to be clear, there are many, many more metrics of fairness that have since come up in the literature, and I think that's not going to stop. Another part of the conversation we saw online that is, I think, very important is people realizing: oh, you can't have all of these at once. There is no perfect solution; it's always a trade-off, and you're going to have to choose.

And so we started getting emails. And so for instance, one of the emails we got was from a criminal justice department in the US saying, we saw this visualization. Do you think we could create something like this for our own department so that we start to understand what the trade-offs are that we're making with the systems we're using? Obviously, we're not a bank. But this would be a very powerful way to have people in the department engage with these questions, right?

The What-If Tool

And so, based on this and on the reaction we got, we decided to go from that toy simulation to building an actual tool that lets you do some of this playing around with your model, testing it against different scenarios. This is a tool we created called the What-If Tool. It's a code-free probing tool for machine learning systems that allows you to ask things like: what if I want demographic parity across the different slices of my data? What thresholds should I use in my model?

It allows you to do things like click on a data point and ask questions. For instance, going back to the loan scenario: this person did not get a loan. Who is the nearest person to this one who did get a loan, and what is the delta between them? So, a little notion of a counterfactual, if you will, without the causality; that's a separate conversation. But it's this notion of starting to understand what's happening with these systems and how they behave: asking a bunch of alternative scenarios, trying a bunch of edge cases, and also playing with notions of fairness, of how different slices of your data set are doing vis-a-vis one another.
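
That nearest-counterfactual idea reduces to a simple search. Here is a minimal sketch under simple assumptions (plain feature vectors, L1 distance; all names below are hypothetical):

```python
import numpy as np

def nearest_counterfactual(x_query, X, predictions, wanted_label):
    # Candidates: every data point the model gave the outcome we're asking about.
    candidates = np.where(predictions == wanted_label)[0]
    dists = np.abs(X[candidates] - x_query).sum(axis=1)  # L1 distance
    best = candidates[np.argmin(dists)]
    return best, X[best] - x_query  # nearest index, and the feature delta

# e.g.: who is the closest approved applicant to rejected applicant 42?
# idx, delta = nearest_counterfactual(X[42], X, preds, wanted_label=1)
```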

This tool is open source, so it is available to anyone. It has also been integrated into Google's Cloud AI, so that anyone using that platform can play with these alternative scenarios and test things for themselves before they decide to launch something, or to debug situations. And I think one of the really important things here is that we're not expecting you to have a PhD in machine learning to ask these probing questions. Many more people should be engaging and experimenting with the way these systems behave.

So I think the story so far is that you're seeing how important data analysis is in all of this. And when you look at machine learning fairness, there's this realization, number one, that trying to make the world better, trying to do the right thing, involves statistics. And number two, if you are deploying models, this is something you now have to think about, or really should have been thinking about all along. I think that's an important change of mindset for many people, and again, statistics is a key part of it.

High-dimensional space and word embeddings

There's something else going on, though, which is that using data analysis, you can actually end up finding new ways to control AI systems. And to explain that, we're going to take a journey through high-dimensional space to understand this superpower. So let's talk about data in high dimensions. We're going to introduce a couple of things in a row here, and we'll start with a warm-up. For our warm-up, we're going to visualize another famous Hello World data set for machine learning (this is the MNIST data set): little 28 by 28 pixel images of handwritten digits, often used to test very simple machine learning models.

And what we're going to talk about is how you can think of each of these images as a vector in high-dimensional space. The trick, which is familiar, I'm sure, to some if not many of you, is that we look at the pixels and give them all values. In this case, we'll say black is 0, white is 1, and intermediate shades get some floating point number in between. Then we just list the values of all the pixels in a row, and that gives us a vector.

All right, so now you see it rotating. You are looking at 784-dimensional space here. What you are actually seeing, of course, is a two-dimensional projection of a three-dimensional view of this high-dimensional space, using principal component analysis to project it linearly. And that's definitely an interesting thing. But it's also kind of a big blob: as much as one can ooh and aah over it, it's a little bit hard to make sense of.
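
The pixels-to-vector trick and the linear projection fit in a few lines. A minimal sketch, assuming Keras for the digits data:

```python
import tensorflow as tf
from sklearn.decomposition import PCA

(digits, labels), _ = tf.keras.datasets.mnist.load_data()

# Scale pixels to [0, 1] and list them all in a row:
# one 784-dimensional vector (28 * 28 = 784) per image.
vectors = digits.reshape(len(digits), -1) / 255.0

# Project the 784-dimensional cloud down to 3 dimensions, linearly.
coords = PCA(n_components=3).fit_transform(vectors[:5000])
print(coords.shape)  # (5000, 3)
```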

So let's take another view. OK, now we're using a method called UMAP that is designed for high-dimensional data. Instead of a linear projection, it uses a non-linear projection that has many complicated properties, which we'll talk about in a moment. But the basic idea is that one thing it does tend to preserve is clustering.

And so one thing I can do, since this is our data set and we have labels, is color these digits by their label. That lets us easily see, for example, that in general all the 1's are in a certain area, all the 4's are in another area, and so forth. But we can also see it doesn't completely get everything right: it clearly thinks this 3 belongs with the 2's, this 8 looks like a 2, and so forth.
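
Swapping the linear projection for the non-linear one is a one-line change on top of the sketch above, assuming the umap-learn package is installed:

```python
import umap

# Non-linear projection; tends to preserve cluster structure.
embedding = umap.UMAP(n_components=2).fit_transform(vectors[:5000])

# Coloring `embedding` by `labels[:5000]` reproduces the view described here:
# well-separated digit clusters, plus the occasional mislabeled straggler.
```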

Already, I think this view can tell you a bunch of interesting things. One is just that these digits are actually pretty well separated; you can see why this is not the hardest machine learning task in the world, because there is a natural clustering structure that even a visualization method brings out pretty well. Another nice thing is that if you look within these clusters, you can start to see some interesting structure. For example, among these 1's, you'll notice they're tilted to the left on this side and tilted to the right on that side, and you really start to get a sense of what the geometry looks like in this high-dimensional space.

I will say that among the educational materials we've been putting out are step-by-step interactive essays about how to interpret these complicated nonlinear projections, because they're amazing, but weird. The weirdness can be your friend, but it can also sometimes be problematic, so you have to be aware of that.

OK, so another kind of high-dimensional object that turns out to be very, very useful is something called a word embedding. "Word embedding" is sort of weird terminology, but all it means is that you're mapping words to points in high-dimensional space, much the way we mapped those images. However, with images there was a pretty straightforward way of doing it, using pixels. With words it's actually sort of complicated, and I won't get into exactly how it is done.

The idea is usually to make sure that related words end up near each other, and it turns out you can do this. However, when people figured out ways to make that happen, they discovered a really interesting and surprising fact: not only can you get a situation where similar words are near each other, but almost as a byproduct, seemingly by accident, you get meaningful directions in space. The image you're seeing here is a figure from one very famous early paper, which showed that there is a direction in space that corresponds to the relationship of a capital to the country it is a capital of, which is really amazing.
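
In code, a meaningful direction is just vector arithmetic. A minimal sketch, where emb is a hypothetical dict from word to numpy vector (any trained embedding would do):

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# The capital-to-country relationship as a direction in space:
# the vector from "paris" to "france" roughly matches "tokyo" to "japan".
direction = emb["france"] - emb["paris"]
print(cosine(emb["tokyo"] + direction, emb["japan"]))  # near 1 if the direction holds
```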

OK, so let's take a look at what those word embeddings actually look like, and I'm going to show this in a couple of ways. This is just a raw PCA view; this is a UMAP view. And let's actually play with this a little bit. Let's explore, because I think it really gives you a sense that you're seeing the English language here. This is the geometry of English.

And what can we say about it? Well, there are obviously clumps, so let's go spelunking a little bit. What is this clump here? Oh look, a bunch of names, largely boys' names, it seems. And then we get some other names; it's hard to see, so I'll read them: Lloyd, Alex, Matt, Joel. What about this cluster here? Horns, percussion, guitar, quartet, bass: we found the music cluster over there. What about these? Oh look, we have some numbers; let's see if we can zoom in on them. 0, 1, 2, 3. We can actually see it, and if we zoomed in more, we could almost see a little bit of a linear structure to them.

We have some other things here. Let's see: infinity, angle, math, BF. Look, we found the tech world. So there really is everything in here, and it gives you a sense of just how much information is encoded here.

But now I want to show one more thing: let's do a custom projection. The way this works is that we can define an axis in terms of one word versus another word. Right now it's just a 2D linear projection, and both axes are random. But let's search for a word and look at some neighbors. Let's see, what's a good one? How about science?

Click on this, and we'll isolate those 100 points. Now we can see the neighbors of the word science in PCA. But let's do the custom thing. We can play a little game where we put man on the left and woman on the right. And suddenly, we see something really interesting: the x-axis, left to right, now orders words by whether the embedding thinks they're closer to the word man or to woman. And this is actually a little bit sad, because you can see that technical is the most "male" word in this neighborhood.
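
That custom axis is the same arithmetic as before: project each neighbor onto the direction between the two chosen words. A sketch with the same hypothetical emb, using words from this neighborhood:

```python
import numpy as np

axis = emb["woman"] - emb["man"]
axis = axis / np.linalg.norm(axis)  # unit vector: the man -> woman direction

neighbors = ["technical", "scientist", "religion", "author", "psychology"]
for w in sorted(neighbors, key=lambda w: emb[w] @ axis):
    # Negative projection: nearer "man"; positive: nearer "woman".
    print(f"{w:12s} {emb[w] @ axis:+.3f}")
```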

And meanwhile, on the female side, although we do have scientist, we see religion, author, stories, psychology. And this is actually a sobering moment. When people started to realize this about these embeddings, I think they realized how serious things had become: these embeddings had been trained on huge corpora of existing language, those corpora contained a bunch of biases, and the word embeddings were very efficiently encoding some of them.

In fact, we can show a couple of other examples. This is the neighborhood of engineer, which shows many of the same things. If you do the neighborhood of math, I'll read it out loud: you can see geometry and computational on the man side, and again psychology, art, library on the woman side. And it's not even just the big, famous biases like gender. Even with something like old versus new, not about people at all, you can see interesting, meaningful associations. If I put in book and look at its neighbors, on the old side I see things like poem and manuscript; on the new side, I see things like company and magic. You can start to see it's picking up on important associations, but in a way that is potentially problematic.

TCAV: interpreting neural networks

So we have this thing that's powerful. It requires a lot of analysis. But let's think about how we could use this idea of meaningful directions in other ways, now that we have it. So one way, it turns out, is to interpret what's going on inside of neural networks so that they're not, in fact, black boxes. And I'd like to very quickly fly through a method called TCAV, published in ICML 2018.

And the idea behind this method is to take this insight that, OK, if word embeddings have directions that correspond to things that are meaningful to humans, like the direction from capitals to their countries, maybe other types of embeddings have that same property, and we could use that to our advantage.

So I'll illustrate this by imagining a question you might ask of an image classifier. Let's say it looks at a picture of a zebra. And you wonder, why did it think that thing was a zebra? So there's sort of a bad answer to this question and a good answer. The bad answer is if it told you something like, oh, it was this particular pixel, or these sets of pixels. Those are not terms that a human can understand. A much better answer would be if it could say something like, oh, it was striped, and that helped.

Of course, it's very hard to figure out how to extract that answer. However, there is a way to do it. So let's follow what happens when a neural network recognizes an image. You put in an image; let's say it's an image of a zebra. It goes into your neural network, and what's shown here is a little cartoon of one famous network called Inception.

And the idea is that in this kind of network, the image gets transformed layer by layer, and you can think of each layer as performing a transformation of Euclidean space. So if you show it an image, like a zebra, then at each layer you can think of the state of that layer as a set of activations of the neurons, and those activations form a vector in high-dimensional space. So you can see how this starts to relate to what we were just looking at.

And then what's interesting is that if you put a whole bunch of images through, you get a bunch of points in this high-dimensional space, and you can start playing this game of meaningful directions. So here's how it works. Let's say there's a concept you're interested in, for example stripes, or the presence of stripes. You don't even have to define to the computer in language what stripes are; you can just show it a bunch of examples of striped images.

Then you put those striped images through your neural network. And one important thing: these are not images of zebras. This is just any stripes, anywhere; we've done this literally with striped ties, striped shirts, and abstract stripes. What you get is a cluster of points, in the space of activations at an intermediate layer, that represents stripes. Then you can also put a bunch of random other images through and train a classifier to distinguish the stripes from the non-stripes. And that gives you what's called a concept activation vector: a direction in space that represents that concept.
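
Here is a minimal sketch of building such a concept activation vector, assuming a hypothetical helper activations_of(images) that returns the chosen layer's activations, flattened, plus hypothetical striped_images and random_images collections:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Activations for concept examples (label 1) and random counterexamples (label 0).
acts = np.vstack([activations_of(striped_images),
                  activations_of(random_images)])
labels = np.concatenate([np.ones(len(striped_images)),
                         np.zeros(len(random_images))])

# A linear classifier separating the two clouds in activation space.
clf = LogisticRegression(max_iter=1000).fit(acts, labels)

cav = clf.coef_[0]               # the direction that represents "stripes"
cav = cav / np.linalg.norm(cav)  # normalized to a unit vector
```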

And then this is what's cool: you can start calculating with that vector. For example, you can take directional derivatives to understand the model's sensitivity to the concept. So computers like it. But humans like it too, because the answer comes in terms of a human-friendly concept like stripes. This turns out to be a very useful technique in general, and there's a lot more we could say. But what's really interesting is that you can go from just interpreting or understanding directly to control.
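
Continuing the sketch: the TCAV score for a class and a concept is the fraction of that class's examples whose class logit increases as the activations move along the concept direction. Here grad_logit_wrt_activations is a hypothetical autodiff helper returning the gradient of the zebra logit with respect to the chosen layer's flattened activations:

```python
import numpy as np

def tcav_score(zebra_images, cav):
    # Directional derivative of the zebra logit along the stripes direction.
    sensitivities = [grad_logit_wrt_activations(img) @ cav
                     for img in zebra_images]
    return np.mean(np.array(sensitivities) > 0)

# A score near 1 says: for almost every zebra image,
# "more stripes" pushes the network toward "more zebra".
```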

Applying TCAV to cancer pathology

Yeah, so imagine that: you start with this super high-dimensional space, and you find meaning in that space, meaning that is also meaningful to humans. What I want to show you next is an example of a project where we used that for health care.

So here's the scenario. We were working with pathologists trying to diagnose cancer. The way they work is that they will look at a slide from a patient, very much like the large image you see here, and then very, very closely analyze it, trying to understand: one, do I believe there is cancer here? And if so, what grade of cancer? This is what's called Gleason grading.

And even though there is an extensive body of work on optimizing algorithmic performance, on getting a system to look at an image like this and say yes, there is cancer, or no, there is not, much less attention had been paid to what kind of information and what kind of interactions doctors would actually want out of a dialogue with these systems. Instead of just being given a number and a probability, do they actually want to interact with and direct these systems? It turns out that, yes, they would like to.

And part of the reason they want this is that sometimes there are artifacts in these images, and they want to say: dismiss that, don't pay attention to that thing, it's not important. Or there are anatomical features they care deeply about, things like the density of glands, or whatever other concept. And think about it: those are high-level concepts, right? How do you tell the machine what you really care about, or how dense the glands are here?

Basically, what we did is create a set of tools that allowed pathologists to very quickly navigate these massive data sets. Because what they're trying to do, as they look at a slide, is get the system to bring them the nearest neighbors: of all the patients I've seen and diagnosed before, what is the closest case to the one I'm seeing now?

But that's not a one-shot thing; that's what's interesting. It's not just "automate that." So one of the tools simply lets people select a region and say: focus here, that's what I care about. Things like that have been done before. Or: once you've brought me a bunch of cases like the one I just highlighted, let me go deeper into one of those cases and bring me its nearest neighbors. In other words, let me navigate by example.

The thing that's new is that with things like the CAVs you just heard about, we can create much more sophisticated ways of navigating these systems in terms of medical concepts. Working with the pathologists, they told us that a medical concept they deeply care about is something called stroma. I'm not a doctor; I had no idea what this was, but for them it was super interesting. So we created a slider where they could say: show me cases like this, but with more stroma, or with less stroma. Or fused glands, another medical concept they care deeply about.

And because, again, inside the massive spaces of these ML systems there are meaningful directions, we were able to map these medical concepts onto those directions. That gave doctors a much more powerful way of interacting with the systems. And one of the really important things here is for the doctor to calibrate their trust: should I trust this, or should I not? Or even to learn how the system works: how does it behave, how does it handle edge cases?
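
Mechanically, a concept slider can be read as: nudge the query's embedding along the concept's CAV, then re-run the nearest-neighbor search. A sketch under the same assumptions as the earlier CAV code (our reading of the interaction, not the published implementation):

```python
import numpy as np

def slide_and_search(query_embedding, concept_cav, amount, case_embeddings):
    # amount > 0 asks for "more" of the concept, amount < 0 for "less".
    target = query_embedding + amount * concept_cav
    dists = np.linalg.norm(case_embeddings - target, axis=1)
    return np.argsort(dists)[:10]  # indices of the ten closest past cases
```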

And so we ran a user study, in this case with 12 pathologists, looking both at a traditional machine learning system that just gives you a prediction with a confidence level, and at this system, called SMILY, where you have all these different tools you can interact with. The results were very positive. Basically, all 12 pathologists preferred SMILY, and the level of trust was significantly higher. And even though you would think, "if you could only automate my work for me, I'd be done," they actually found a lot of use in going back and forth, and not only that, in steering the system in ways they cared about.

Data scientists as user experience designers

OK, so yeah, I think what you see from this is sort of the ability of statistical analysis