Analyzing a TidyTuesday dataset with Posit Assistant in RStudio | Hadley Wickham

I'd say Parquet files are superior to CSV files in just about every possible way, except for one: you can't open them in a text editor and look at them.

They're also great if you're collaborating with anyone who uses another language. Every language has tools for reading Parquet files; in Python, you can easily load them into a DuckDB database.

So whenever somebody asks me, "Do I still need to know how to code?", I always say yes. Even if you are using an LLM, you're essentially training an intern: a wildly talented intern who is also super unhinged and chaotic, and you still have to look at everything they do.

Do you still need to know how to code in 2026? Oh, yeah.

On the unhinged intern: yes, this is an unhinged-intern type of thing. This is not literally the smoking gun. It's showing that there are lots more dairy cattle, but nowhere near enough for that to be explanatory in the context of millions of animals.

Shiny apps for exploratory analysis

I looked at the previous TidyTuesday, which was the Astronomy Picture of the Day, and one of the things I did with it was text clustering on the image descriptions. I could have figured this out eventually, but getting results really, really quickly, without having to remember all the details of how to use tidytext to do text clustering, was super appealing.

It chooses the number of clusters. (As a reminder, this was a couple of weeks ago now.) When you do cluster analysis, you have to choose the number of clusters, and there are a bunch of heuristics for that. But whenever you do cluster analysis, you want to do a little experimenting: are my results robust to different numbers of clusters? So I said, okay, make me a Shiny app that lets me explore that.
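Hadley built this check as a Shiny app in R. As a rough sketch of the underlying idea (our assumption, not his code), you can fit k-means at several values of k and measure how much the cluster assignments change between neighbouring values of k using the Rand index. This uses Python with only the standard library; plain k-means on toy 2-D points stands in for text clustering on image descriptions, and the functions `kmeans`, `rand_index`, and `agreement_across_k` and the data are hypothetical.

```python
import random
from itertools import combinations


def kmeans(points, k, seed=0, iters=100):
    """Lloyd's k-means on 2-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    labels = [0] * len(points)
    for step in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
            for x, y in points
        ]
        if step > 0 and new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each center to the mean of its members (leave it alone if empty).
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = (
                    sum(px for px, _ in members) / len(members),
                    sum(py for _, py in members) / len(members),
                )
    return labels


def rand_index(a, b):
    """Share of point pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def agreement_across_k(points, ks, seed=0):
    """Rand index between the k-means solutions at each consecutive pair of k values."""
    fits = {k: kmeans(points, k, seed=seed) for k in ks}
    return {(a, b): rand_index(fits[a], fits[b]) for a, b in zip(ks, ks[1:])}


# Hypothetical toy data: three well-separated 2-D blobs of 20 points each.
rng = random.Random(42)
points = [
    (cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
    for cx, cy in [(0, 0), (5, 0), (0, 5)]
    for _ in range(20)
]
for (a, b), score in agreement_across_k(points, [2, 3, 4, 5]).items():
    print(f"k={a} vs k={b}: Rand index {score:.3f}")
```

Roughly, high agreement from k to k+1 suggests the extra cluster mostly splits an existing one, while a sharp drop suggests the grouping is sensitive to the choice of k and deserves a closer look, which is exactly the kind of thing an interactive app makes easy to eyeball.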

This is one of my favorite uses for Shiny apps, y'all. I make apps all the time that let me step through something and look at it. One time I was doing a little project classifying Bluesky posts. I pre-classified them with an LLM, then built a Shiny app that fed me each post along with the label the LLM had assigned, so I could decide whether or not it was right. I was trying to classify replies versus posts, really basic stuff, just to see if I could do it. It was amazing.

And being able to write a Shiny app like this in two or three minutes is really cool. Yes, I could have written this Shiny app myself, but I probably never would have, because I'd have felt the payoff wasn't worth the cost. Now I can just give it a go. With an LLM, it's not a super complicated Shiny app, and that's fine. It's a throwaway, so it doesn't have to be pretty.

Do you still need to learn to code?

That reminds me of a question we got earlier: does it still make sense to learn how to code in 2026?

Yes, I think understanding code is still really important. I do think the trade-off between the ability to read code and the ability to write code has changed, because now we can generate so much code so quickly with an LLM. What matters is your ability to read that code quickly, understand what it's doing and whether it's the right thing, and then either make small tweaks by hand or tell the LLM to fix it. If you don't know what the code is doing, that seems like such a dangerous place to be.

To do a good cluster analysis, you still need to understand that the number of clusters is really important, and that there's no magic way to figure out the correct number. The LLM will just give you some decent answer; you need to go in and interrogate it further. So you still need that subject-matter knowledge, that ability to read code, and that ability to go in yourself.

You still want to create these data sets, and the ability to go in and actually modify them directly is super important, because typing a one instead of a two is much lower effort than saying to the LLM, "Hey, can you please go to this specific plot and make this change for me?" One of the advantages of code is that it's a precise language, and LLMs are great at going from vague human language to precise code. But if you already have that precision in mind, trying to tell an LLM in human language to make a precise change is just inefficient.

So I think code is still important, but your ability to write code is less important than it used to be. Your ability to read code is now more important than your ability to write it, and your ability to ask good questions is even more important.

We have two minutes left, and I would love to end with one more philosophical question, from Ben, because I've been thinking about this a lot: if one of the best ways to learn to read code is to write code, what's going to happen if we're writing it less?

Yeah, I don't know. We're going to have to make new tools and find new ways of learning. One thing that's always intrigued me: maybe 10 years ago I read an article about a Master of Fine Arts in programming. I can't remember if it was a real program or just speculation, but the idea was this. When you become an artist, you obviously spend a lot of time making art, but you also spend time looking at the work of the old masters and attempting to copy it by hand. Of course you could take a photograph of a famous painting and get an exact reproduction, but there's value in recreating it by hand, and in the things you do in art and design programs, like collectively talking through work. I think we're going to have to figure out how to apply all of these skills to data science and programming today, and that's super important.


All right, wonderful. We have literally one minute left, so let's wrap up. Everybody, say thank you to Hadley for joining us. This was so much fun.

If you want to sign up for the waitlist for Posit AI, Nick has put the link in the chat: posit.co/products/ai. You can sign up there for the private beta of the tool Hadley is using.

It's currently a free beta because we're trying to understand what's helpful for people. So if you want to learn a bit of AI-powered data analysis with no money down, at least for a little while (we can't do this forever, because we are a business), Posit AI is a great way to get your toes wet in AI-supported data science right now.

All right, fantastic. I'll see everybody on Thursday at the Data Science Hangout, and next week at RainbowRConf, where I'll be doing a trivia thing with Domi Pak. If you're attending RainbowRConf, I'll see you there. Bye, everybody! See you on Thursday and see you next week.