Although I started this blog mostly to complain about the media screwing up coverage of AI, sometimes they get it right. And The Verge has really been getting it right recently. And no, I’m not being paid off by them. However, if anyone from The Verge is reading this: I accept cash, Bitcoin, Dogecoin, and freshly-delivered Freebird burritos.
A friend sent me this link to a recent article talking about the limits of natural language processing and why it is not “better than humans.”
The article is clear and accurate, so you should really read it, but I’ll summarize it and add my own points.
- The high-level capabilities researchers study, in this case reading comprehension, sound impressive, but most of the time researchers are really measuring performance on benchmark datasets: narrowly defined tasks with a fixed set of examples that models are trained on. As the researchers explain in the article, good performance on a dataset such as the Stanford Question Answering Dataset (SQuAD) does not mean the model can do well on reading comprehension in general.
- ML models are often very brittle. In the case of SQuAD, they are learning very good pattern matching. When they see an example that breaks the pattern they learned, they tend to give bad results. For example, when asked which quarterback was 38 in Super Bowl XXXIII, if the model is given a paragraph that mentions two different quarterbacks, it will often guess the wrong one. Rather than reading and understanding the paragraph, as we assume humans do when they perform this task, the model is mostly just looking for a name that sits close to the word “quarterback.”
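To make the pattern-matching point concrete, here is a deliberately naive toy QA “system” (the sentences, the `lexical_overlap_qa` name, and the distractor are all invented for illustration, loosely modeled on the quarterback example; real SQuAD models are neural networks, not this heuristic). It picks the sentence that shares the most words with the question and returns the first capitalized name in it, and a single added distractor sentence with heavy word overlap flips its answer:

```python
import re

# Crude stand-in for "a name": two adjacent capitalized words.
NAME = re.compile(r"[A-Z][a-z]+ [A-Z][a-z]+")

def lexical_overlap_qa(question, context):
    """Answer by surface pattern matching: pick the sentence that shares
    the most words with the question, then return the first capitalized
    two-word phrase in it. No actual reading comprehension involved."""
    q_words = set(re.findall(r"\w+", question.lower()))
    best_sentence = max(
        re.split(r"(?<=[.!?])\s+", context),
        key=lambda s: len(q_words & set(re.findall(r"\w+", s.lower()))),
    )
    match = NAME.search(best_sentence)
    return match.group() if match else None

question = "What is the name of the quarterback who was 38 in Super Bowl XXXIII?"
context = "The record had been held by John Elway, who won Super Bowl XXXIII at 38."

print(lexical_overlap_qa(question, context))  # John Elway

# One distractor sentence with heavy word overlap flips the answer:
distractor = " Jeff Dean was the name of the quarterback who was 38 in Champ Bowl XXXIV."
print(lexical_overlap_qa(question, context + distractor))  # Jeff Dean
```

The distractor doesn’t even describe the same game, but because it parrots the question’s wording it wins the overlap score, which is the same failure mode as matching a name near “quarterback” without understanding the paragraph.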
Here’s an illustration of the SQuAD dataset from the original paper so you can get a basic idea of what the dataset is asking:
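If you’d rather poke at the raw data, the released SQuAD files are JSON. A minimal sketch of the v1.1 layout (the field names match the released files; the passage, question, and id below are invented for illustration):

```python
# Sketch of the SQuAD v1.1 JSON layout. Field names come from the
# released dataset; the passage and question here are made up.
example = {
    "version": "1.1",
    "data": [{
        "title": "Super_Bowl_50",
        "paragraphs": [{
            "context": "Denver Broncos quarterback Peyton Manning won Super Bowl 50.",
            "qas": [{
                "id": "0001",
                "question": "Which quarterback won Super Bowl 50?",
                # Answers are literal spans of the context: the span text
                # plus the character offset where it starts.
                "answers": [{"text": "Peyton Manning", "answer_start": 27}],
            }],
        }],
    }],
}

# Every answer must be recoverable by slicing the context at answer_start.
ctx = example["data"][0]["paragraphs"][0]["context"]
ans = example["data"][0]["paragraphs"][0]["qas"][0]["answers"][0]
assert ctx[ans["answer_start"]:ans["answer_start"] + len(ans["text"])] == ans["text"]
```

The key design choice is that every answer is a literal span of the passage, which is exactly what makes span-matching shortcuts viable for models that aren’t really “reading.”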
There’s a lot of great stuff in there, so you should read the article in full. But I want to talk about the journalistic aspect of the article as well.
The article does a lot of things right that I do not see often in the sort of typical click-bait articles about AI.
- Most importantly, they talk to actual researchers who know the subject matter well. In this case, they interview Yoav Goldberg, a lecturer at Bar-Ilan University with a PhD who was a post-doc at Google Research, and Pranav Rajpurkar, a PhD student at Stanford and first author of the SQuAD dataset paper.
- As I plan to explain when I get around to writing my guide to AI story-writing, the best thing you can do is talk to at least one person deeply familiar with what you’re writing about, preferably an author of the paper you are discussing, and at least one other knowledgeable researcher who is not on the paper or affiliated with the first source. This article does exactly that. A+
- They avoid an over-sensationalized headline. In fact, they do the opposite: they say right in the headline and subhead that AI systems are not actually better than humans and have a long way to go. A lot of stories about AI (and pretty much every subject, really) make a bold attention-grabbing claim and then (if you’re lucky) explain in the actual piece how that claim was a gross exaggeration. This practice drives me up the wall, and I’m glad they don’t stoop to it.
- The content itself is technically correct (as far as I can tell) without being overloaded with technical terms. That is not easy to pull off.
So well done. Hopefully we’re just on the verge of better AI journalism.
I’ll see myself out.