Tracking progress in AI is really hard

A lot of the work I’ve done for MIRI has involved trying to track progress in artificial intelligence. Not so-called “general intelligence,” but specific things ranging from playing games like chess and Scrabble to recognizing faces and objects to understanding and translating human language. And it turns out that figuring out how much progress there’s been in these areas is really hard.

It’s tempting to blame a lack of effort and say we should do more to try to track AI. But that’s arguably unfair. If you’re an academic and you want to work on some particular AI application, you can get a paper published just by collaborating with a few other people and doing something moderately innovative, and that’s all there is to it.

But if you actually want to put together a project to measure progress in AI, that’s much harder. First, you need some standardized way to measure performance on whatever task you’re interested in. Then you need to convince a bunch of other people to test their software with your measure. Then you need to repeat the process several years in a row.

When you look at it that way, it’s not so surprising that this doesn’t happen all that often. Mostly, you need the government to step in, and sometimes it does: the National Institute of Standards and Technology has done some good here, as has the Department of Defense through things like the DARPA Grand Challenges (for driverless cars).

Examples of academics organizing such things on their own are harder to come by; the PASCAL VOC for object recognition is the main example I’ve found. And the PASCAL VOC is no more because, tragically, its organizer Mark Everingham killed himself last year.

Furthermore, few of these examples are really ideal for tracking progress in AI, because they don’t typically repeat the exact same test procedure year after year. Rather, they tweak things, which is understandable if you have goals other than charting year-to-year progress. But it does complicate things if you’re looking over a series of tests and trying to tell how much progress was made.

Even if the test procedure is kept the same for several years running, at some point this will often stop making sense. Some of the facial recognition tests run by NIST eventually showed very close to 100% accuracy. Once accuracy reaches that level, it doesn’t make sense to keep re-running the same tests. But when you design new tests to cover a broader range of test conditions, how do you compare the new results to the old ones?

A final issue is that a lot of the really interesting AI research going on right now is being conducted by big corporations, most notably Google. For all we know, those companies may put a lot of work into charting progress on various applications internally, but for obvious reasons they have little incentive to share that data with the public, especially since doing so would give their competitors access to it.

So it’s a hard problem… and a frustrating situation for anyone trying to track progress in AI and project those trends into the future. All I can conclude is that we don’t really know how fast AI is progressing, and therefore we can’t be sure when computers will surpass humans in key areas.

But uncertainty about when AI will come doesn’t justify acting as if it’s very far off. What worries me most is that I can’t rule out the possibility that AI will come much sooner than we have any hope of being ready for it.
