7QT: The Odder Errors of Computers and Humans

August 14, 2015

— 1 —

My family tends to share thoughts on the New Yorker caption contest every week, over email, when the magazine arrives (no wins yet!). But I didn’t know that our entries might be getting winnowed out by a computer. BloombergBusiness has a nice feature on the Microsoft team working with the New Yorker cartoon editor to see if they can build a program that can tell the difference between funny captions and ones that fall flat.

For the study, Shahaf fed cartoons and captions from the New Yorker’s database into the system and trained it to find the funniest choices among captions that make similar jokes. She relied partly on crowdsourced input from contract workers, using Amazon.com’s Mechanical Turk. Then she moved to the harder task of ranking jokes. Because typical computer vision software is designed for photos, not drawings, the researchers had to manually describe what was pictured in each cartoon. They organized this into two categories: the context and the anomalies.

No word on whether it would have passed or failed the cartoon from Seinfeld.

— 2 —

Nautilus has an interesting piece on what we might be able to learn from the classification mistakes that computers make. It’s one thing when a single program makes a mistake (e.g., classifying a picture of wavy lines as a starfish), but something else when programs developed independently all wind up erring in the same way. It suggests that these diverse programs have converged on some underlying way of modeling the world that we don’t quite understand.

Such screwy results can’t be explained away as hiccups in individual computer systems, because examples that send one system off its rails will do the same to another. After he read “Deep Neural Networks Are Easily Fooled,” Dileep George, cofounder of the AI research firm Vicarious, was curious to see how a different neural net would respond. On his iPhone, he happened to have a now-discontinued app called Spotter, a neural net that identifies objects. He pointed it at the wavy lines that Clune’s network had called a starfish. “The phone says it’s a starfish,” George says.

Spotter was examining a photo that differed from the original in many ways: George’s picture was taken under different lighting conditions and at a different angle, and included some pixels in the surrounding paper that weren’t part of the example itself. Yet the neural net produced the same extraterrestrial-sounding interpretation. “That was pretty interesting,” George says. “It means this finding is pretty robust.”

In fact, the researchers involved in the “starfish” and “ostrich” papers made sure their fooling images succeeded with more than one system. “An example generated for one model is often misclassified by other models, even when they have different architectures,” or were using different data sets, wrote Christian Szegedy, of Google, and his colleagues. “It means that these neural networks all kind of agree what a school bus looks like,” Clune says. “And what they think a school bus looks like includes many things that no person would say is a school bus. That really surprised a lot of people.”

— 3 —

Of course, it’s not just computers that make choices that are baffling from the outside. I do like the “because it is there” spirit that led people to figure out a way to play Fallout 3, an action role-playing video game, as a baby.

YouTuber Bryan Pierre just managed to complete a playthrough in which he starts off as, and finishes, Fallout 3 as a baby. For those of you that have never seen this before, basically, there’s a part of the game in the opening where you are briefly a toddler. During this segment, you can read a baby book and assign yourself your SPECIAL stats. Eventually your dad comes into the room, and the game flashes forward a few years to your birthday. Clever players figured out that if you beeline to the door where your father comes in during this scene, you can actually leave the playpen area and remain a child. And that’s exactly what Pierre did here.

While the entire run takes hours, it’s quite an educational delight that gives us a glimpse at what makes Fallout 3 tick. Playing as a baby means moving at a snail’s pace—you are, after all, “crawling” on the floor. “You are at an extreme disadvantage,” Pierre notes, mostly because when an enemy sees you, you can’t really run away. Other quirks that come with this run include a messed-up PipBoy function (Pierre has to use a mod to utilize it), as well as the inability to swim normally. You also lose the ability to interact with certain objects, which are unavailable to children. In short, playing as a baby is kind of a nuisance…but there are some advantages. For instance, playing as a baby means that your hitbox is hilariously small, and so sometimes certain enemies have difficulty harming you. On the other hand, being a tiny baby does mean that even simple creatures like mole rats seem like giant monsters.

I’m really so delighted that people do this.

— 4 —

Mathematician Terry Tao has a great essay up on how to catch a particular kind of mistake — the moments when you stumble on a too-easy solution:

If you unexpectedly find a problem solving itself almost effortlessly, and you can’t quite see why, you should try to analyse your solution more sceptically.

In particular, the method may also be able to prove much stronger statements which are known to be false, which would imply that there is a flaw in the method.

In a related spirit, if you are trying to prove some ambitious claim, you might try to first look for a counterexample; either you find one, which saves you a lot of time and may well be publishable in its own right, or else you encounter some obstruction, which should give some clue as to what one has to do in order to establish the claim positively (in particular, it can “identify the enemy” that has to be neutralised in order to conclude the proof).

Actually, it’s not a bad idea to apply this type of scepticism to other mathematician’s claims also; if nothing else, they can give you a sense of why that claim is true and how powerful it is.

I really like the “check to see if you can use your breakthrough to prove something false” error check.

— 5 —

Speaking of checking what you think you know, Ben Casselman, one of my coworkers at FiveThirtyEight, has a stellar feature on what happened in Canada when they made parts of their census optional (as some U.S. politicians have suggested we do).

Some disclosure is probably in order here: My FiveThirtyEight colleagues and I aren’t exactly neutral observers when it comes to the value of the American Community Survey. When I reported on the economic consequences of different college majors last year, that analysis was based on ACS data. So was my story from June on where people are killed by police. So was Mona Chalabi’s analysis of the most cosmopolitan metropolises, Leah Libresco’s story on domestic partnerships, and Jia Zhang’s fantastic censusAmericans Twitter bot.

But government surveys are good for a lot more than data journalism and Twitter bots. They are used to evaluate the effectiveness of government programs, guide decisions on school construction and infrastructure upgrades, and make business investment decisions. When researchers in San Diego wanted to study the impact of the city’s proposed minimum-wage hike, they turned to ACS data. So did Boston when it wanted to assess the progress of its programs to encourage cycling, and Cincinnati when it applied for a grant to build a new streetcar system.

Canada, like the U.S., conducts a regular census (every five years compared with every 10 years in the U.S.) that attempts to count every resident. The full census collects only very limited information such as age, sex and marital status. Until 2011, Canada supplemented that data with a broader, mandatory survey that was known as the long-form census and was sent to roughly 1 in 5 households along with the standard census form. But in 2010, Canada’s Conservative government decided to eliminate the long-form census and replace it with a new survey that covered the same basic topics but was conducted separately from the census and was voluntary to complete.

[…]

“The replacement survey is a complete waste of money because the data are simply not reliable,” said Munir Sheikh, who resigned in protest from his position as chief statistician of Canada in 2010 following the government’s decision to make the survey voluntary. “The quality of the data that the census collects obviously is not good enough to be used as census data. That is the conclusion of any researcher who has looked at it.”

— 6 —

A census is the kind of government-organized data-gathering that I’m very much in favor of. I’m less keen on mass video surveillance, just like the artists who decorated England’s ubiquitous cameras with little party hats for Orwell’s birthday:

[Image: a surveillance camera wearing a party hat (Front 404)]

— 7 —

But if you want a really bleak reflection-through-surrealism of the state of our civil liberties, you might want to check out China Miéville’s new story collection, Three Moments of an Explosion. Here’s how the NYT describes one of his stories:

A chilling story called “The 9th Technique” — the title is a reference to the 10 techniques laid out as acceptable forms of interrogation in a United States government memo on torture in 2002 — imagines that a black market has arisen in objects that were used to question suspects during the Iraq war. (The most precious is the cloth used in the first recorded waterboarding.) Somehow these artifacts have acquired magical powers that can be animated if the conjurer recites the memo.

“You do not list 10 techniques, numbered and chantable, in austere prose appropriate for some early-millennium rebooted Book of Thoth, and not know that you have written an incantation,” Mr. Miéville writes.

For more Quick Takes, visit Conversion Diary!

