Experimental psychology has had a rough time over the last few years. Back in 2015, a meta-study came out that supposedly showed a replication crisis in psychology, with many studies over the years turning out to be not as strong as claimed. This caused a right stir. Then came a study of that study, which argued that the meta-analysis was in fact very poorly done and that the original studies were probably all right:
Remember that study that found that most psychology studies were wrong? Yeah, that study was wrong. That’s the conclusion of four researchers who recently interrogated the methods of that study, which itself interrogated the methods of 100 psychology studies to find that very few could be replicated. (Whoa.) Their damning commentary will be published Friday in the journal Science. (The scientific body that publishes the journal sent Slate an early copy.)
In case you missed the hullabaloo: A key feature of the scientific method is that scientific results should be reproducible—that is, if you run an experiment again, you should get the same results. If you don’t, you’ve got a problem. And a problem is exactly what 270 scientists found last August, when they decided to try to reproduce 100 peer-reviewed journal studies in the field of social psychology. Only around 39 percent of the reproduced studies, they found, came up with similar results to the originals.
That meta-analysis, published in Science by a group called the Open Science Collaboration, led to mass hand-wringing over the “replicability crisis” in psychology. (It wasn’t the first time that the field has faced such criticism, as Michelle N. Meyer and Christopher Chabris have reported in Slate, but this particular study was a doozy.)
Now this new commentary, from Harvard’s Gary King and Daniel Gilbert and the University of Virginia’s Timothy Wilson, finds that the OSC study was bogus—for a dazzling array of reasons.
Well, it appears that the crisis is not yet over. Hugely influential studies such as the Stanford Prison Experiment and the Marshmallow Experiment have, once again, been called into question. I, for one, find the latter intriguing, because I was under the impression that the marshmallow experiment had been replicated a large number of times, with different outcomes measured each time (from SAT scores to body mass index and so on).
In the case of the Stanford Prison Experiment, it was apparently based, to some degree, on deceit:
But its findings were wrong. Very wrong. And not just due to its questionable ethics or lack of concrete data — but because of deceit.
A new exposé published by Medium based on previously unpublished recordings of Philip Zimbardo, the Stanford psychologist who ran the study, and interviews with his participants, offers convincing evidence that the guards in the experiment were coached to be cruel. It also shows that the experiment’s most memorable moment — of a prisoner descending into a screaming fit, proclaiming, “I’m burning up inside!” — was the result of the prisoner acting. “I took it as a kind of an improv exercise,” one of the guards told reporter Ben Blum. “I believed that I was doing what the researchers wanted me to do.”
The findings have long been subject to scrutiny — many think of them as more of a dramatic demonstration, a sort-of academic reality show, than a serious bit of science. But these new revelations incited an immediate response. “We must stop celebrating this work,” personality psychologist Simine Vazire tweeted, in response to the article. “It’s anti-scientific. Get it out of textbooks.” Many other psychologists have expressed similar sentiments.
Many of the classic show-stopping experiments in psychology have lately turned out to be wrong, fraudulent, or outdated. And in recent years, social scientists have begun to reckon with the truth that their old work needs a redo, the “replication crisis.” But there’s been a lag — in the popular consciousness and in how psychology is taught by teachers and textbooks. It’s time to catch up.
The Guardian recently ran an interesting piece on how the Robbers Cave experiment from the 1950s was a “do-over from a failed previous version of an experiment, which the scientists never mentioned in an academic paper”. Next on the list, surely, is the Milgram Experiment:
And it seems like Milgram’s conclusions may hold up: In a recent study, many people found demands from an authority figure to be a compelling reason to shock another. However, it’s possible, due to something known as the file-drawer effect, that failed replications of the Milgram experiment have not been published. Replication attempts at the Stanford prison study, on the other hand, have been a mess.
In science, too often, the first demonstration of an idea becomes the lasting one — in both pop culture and academia. But this isn’t how science is supposed to work at all!
For example, let’s look at the Marshmallow Test, one that I often refer to in my talks on free will. As Vox reports:
While successes at the marshmallow test at age 4 did predict achievement at age 15, the size of the correlation was half that of the original paper. And the correlation almost vanished when Watts and his colleagues controlled for factors like family background and intelligence.
That means “if you have two kids who have the same background environment, they get the same kind of parenting, they are the same ethnicity, same gender, they have a similar home environment, they have similar early cognitive ability,” Watts says. “Then if one of them is able to delay gratification, and the other one isn’t, does that matter? Our study says, ‘Eh, probably not.’”
In other words: Delay of gratification is not a unique lever to pull to positively influence other aspects of a person’s life. It’s a consequence of bigger-picture, harder-to-change components of a person, like their intelligence and environment they live in.
The results imply that if you can teach a kid to delay gratification, it won’t necessarily lead to benefits later on. Their background characteristics have already put them on that path.
What’s more, the study found no correlation — even without controls — between delaying gratification and behavioral outcomes later in life. “In that sense, that’s the one piece of the paper that’s really a failure to replicate,” Watts says.
His paper also found something that they still can’t make sense of. Most of the predictive power of the marshmallow test can be accounted for kids just making it 20 seconds before they decide to eat the treat. “So being able to wait for two minutes, five minutes, or seven minutes, the max, it didn’t really have any additional benefits over being able to wait for 20 seconds.”
That makes it hard to imagine the kids are engaging in some sort of complex cognitive trick to stay patient, and that the test is revealing something deep and lasting about their potential in life. And perhaps it’s an indication that the marshmallow experiment is not a great test of delay of gratification or some other underlying measure of self-control.
Their study doesn’t completely reverse the finding of the original marshmallow paper. But it reduces the findings to a point where it’s right to wonder if they have any practical meaning.
Being a skeptic is tough work. You can never rest on your laurels. I’m going to have to look in more depth at these and other psychological experiments that I have for so long taken for granted.
Bad ideas do persist in psychology, that is for sure. It’s why I edited Caleb Lack’s fascinating Psychology Gone Astray: A Selection of Racist & Sexist Literature from Early Psychological Research, which looks at the biases in early psychology and the destructive power of preconceived ideals.