Guest post by Robert Woodberry
“Religious Children are Meaner than Their Secular Counterparts” proclaimed a headline in the Guardian. “Religious Kids are Jerks” raved the Daily Beast. Hundreds of other newspapers and blogs touted similar articles: the Economist, Forbes, Good Housekeeping, the LA Times, The Independent. All these articles were based on a 4 ½ page research note in Current Biology by University of Chicago professor Jean Decety and six other scholars.
But what is the evidence behind these claims? Does it match previous research? Is it worth all the hype? My analysis of the article demonstrates that the project was poorly constructed and the data analysis sloppy. The authors do virtually nothing to test alternative explanations or mitigate the flaws in their research design. The results contradict the vast majority of other research on the topic. The authors extrapolate well beyond what the data show, and reporters extrapolate beyond even what the authors claim. However, making these problems clear to readers who are not trained in statistics or not familiar with the previous research takes some space.
The authors ran an experiment with 1,170 children in six countries (Canada, China, Jordan, Turkey, USA, and South Africa). In the main experiment, the authors gave each child 30 stickers and allowed them to pick the 10 they wanted to keep. Researchers then told the child that they lacked time to run the experiment with other children in their school, but if the child gave up some of their 10 chosen stickers, the researchers would give those stickers to another child. The researchers then counted the number of stickers each student gave back as a measure of how “altruistic” the children were. The researchers then interviewed a parent of each child and asked the parent an open-ended question about the parent’s religion. The researchers decided whether or not they considered the parent religious, and then applied their religious designation to the child. The child was not asked about their own religiousness. The researchers then compared the number of stickers given away by “religious” and “non-religious” children and found that on average “non-religious” children gave away more stickers (or actually 86 percent of a sticker more). Yes, the global media campaign is about a fraction of a sticker. Despite the huge diversity of people in their cross-national sample (e.g., Canada and Jordan) and the many factors that influence the generosity of children in such diverse contexts (e.g., poverty), the researchers assumed that the only difference between the “religious” and “non-religious” children was being religious.
In the second experiment the researchers showed the children a series of scenarios involving one child pushing another and other types of “interpersonal harm”. The religious students judged the behaviors as more “mean” than the non-religious children. Muslim children recommended a harsher punishment for the bad behavior than non-religious children. The punishments Christian children recommended were indistinguishable from those recommended by non-religious children.
However, in the conclusion of the article and in most media reports the various authors claim that “religious” children were “meaner,” “harsher,” or “more vindictive” without qualifying that only Muslim children were (if we assume the problematic sample applies to Muslim children in general). The researchers did not interpret the religious children’s concern for people who were pushed or hurt by another child as a sign of altruism, but as vindictiveness. The authors do not adjust their evaluations of the severity of the punishments to account for the children’s interpretation of the severity of the offences – if you don’t think pushing someone is bad, you obviously won’t want a strong punishment for it. Nor do they interpret the Christian children as merciful for judging pushing or hitting another student to be meaner than non-religious students did, yet calling for punishments as mild as those the non-religious students recommended.
In the third experiment, researchers asked parents how empathetic their child is and how sensitive to injustice. Religious parents rated their children as more empathetic and sensitive to injustice than non-religious parents. The researchers interpret this as parental blindness – assuming the sticker experiment better captures the empathy and sensitivity of the children than either the child’s concern for children who are shoved, or a lifetime of parental experience. Alternatively, the researchers could have asked a teacher or other students about the empathy and sensitivity of the children (as outside corroboration), but they did not.
Both the first and third experiments are robust to controls for age, country, and a rough measure of mother’s education. The effect of religion on sticker giving presumably becomes smaller with these controls, but it is hard to determine how much because the authors switch to standardized coefficients without providing standard deviations. Thus, we do not know what a -.15 standard deviation change in sticker-giving for each standard deviation change in religious identity means in terms of stickers, and we cannot translate the coefficients back into an understandable unit. Previously religious children gave 86 percent of a sticker less; maybe now it is 20 percent of a sticker less, but in either case it is a small amount.
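For readers who want to see why the missing standard deviations matter, here is a minimal Python sketch. It relies on the standard relationship for OLS: the unstandardized coefficient equals the standardized one multiplied by the outcome's standard deviation and divided by the predictor's. The -.15 is the figure quoted above; every standard deviation in the sketch is a hypothetical value chosen for illustration, because the article does not report them.

```python
def unstandardize(beta_std: float, sd_x: float, sd_y: float) -> float:
    """Convert a standardized OLS coefficient into outcome units per unit of x."""
    return beta_std * sd_y / sd_x


beta_std = -0.15  # standardized coefficient quoted in the text

# HYPOTHETICAL standard deviations: the article does not report these.
scenarios = [
    {"sd_x": 1.0, "sd_y": 1.5},
    {"sd_x": 1.0, "sd_y": 2.5},
    {"sd_x": 0.45, "sd_y": 2.5},
]

for s in scenarios:
    b = unstandardize(beta_std, s["sd_x"], s["sd_y"])
    print(f"sd_x={s['sd_x']}, sd_y={s['sd_y']}: {b:.3f} stickers per unit of x")
```

The same standardized coefficient maps to quite different sticker amounts depending on the assumed spreads, which is why the original unit cannot be recovered from the published table alone.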
In both the conclusion and in the popular articles the authors and reporters only refer to differences between religious and non-religious people without any controls, and no one mentioned the small differences we are talking about – less than one sticker. The authors do not state if the second experiment (the one about “vindictiveness”) is robust to these controls – so presumably it is not.
The researchers interpreted these three experiments as indicating that religious people think they are more helpful, but they are actually less helpful and more punitive. In interviews with reporters Decety explains that if people think they are more moral they give themselves permission to be more immoral. Thus, thinking you are moral is detrimental. Decety also claims his research shows that secularization is good. “…secularization of moral discourse does not reduce human kindness. In fact, it does just the opposite.” Both claims are rather broad and not well supported by the data. We do not know if the children think they are more moral, only that their parents think they are more moral. We do not know why the religious children gave away fewer stickers (or even if the association is causal) let alone that they acted less “morally” because they think they are more moral. Nor did the researchers do any investigation about the effect of secularization on kindness.
Popular articles extrapolate even further. An article in The Mirror claims that “Children of atheists are kinder and more tolerant” – although it is unlikely that 28% of the parents in the sample coded as “non-religious” are all atheists. An article in Forbes claims the research demonstrates that religious people are “less moral” and that “History backs-up the scientific evidence that secular people are more moral.” I guess a fraction of a sticker outweighs Hitler and Stalin, but who’s counting?
Most popular-press accounts assumed the association is causal. None of the dozens of popular-press accounts I read (other than the coverage in Science Magazine) mentioned any of the previous research on the topic or interviewed a scholar who had a different point of view. All popular accounts I read (other than two private blog posts) were laudatory, often to the point of breathlessness. Clearly the research said something that a lot of reporters wanted to hear, and spread. And clearly almost none of them were willing to do a simple Google search to see what previous research said, or interview any of the dozens of scholars who specialize in this area.
So, how do we evaluate if this research is worth taking seriously?
1) Does the research adequately deal with and explain previous literature? No.
There are dozens of articles and books about the relationship between religion and altruism. The vast majority of this research shows that religious people are more altruistic than non-religious people. Much of this literature is based on self-report, but some is based on unobtrusive observation. Much of this research also comes from high-quality random samples. However, there is some complexity in the evidence about the relationship between religion and altruism so we need to look at the evidence by type.
First, there is a widespread popular belief that religious people are more helpful. Decety and colleagues dismiss this evidence out of hand. But this implies that most people are stupid. If religious people were in fact significantly less generous than secular people, the popular perception that the reverse is true would be hard to sustain – especially for people who interact with them regularly.
Second, survey research consistently finds that religious people give more time and money to both religious and non-religious causes, both in formal and informal settings. Most of this evidence is based on self-report and Decety and colleagues suggest that the association is caused entirely by social desirability bias – i.e., highly religious people exaggerating how helpful they are more than non-religious people exaggerating how helpful they are. Some social desirability bias is plausible, but neither Decety and colleagues, nor the one article on the topic they cite, give any concrete evidence that the association between religion and self-reported helping behavior is caused by social desirability bias. They assume it is. This is a strong assumption. Some of the survey-based research on altruism even attempts to measure and control for social desirability bias – yet still finds an association between religion and helping behavior. The type of religion people follow and their motivation for being religious also predict helping behavior. Clearly if all these associations are completely caused by social desirability bias, survey research of all kinds is in deep trouble. It is hard to think of an interesting research project in which some of the responses are not more socially desirable to some respondents. Do the authors assume all survey research is pointless, or just the results they don’t like?
Third, laboratory studies typically find either no relationship between religiosity and giving in various games, or a weak positive relationship. Typically these studies are done with college students, often students from psychology or economics classes, and often in Europe. Little is known about whether or not behavior in these experimental games matches people’s altruistic behavior in real life, or if undergraduate psychology majors behave similarly to other people. Game situations may alter behavior – for example, we all know people who love violent video games and happily kill people on screen, but are not unusually violent in real life. Moreover, since these types of games are used so often in psychology classes, it is unclear whether or not students have read about them before and know the purpose of the game while they are playing it. Even if we assume that games played in a laboratory perfectly capture how everyone acts in the real world (which I do not), laboratory-based experiments do not suggest a negative relationship between religion and altruism, just a neutral or weak positive relationship.
Finally, and most convincing to me, unobtrusive observation of real-life behavior suggests a positive relationship between religion and helping behavior both at the societal level and the individual level. This research also suggests that Christians, particularly Protestants, are more likely to be involved in institutional helping behavior. For example, in Japan virtually all the voluntary work with homeless people is done by religious organizations, the vast majority of which are Christian despite Christians being a tiny minority in the country. Similarly in countries like the US, the vast majority of voluntary humanitarian organizations, private schools and so on were set up by religious groups/people. This would be unlikely if religious people were not more generous on average.
We see a similar pattern on an individual level. For example, when academics conduct surveys, they often ask interviewers to evaluate how friendly and cooperative the respondents were. Surveys also want to have good response rates, thus when people refuse to participate in a survey, researchers often have a trained expert re-contact those who refused and try to convince them to change their mind. Presumably people who gave time for the survey on the first time are more generous with their time than those who had to be pressured or convinced to participate. As part of my master’s thesis, I analyzed every survey I could find that collected this type of information. I found that interviewers rated highly religious people as being significantly more helpful and cooperative than non-religious people, and that those who had to be convinced to participate in the survey were significantly less religious than those who agreed to participate from the beginning. This suggests that in ordinary life religious people are more generous with their time than non-religious people.
Thus, most non-laboratory research suggests a strong positive association between religion and helping behavior, and laboratory research suggests a neutral or weakly positive association between religion and helping behavior. No line of research suggests a negative relationship between religion and helping behavior. Thus, the research by Decety and colleagues is clearly an outlier and if reporters had cared to interview anyone who does research in this area, these scholars would have likely told them so.
2) Is the article in an appropriate, peer-reviewed journal, where scholars are likely to have been able to catch the major flaws? No.
The article is published in a biology journal, despite the fact that the article does not focus on anything biological, and none of the authors are biologists. This seems odd. Perhaps publishing the article in a biology journal avoided getting reviewers that know the literature on religion and altruism, who would likely force the authors to do a better job: e.g., measure religion well, add sufficient controls for plausible alternative explanations, and at least deal with the previous literature on the topic. Basing an article primarily on t-tests from a non-random sample may be acceptable in biology, but I haven’t seen a peer-reviewed statistical article like this published in an important social science journal since the advent of personal computers (when scholars did not have to calculate statistics by hand).
Both the academic article, and the popular articles based on it, talk about religious children and non-religious children in general, but the authors did not sample children in a way that allows them to generalize to religious and non-religious children or even religious and non-religious children in the seven cities where they conducted their research. Given the serious problems with the sample, we do not know who the results generalize to.
The authors picked six countries non-randomly (Canada, China, Jordan, Turkey, USA, and South Africa), picked one or two cities from each of these countries non-randomly, and then recruited respondents non-randomly. Nothing about the sample is random, and there are many ways this sampling method is likely to bias results towards religious children appearing less altruistic. For example, if you recruit religious children from a South African slum and non-religious children from the families of University of Toronto professors, you are likely to find some differences between the children that have nothing to do with religion.
Because the sample is not random, all generalizations from their sample and all significance tests using their sample are meaningless. Any first-year statistics textbook will tell you this. Of course if no other data exists, using non-random data is the best we can do, but research based on random samples does exist, and we should ALWAYS privilege results from random and probability samples over non-random convenience samples. And random samples consistently suggest a positive association between religion and altruism.
4) Do the authors do sufficient work to demonstrate that the relationship between religion and giving behavior is causal? No.
Even in good samples, correlation does not prove causation. But with a badly biased sample, even more effort is required to demonstrate that a correlation is plausibly causal. Unfortunately, the authors do not even go to the effort I would require in an undergraduate statistics class.
Of course, demonstrating causality is difficult. The authors cannot randomly assign religious background to children and then see if religion causes differences in altruistic behavior. Thus, social scientists typically try to account for as many alternative explanations as possible, to demonstrate that the association between religion and giving behavior is not caused by something else. Past research on altruism demonstrates that many factors are associated with giving behavior, but the authors control for none of them. If any of these omitted factors is correlated both with religiosity and with the giving behavior of children, or is correlated with which religious and non-religious people are sampled, then the relationship between religion and giving in the authors’ analysis will be biased.
For example, both wealth and trust can influence giving. If children from wealthier backgrounds have more access to stickers than children from poor backgrounds, on average this makes stickers less valuable to wealthy children than poor children. Thus, giving stickers to other children is less costly for wealthy children than poor children. Similarly, in contexts of high-trust, low corruption, and low violence, people generally trust ‘the system’ more. Thus, a child from a high-trust context may trust an unknown researcher to give the sticker-gift to another child more than children from low-trust environments. If the “non-religious” children in the sample disproportionately come from wealthy, privileged families and live in high-trust environments relative to the religious children, this will create a spurious negative association between religion and giving. But religion is not reducing giving; poverty and low trust are.
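The mechanism can be illustrated with a small simulation. This is a sketch under invented assumptions, not a reanalysis of the study's data: the proportions and sticker averages are hypothetical, and religion is given exactly zero effect on giving by construction. Wealth alone drives giving, and wealthy families are simply coded as religious less often.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# HYPOTHETICAL population: half the children come from wealthy families.
wealthy = rng.random(n) < 0.5

# Assumption for illustration: wealthy families are less often coded religious.
religious = rng.random(n) < np.where(wealthy, 0.4, 0.9)

# Giving depends on wealth only; religion has NO effect in this simulation.
stickers = rng.poisson(np.where(wealthy, 4.0, 3.0))

# Naive comparison, like a t-test with no controls.
raw_gap = stickers[religious].mean() - stickers[~religious].mean()

# Compare religious and non-religious children within each wealth group,
# then average the two within-group gaps.
gaps = []
for w in (True, False):
    m = wealthy == w
    gaps.append(stickers[m & religious].mean() - stickers[m & ~religious].mean())
adjusted_gap = float(np.mean(gaps))

print(f"raw gap (religious minus non-religious): {raw_gap:.2f}")
print(f"gap within wealth groups:                {adjusted_gap:.2f}")
```

In this setup the naive comparison shows religious children giving noticeably fewer stickers, while the within-group comparison shows a gap close to zero, even though religion does nothing at all in the simulated world.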
Problematically, it seems likely that the authors coded many more Canadians as “non-religious” than Jordanians and South Africans. But Canadian children are also typically wealthier and trust strangers more than Jordanian and South African children. Similarly, if we think about the university contexts where the samples were taken, it seems likely that the authors sampled wealthier, high status “non-religious people” and poorer, lower status “religious people”. So for example, if we look at the people who live around the University of Chicago, wealthy, high-status, low-religiosity people disproportionately live in Hyde Park (immediately around the university), but they are surrounded by a large predominantly poor, African-American population, who live in government housing, have struggling schools, and are typically much more religious than their Hyde Park neighbors. Non-random samples taken at universities often have this problem – getting children of university employees (who are disproportionately privileged but secular) and those in the surrounding communities (who may be disproportionately less privileged and more religious).
5) Is the statistical analysis rigorous and appropriate? Is it plausible that differences in religious upbringing are the only thing that makes stickers more valuable to some of the children than other children? No.
I cannot remember the last time I saw a published statistical research article in the social sciences based primarily on t-tests (which assume the only relevant difference between the religious and non-religious children in the sample is their religion). If we compare the sticker giving of a poor Christian child from a South African slum with that of a wealthy non-religious child in a Toronto suburb, is it plausible to think the only difference between them is their religion? No. But both the authors and journalists focus on the comparison between the religious and the non-religious without any controls (which assumes the two groups are identical in every other way). Because there are more non-religious people in Canada than South Africa or Jordan, and wealth probably influences how valuable stickers are to children, carefully controlling for country and SES is crucial. The authors do some of this, but in a weak and misleading way.
The authors back up two of the t-tests with OLS regressions that control for age, country, and what they misleadingly label “Socio-Economic Status” (SES). However, the only measure of SES they use is a rough measure of mother’s education (simplified to six categories), but they never mention this in the text. I had to search their supplemental material to find their measure of “SES.” But is controlling for mother’s education (in six categories) sufficient to equalize the socio-economic status of all children? I don’t think so. That implies, for example, that every child whose mother has a high school degree has the same access to resources as every other child whose mother has a high school degree, regardless of income, wealth, father’s education, race, parental marital status, etc. I doubt the authors think mother’s education fully accounts for SES either, or they would not have hidden the measure in an online appendix. It takes six words to say “We measured SES using mother’s education.” Not a lot. But that would have raised red flags.
Think of it this way. If you are wealthy and go to a well-financed school, you may have hundreds of stickers at home and get more regularly. Thus, stickers are not particularly valuable. It is easier to give stickers away because you can easily get more. Alternatively, if you come from a poor single parent family and attend a poorly-financed school, you may rarely get stickers. This makes stickers much more valuable to you and makes giving them away harder. If two children have an equal amount of altruism, on average the child who has easy access to stickers is likely to give away more stickers than the identical child who has little access to stickers.
Now think about how this might work in the sample we are discussing. Presumably the University of Chicago team recruited people close to the university. Imagine they recruited two 8-year-olds, one named Gwyneth and the other Kanisha. Gwyneth is European-American and attends the Laboratory School, an elite private school with lots of resources. Her father is a physics professor at the University of Chicago and makes a large salary. Her mom earned a BA from Harvard, and works at the Chicago Art Museum. Both parents came from wealthy, well-educated families, and are not religious.
Kanisha lives 10 blocks away from Gwyneth, but in a government housing project in a South Side slum. Kanisha is African-American and attends a struggling public school with few resources. Her mom is a single parent, who attended a local community college in the evenings and recently graduated with a degree in social work, but still works as a waitress at Denny’s and is struggling financially. Kanisha and her mom attend a local AME church every week.
In the regression the authors published, they assume Gwyneth and Kanisha are identical – i.e., the only relevant difference between them is their religious identity. Both children are eight years old, both live in the U.S., and both have a mother with a BA – thus the authors assume both have identical socio-economic statuses, that stickers are equally valuable to both of them, and that the only cause of differences in how many stickers they give away is their religion. But it is hard to believe that stickers are equally plentiful in both their homes and both their schools. It is also unlikely both are equally trusting that “the system” will work fairly for them, or that an unknown stranger will actually give the stickers to another child. Even if Kanisha’s religiosity increases her generosity relative to other similar children, this increase may be insufficient to overcome the differences in wealth and generalized trust between her and Gwyneth.
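The Gwyneth and Kanisha problem is what statisticians call residual confounding: controlling for a rough proxy removes only part of the bias. The sketch below simulates it with invented numbers. A latent "family resources" score drives both sticker giving and the chance of being coded religious, a noisy six-category proxy stands in for mother's education, and religion again has zero true effect. The variable names, effect sizes, and cut points are all hypothetical and are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# HYPOTHETICAL latent family resources (income, wealth, school quality, ...).
resources = rng.normal(size=n)

# Six-category stand-in for mother's education: a noisy, binned proxy.
proxy = resources + rng.normal(size=n)
edu = np.digitize(proxy, [-1.5, -0.5, 0.0, 0.5, 1.5])  # categories 0..5

# Assumption for illustration: fewer resources -> more likely coded religious.
religious = (rng.random(n) < 1 / (1 + np.exp(1.5 * resources))).astype(float)

# Giving depends on resources only; religion has NO effect here.
stickers = 3.0 + 0.5 * resources + rng.normal(size=n)


def religion_coef(*controls):
    """OLS coefficient on `religious`, with an intercept and optional controls."""
    X = np.column_stack([np.ones(n), religious, *controls])
    beta, *_ = np.linalg.lstsq(X, stickers, rcond=None)
    return beta[1]


# Dummy variables for education categories 1..5 (category 0 is the baseline).
edu_dummies = (edu[:, None] == np.arange(1, 6)).astype(float)

coef_no_controls = religion_coef()
coef_edu_only = religion_coef(edu_dummies)
coef_full = religion_coef(resources)

print(f"no controls:            {coef_no_controls:.2f}")
print(f"education proxy only:   {coef_edu_only:.2f}")
print(f"true resources control: {coef_full:.2f}")
```

In this toy world the education proxy shrinks the spurious "religion effect" but leaves a clearly negative coefficient. Only controlling for the underlying resources brings it close to zero, and that is exactly the variable a survey with one rough SES measure cannot supply.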
Other measures of SES are common on surveys and would make the results more plausible if the authors controlled for them: e.g., family income, family wealth (or at least home ownership), father’s education, race, marital status of parents, and average SES in the child’s school district. It seems odd that these researchers would not collect any of this information on their survey, and odder still that they would not control for these measures if they are on the survey. Perhaps the results go away if they do. Remember we are only talking about a fraction of a sticker. I have asked the authors for their questionnaire and for replication data so that I can check this, but so far they have sent me nothing.
6) Is religion carefully measured and are religious groups carefully distinguished? No.
Over 60% of the religious people in the sample are Muslims. This means that in most of the t-tests and all of the regressions, Muslims disproportionately drive the results. If Muslims are different from other religious groups, or if the Muslims in the sample are disproportionately from poor or distrustful communities, this would bias the results the authors attributed to all religious children. But are Muslims identical to all other religious groups? In the t-tests the authors show us, the difference between Muslims and Christians is statistically significant 50% of the time. So why do they lump them as one group in all the regressions and all their conclusions?
The authors use two measures of religiosity: “How often do you attend religious services?” and “How often do you experience the ‘divine’ in your everyday life?” They then merge these into a single variable. The measure of frequency of attendance is biased towards Muslims. Because Muslims are expected to pray five times a day, every day, attending religious services more than once a week is more common among Muslims than Christians. The frequency of divine experience is likely biased towards Pentecostals. Thus, Muslims and Pentecostals will tend to cluster at the high end of the religiosity variable.
We never learn if religiosity predicts lower generosity in all six countries in their sample and for both Muslims and Christians. Religion and religiosity are, for the most part, assumed to be one thing and assumed to work the same everywhere – as if the type of religion and the context of religion do not matter. Given the major sample problems, it would increase the plausibility of their results if the religious/low-giving association were consistent regardless of context and regardless of religious tradition.
7) Do popular versions of the article report what is in the academic article accurately and ask other scholars familiar with the research topic to evaluate it? No.
One striking feature of the many newspaper and magazine articles that reported this story is that they consistently take the research by Jean Decety and his colleagues as objectively true and unproblematic. They do not interview any other scholar who has researched this topic, nor cite any of the many peer-reviewed journal articles and university press books that find a different result. Given the dozens of scholars who have researched religion and altruism, it would not have been hard to find another scholar who could have offered perspective. I thought it was standard journalistic procedure to get more than one point of view for a story. Maybe not if the story says something you desperately want to be true.
 In some analyses the researchers also statistically controlled for age, country, and a rough measure of the education of the child’s mother. I will discuss the adequacy of their controls later.
 However, the research does not show that the children who gave fewer stickers viewed themselves as more generous or that sticker-giving in a controlled laboratory setting more accurately reflects the behavior of children than their parents’ general observations.
 Decety and colleagues cite only one article from the literature on religion and altruism (a polemical and one-sided review in a mediocre journal: the Social Science Journal).
 In a random sample everyone in the population has an equal probability of being selected to participate in the research project. In a probability sample everyone in the population has a known (although not necessarily equal) probability of being selected to participate.
 Alternatively, psychologists tend to sample students in their psychology classes, which creates other problems.
 For readers who are statistically trained, the authors’ models even violate the assumptions of OLS regression. The number of stickers children give away is a count variable, thus Poisson or negative binomial regression is appropriate, not OLS regression.
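 As a rough illustration of the count-model alternative, the sketch below fits a log-link Poisson regression by Newton-Raphson using only NumPy. The data are simulated: the group variable, the baseline of four stickers, and the rate ratio of 0.8 are hypothetical values, and the fitting function is a hand-rolled sketch, not the routine of any particular statistics package.

```python
import numpy as np


def fit_poisson(X, y, n_iter=25):
    """Fit a log-link Poisson regression by Newton-Raphson.

    Assumes the first column of X is an intercept column of ones.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)            # fitted mean counts
        grad = X.T @ (y - mu)            # score vector
        hess = X.T @ (X * mu[:, None])   # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


rng = np.random.default_rng(2)
n = 10_000

# HYPOTHETICAL data: a binary group indicator and simulated sticker counts.
group = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([np.ones(n), group])
true_beta = np.array([np.log(4.0), np.log(0.8)])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson(X, y)
rate_ratio = np.exp(beta_hat[1])
print(f"estimated rate ratio (group 1 vs group 0): {rate_ratio:.3f}")
```

 The exponentiated coefficient reads as a ratio of expected counts between the two groups, which is a natural scale for an outcome like the number of stickers given away.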
Robert D. Woodberry is director of the Project on Religion and Economic Change and associate professor of political science at the National University of Singapore where he teaches statistics and methods. His research appears in the American Sociological Review, Annual Review of Sociology, and American Political Science Review and has won fifteen outstanding research awards from academic associations – including the Luebbert Award for Best Article in Comparative Politics (2013) from the American Political Science Association, and the Best Article in Sociology of Religion Award (2014 & 2001) from the American Sociological Association.