Ben Goldacre’s Bad Education? – Guest Post By Jonny Scaramanga

What follows is a guest post by Jonny Scaramanga.

Ben Goldacre is a bit of a hero to me. Like a lot of people, I discovered Bad Science and skepticism at the same time and found something I wanted to be part of. But now Dr. Goldacre has stepped into my field – education – and, frankly, he’s made a total balls-up of it.

Now, there are lots of my peers in the social sciences who have flakey ideas about science. They’re full of post-modernist relativism, all “different ways of knowing” and “challenging positivist presumptions”. I am not among the science doubters. I like my evidence robust and my social science scientific. When Ben starts calling for randomised controlled trials in schools, I’m on his side, in principle. But on this subject, he has no idea what he’s talking about.

The idea of RCTs in education is an attractive one, but we hit our first problem immediately: how do we agree on what ‘good education’ looks like? Trials in medicine are less problematic, because it is generally uncontroversial what constitutes a healthy body. It is much more controversial what constitutes a well-educated child. I could not disagree more profoundly with the current education secretary’s view of education, so I am almost bound to reject the conclusions of any trials he commissions.

Perhaps we will be looking to see which interventions result in better exam grades. This, though, would be a spectacular exercise in begging the question, because it would assume that the tests are valid and measure the right things.

We can get around this by only measuring things which are uncontroversial – attendance, perhaps. But then we will have a tendency to pick our research topics based on what’s easy to measure, not what’s necessarily most meaningful.

Bad Science explains superbly why trials must be double-blind to be meaningful. This can’t be done with education. The teacher has to know about the trial, and the children probably do too. If we’re serious about research ethics and informed consent, they definitely do. This is a problem because of the Hawthorne effect: people behave differently when they know they are part of a study.

All the problems boil down to dealing with minds, rather than bodies. Minds are complicated things, and they enjoy a great deal more variation than pancreases. As Rebecca Allen argues over at the IOE blog, social science research is far more context dependent than medicine. What works for my kid might not work for yours.

But in the hands of a force-10 moron like the current Education Secretary, RCTs could become weapons of mass destruction as interventions shown to have some usefulness in Surrey are rolled out nationwide without regard for social context and individual differences.

Individual differences matter a lot in education. If we find a teaching method that works for 70% of students, that would be a titanic success. Unless you’re a parent of one of the 30%. Good teachers know this; they employ a variety of strategies to help their pupils succeed.
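To make the 70%/30% point concrete, here is a toy simulation (every number in it is invented for illustration) showing how a trial can report a healthy average treatment effect even when nothing at all changed for a sizeable minority of pupils:

```python
import random

random.seed(42)

# Toy simulation (all numbers invented): an intervention that raises test
# scores by 5 points for the 70% of pupils who respond to it, and by
# nothing for the other 30%.
def simulate_pupil(treated):
    responder = random.random() < 0.7           # 70% of pupils respond
    score = random.gauss(50, 10)                # baseline test score
    if treated and responder:
        score += 5                              # responders gain 5 points
    return responder, score

treated = [simulate_pupil(True) for _ in range(2000)]
control = [simulate_pupil(False) for _ in range(2000)]

def mean(xs):
    return sum(xs) / len(xs)

# The trial's headline number looks like a clear success...
overall_effect = mean([s for _, s in treated]) - mean([s for _, s in control])
# ...but the effect among non-responders is close to zero.
nonresponder_effect = (
    mean([s for r, s in treated if not r])
    - mean([s for r, s in control if not r])
)
print(f"average treatment effect: {overall_effect:+.2f}")
print(f"effect for the other 30%: {nonresponder_effect:+.2f}")
```

The headline average comes out near +3.5 points, which is exactly why subgroup analysis, not just the overall effect, matters in reading any education trial.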

Differentiation is hard, but essential. If Michael Gove thinks evidence can help us arrive at the Right Way to Teach, he is even more delusional than my colleagues already believe him to be.

About Kylie Sturgess

Kylie Sturgess is a Philosophy teacher, media and psychology student, blogger at Patheos and podcaster at Token Skeptic. She has conducted over a hundred interviews including artists, scientists, politicians and activists, worldwide.
She’s the author of the ‘Curiouser and Curiouser‘ column at the Committee for Skeptical Inquiry website and travels internationally lecturing on feminism, skepticism, and science.

  • Donalbain

    A well designed RCT would answer most of your objections. You say that it depends on what you want to measure; then measure lots of things. If someone thinks that methodology A is better than methodology B, then run an RCT and measure everything you can think of. Then there would be actual evidence for the discussion of which methodology is better. We might end up with one side saying A is better because of better test results, while another side says B is better because of greater enthusiasm from children, but at least there would be evidence and we could decide as a society what we valued more.
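Donalbain’s “measure everything” suggestion carries a statistical cost worth noting: the more outcomes a trial measures, the more will differ between arms by chance alone. A minimal sketch, using invented p-values and the blunt Bonferroni correction (divide the significance level by the number of tests), of how a multi-outcome analysis might guard against this:

```python
# Toy illustration with invented p-values: a trial that measures many
# outcomes will find some arm-to-arm differences by chance alone, so the
# analysis should correct for multiple comparisons. Bonferroni is the
# simplest such correction.
outcomes = {
    "test scores": 0.004,
    "attendance": 0.03,
    "enthusiasm": 0.20,
    "homework completion": 0.04,
}

alpha = 0.05
threshold = alpha / len(outcomes)   # 0.05 / 4 = 0.0125

significant = [name for name, p in outcomes.items() if p < threshold]
for name, p in sorted(outcomes.items(), key=lambda kv: kv[1]):
    verdict = "significant" if p < threshold else "not significant after correction"
    print(f"{name}: p={p} -> {verdict}")
```

With these made-up numbers only the test-score result survives correction: attendance and homework would have “passed” at the naive 0.05 level but not after accounting for the four simultaneous tests.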

    • Jonny Scaramanga

      But the RCT can’t answer those philosophical questions. Is learning more about critical thinking or about gaining knowledge?

      • Donalbain

        Maybe the RCT can’t answer the question you specifically ask, but it can provide data about whether some strategy or another improves critical thinking, or gaining knowledge. And THEN we can have the philosophical discussion about what balance we want to aim for. And if we, as a society, decide that we want to aim for critical thinking, we would know what methodologies work best in getting us towards that aim.

        • Jonny Scaramanga

          I’d like you to be right, of course. I’m not trying to be obstructive. But I don’t see it. How do we measure critical thinking skills? There’s a great deal of support for the idea that learning is context-dependent. We can’t measure critical thinking directly; we can ask students to do something which we assume requires critical thinking. But how well supported is the idea that this will transfer to different kinds of critical thinking in different situations? Psychological evidence suggests not very.

          Let’s assume we can solve all of that.

          You assert that a well-designed RCT can answer my objections. I’m curious how. We’re looking at a trial which isn’t double-blind and in which the Hawthorne effect is at play to an unknown extent.

          • Harriet R

            In experimental economics we don’t talk about double-blind because it’s just not feasible. There are “field experiments”, where those in the trial are not aware it’s a trial, to avoid experimentation effects on both control and treatment group, and “framed field experiments” where they are aware, with the attendant behavioural effects. The relative ethics of these two approaches is debatable and I’ll return to this in a minute.

            The core concerns of evaluation of social and (micro)economic policy are “internal” and “external” validity. Internal – does the measure we calculate actually accurately reflect the effect of the intervention and nothing else? External – is the effect we’ve measured applicable to other groups if we extended the intervention to all of the relevant population? In education these are both an issue – can we actually measure the objectives we care about, and might there be confounding factors? And what happens with different groups (as you say, a trial in Surrey won’t necessarily reflect kids at school in other parts of the UK)?

            So for a good RCT you need to deal with all these concerns – what are the outcomes of interest and can we actually measure these in a meaningful way? This is no small issue with education in particular, but this doesn’t just affect RCTs but all kinds of evaluation of education policies. Have we chosen our sample to represent the wider population to which we want to apply our policy? Ideally the sample should be large enough (and possibly stratified) so that you can see whether the policy has different effects on different groups.

            The experimentation effects associated with awareness of a trial undo internal validity, hence even if ethically it might seem unfortunate, there’s not much alternative. Also the ethics may seem less dubious in the case of govt policy where the choice is between rolling out the policy without evaluation or having an initial evaluated trial.
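Harriet R’s point about large, stratified samples can be sketched concretely. The following toy example (the pupils and region names are placeholders) randomises within strata so that every group is represented equally in both arms, which is what later makes subgroup effects estimable:

```python
import random
from collections import defaultdict

random.seed(7)

# Sketch of stratified random assignment (pupils and regions invented):
# group pupils by a context variable -- region here -- and randomise
# within each stratum, so both arms contain every group and subgroup
# effects can be estimated afterwards.
pupils = [("Surrey", i) for i in range(40)] + [("Tyneside", i) for i in range(40)]

strata = defaultdict(list)
for region, pupil_id in pupils:
    strata[region].append(pupil_id)

assignment = {}
for region, ids in strata.items():
    random.shuffle(ids)                 # randomise within the stratum
    half = len(ids) // 2
    for pid in ids[:half]:
        assignment[(region, pid)] = "treatment"
    for pid in ids[half:]:
        assignment[(region, pid)] = "control"

# Every region now contributes equally to both arms.
for region in sorted(strata):
    n_treat = sum(
        1 for (r, _), arm in assignment.items()
        if r == region and arm == "treatment"
    )
    print(f"{region}: {n_treat} in treatment, "
          f"{len(strata[region]) - n_treat} in control")
```

Simple randomisation over the whole pool could, by bad luck, leave one region over-represented in the treatment arm; stratifying first rules that out by construction.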

          • Campbell

            I absolutely agree with Harriet above. Even in medicine a huge consideration is that a treatment for one group (whether that be socio-economic, demographic or based on other medical issues) may not work as well, or be as desirable, for another. That is why doctor–patient consultation still exists and medicine is not distributed on a direct one-symptom/one-medicine basis.

            RCTs, however, enable us to discover what works for particular groups and not others, rather than generalising and guesswork based on anecdote and hunches, which is the only alternative. Education is luckily unique in that the vast sample sizes available to us enable a comprehensive analysis of all outcomes for all groups, if a central and well managed systematic approach is used nationally or even internationally. The fact that “Differentiation is hard, but essential” is of course absolutely true but, without evidence, those differentiated approaches have no defensible worth!

            The inability to run true double-blind trials in certain situations is again something not unique to education but exists in a variety of cases, e.g. the effect of quality of care on treatment outcomes in medicine. So long as these effects are reduced as far as possible by careful design, and any shortcomings are detailed in the methodology and analysis, this should not pose an insurmountable problem.

            Finally, and most importantly, Jonny’s point that RCTs cannot be applied without knowing what constitutes good education is reasonable to a point but relies on two false assumptions:

            1) That it matters.

            True, ideology has a great weight within educational approaches, but this is due to, not in spite of, the lack of a systematic evidence-driven approach. If you say skill development is more important than knowledge acquisition that’s fine, but here is your chance to prove it! Devise RCTs which allow direct comparison of teaching approaches and select some real-world outcome measures to see what works.

            2) That RCT design must be based on an overall academic measure (e.g. exam results).

            I would be the first to say that standardised exams are rarely the best method of measuring the complex mix of knowledge, critical application, skills and experience that we like to think of as ‘learning’. Fortunately we don’t have to use them. Education RCT outcome measures can be things as readily measurable as improved attendance, or qualitative information such as pupil happiness. If you do want to look at learning specifically, again there are approaches that will work, whether small-scale long- or short-term recollection of key facts to test knowledge acquisition specifically, or observation records of process to measure skill building.

            I am very excited by the prospect of involving myself in my institution’s future research projects, partially because I think it will be a challenge to design truly informative trials, but I truly believe that RCTs are as applicable here as they are in medicine, social policy or whichever field you care to name.

          • Donalbain

            It strikes me that you are setting yourself up as a mystic. “You cannot ever measure whether what I do works or not, so you just have to trust us”.

  • Andy Lewis

    Unfortunately Jonny, your argument about “how do we agree on what ‘good education’ looks like?” looks no different from the special pleading of quacks.

    Their argument usually follows the line of “RCTs cannot work on my medicine because I have different views on what is valuable. I am holistic and cannot reduce my therapy to a simple number. My therapy does not follow a conventional paradigm”.

    But RCTs are completely neutral to mechanisms, values or even the measured outcomes.

    In education, it appears to be pretty simple: you define what your values are, how you gauge success and what methods you propose to get there. An RCT then is just a tool to provide good evidence that your proposed method actually achieves your chosen success measures better than some alternative.

    The RCT will not tell you that your values are good values or that your success measures are meaningful, but if you can convince someone that you are right there, you now have evidence from an RCT to show you how to achieve your goals.

    • Jonny Scaramanga

      I would have thought, Andy, that you have seen enough of my writing to credit me with knowing the difference between a genuine definitional problem and the doublespeak of homeopaths.

      Perhaps I am overstating the importance of the lack of blinding and the Hawthorne effect, but I think they are genuine confounders.

      Unfortunately, how we gauge success is also problematic; the methods which are the most reliable (i.e. repeatable) are often the least valid (i.e. meaningful). Traditional assumptions about testing – e.g. that learning can be decontextualised, and that achievements are indicative of internal traits – are cast into doubt by modern psychology.

      You might think I’m being obstructive. I’m not. I think these problems can be solved, and should be. I believe a rigorous social science (with the latter word being operative) is achievable. But Goldacre’s paper glosses over these challenges, when in fact they are fundamental.

      • Andy Lewis


        I do think your first point is making essentially the same error as we see in the complaints from quacks that RCTs do not work for them.

        To try to be more succinct: there is a difference between deciding what we value and what is important in education and measuring how methods can achieve our goals.

        I think this is the same issue as when quacks complain that RCTs cannot tell us how their medicine works. Of course not. But RCTs can tell you whether it works.

        RCTs can never tell us what to value in education. They can never tell us what good outcomes should be. But they can tell us how to get there once we have stated where we want to go.

        • Robert Bauer

          Ohhhkay, there’s a fundamental difficulty here that I don’t think you quite grok. The actual outcome, the true purpose of education, is in what the students do after they finish school. The point of teaching people things is so they can use them, right? Since education is a ten to twenty-five year process that involves thousands or millions of educator decisions, there’s no way to directly test the influence on real outcomes (that is, what happens after the student leaves school) of any particular education technique. Even if you could put together a forty-year longitudinal study, any remotely realistic strength of effect would be lost in the noise.

          To really make education studies rigorous, you would first need to pick out some near-term assessment instruments – critical thinking exercises, questionnaires on student enthusiasm and contentment, standardized tests, whatever you thought would work – and run the forty-year longitudinal study *on your testing instruments*, to see what near-term-measurable results actually correlated with good outcomes post-education. Better have a great big honking sample size, too – lots of students, measured lots of times with each instrument you want to know the effectiveness of. While you’re waiting for your data, you can have the forty-year argument about what actually counts as a good outcome. (Does making money count as success? History teachers might have a problem with that – few people need history in their jobs but everyone needs history when they vote. Similar issues arise for other measurements we might propose. So we need a spread of different criteria… weighted how? And so on.)

          And then, after all that is done and analyzed and independently repeated (hopefully the confirming trials are done in parallel rather than in sequence) you can START using the instruments that worked well to measure the effect of the actual education.

          That is, if any of the instruments did work well, which isn’t guaranteed.

          Gee, I hope nothing happened over those forty years to change the effectiveness of those testing instruments you were checking out.

          In other words, it’s technically possible to use rigorous scientific methods in education, it’s just an experimental design problem – but it is one HELL of an experimental design problem. In practice, attempts at rigorous education studies tend to measure what can be measured instead of what should be measured, change the behavior of the students and educators in the course of the experiment, and still fail at key points of experimental design like blinds, control groups, and controlling systematic error.

          • Andy Lewis

            Robert, I would suggest that all you have done here is applied your values to decide that the only important outcomes are manifest in some career or other adult activity.

            People may disagree with your value choices here and see that intermediate outcomes might be valuable in their own right. In order to push RCTs out of education all you have done is set nebulous long term measures that of course make designing and executing a trial hard. You say the only thing that matters is boiling the ocean. But we might be OK just seeing how best to heat a cup of water first.

            Education is a layer cake with knowledge, capability and understanding growing though a series of steps. I see no reason why an RCT cannot be designed to see how incremental levels of improvement might best be achieved.

          • Robert Bauer

            Wouldn’t that constitute RCTs trying to tell me what to value in education? I’m a professional educator and I do put long-term outcomes first. I get that some people, including, no doubt, other educators, would disagree with me. But certainly you wouldn’t suggest that I should value different things *because* the things I value are hard to measure? Surely I should only change my values in response to ethical arguments, not practical considerations.

            If you do an RCT on how to, say, improve standardized test scores, it could be the best study in the world and show a real big effect size on a large sample that reflects my student demographics. But I don’t care, because I’m pretty sure standardized tests don’t much correlate with the benefits I actually want my students to have. I’m going to ignore your study because it tells me, with great precision and certainty, how to get something I don’t want.

            If you let me (or someone like me) pick the success criteria – because of course I have short-term-measurable criteria that I think do correlate with outcomes I value, I need them to do my job – then okay, I’m listening. But now you have to study things that are harder to measure (student confidence), ethically fraught in the context of double-blind trials (dropout rate), or both. In practical terms, that increased difficulty means there aren’t as many studies like this and they’re not done as well. And you’re basing your operational definitions on the unscientific educated guesses of the people whose behavior you’re trying to inform/alter with this study in the first place. Isn’t that a great big bias?

            It is not my object to push RCTs out of education. Good education science sounds like a great idea, if you can pull it off, and my everyday teaching is already informed by controlled studies on things like stereotype threat. But learning is hard to measure. I haven’t seen a good argument, let alone evidence, that RCTs are going to be able to study all, or even many, of the things I need to do my job any time soon.


  • John Brown

    Arguments against RCTs here seem to be saying that if they cannot cure cancer overnight or tell us the meaning of life then we should not even bother trying with these fiddly, tedious, narrow-minded methods.

    I can’t think of a polite way of saying that this seems to be a misreading of the progress of science, knowledge and insight into the world’s wonders, which has infuriatingly proven itself impermeable to the big questions but has succumbed ever so gradually to these fiddly, tedious, narrow-minded methods. Jonny and Robert seem to harbour hopes and expectations of education research not fulfilled in any other field of human enquiry.

    I agree with Jonny that it’s a forlorn hope education research can foreseeably answer many of the big interesting questions, but in my studies of psychology this is precisely the maturational process psychology had to go through. It spent the middle half of the last century trying to answer the big questions and came up with nada. A complete revolution replaced the psychobabble with RCTs, and it now makes real progress on much less ambitious questions like eye movements.

    They say they cannot imagine how it could help us make progress towards answering these ultimate questions. Again this reflects a lack of awareness about how progress has been made towards big theories, because progress has clearly been made in all fields through hints and clues derived from fiddly, tedious, narrow-minded methods, leading to big breakthroughs – see Ernst Mach, Karl Popper, paradigm shifts, etc.

    As Andy Lewis and Harriet R say, the RCT is waiting to help you falsify your theories when you develop the imagination to think of a way of testing them.