On the Methodology of Surveys

June 24, 2019

A bit of a different one today. I was having an argument with Guy Walker on Facebook the other day concerning something I had previously posted here:

His response was:

Based on a sample of 892 out of 160,000 (0.006%). The other 159,108 would have told them to sod off and stop asking such crass binary questions (ie why can’t you have Brexit _and_ maintain the union). I’m a member and I certainly would have done. The whole thing is posited on the idea that the options presented are the only possible outcomes which might suggest the whole thing was motivated by a desire to present members as kamikaze idiots. I can’t imagine what would motivate anyone to do that.


Secondly, the extrapolation from a tiny sample to conclude that the whole would be homogenised in exactly the same way makes the usual spectacular error of omitting consideration of what you are dealing with.

What I am going to talk about today is this idea that 892 people is a “tiny sample size”. I have had this argument a good number of times before, so it is time I laid it out here. You can perhaps forgive someone for thinking that when a sample size is very small as a fraction of the target population, the conclusions of the survey will be inaccurate: the sample simply looks tiny compared to the whole.

Let me first introduce you to some basic terms:

The confidence interval (also called margin of error) is the plus-or-minus figure usually reported in newspaper or television opinion poll results. For example, if you use a confidence interval of 4 and 47% of your sample picks an answer, you can be “sure” that if you had asked the question of the entire relevant population, between 43% (47-4) and 51% (47+4) would have picked that answer.

The confidence level tells you how sure you can be. It is expressed as a percentage and represents how often the true percentage of the population who would pick an answer lies within the confidence interval. The 95% confidence level means you can be 95% certain; the 99% confidence level means you can be 99% certain. Most researchers use the 95% confidence level.

When you put the confidence level and the confidence interval together, you can say that you are 95% sure that the true percentage of the population is between 43% and 51%. The wider the confidence interval you are willing to accept, the more certain you can be that the whole population answers would be within that range.

For example, if you asked a sample of 1000 people in a city which brand of cola they preferred, and 60% said Brand A, you can be very certain that between 40 and 80% of all the people in the city actually do prefer that brand, but you cannot be so sure that between 59 and 61% of the people in the city prefer the brand.
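The cola example can be made concrete with the standard margin-of-error formula for a sample proportion (a sketch, not the exact method any particular pollster uses; the function name is mine):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Margin of error for a sample proportion p with sample size n.
    z = 1.96 is the critical value for a 95% confidence level."""
    return z * math.sqrt(p * (1 - p) / n)

# Cola example: 60% of a 1000-person sample prefer Brand A.
moe = margin_of_error(0.60, 1000)  # about 0.030, i.e. roughly +/-3 points
print(f"95% CI: {0.60 - moe:.3f} to {0.60 + moe:.3f}")
```

So with 1000 respondents you are already within about three points at 95% confidence: comfortably inside the loose 40–80% band, though not as tight as 59–61%.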

This is extremely important to understand. 892 is a very good sample size that will give you very good accuracy on a population of 160,000. That is simply a fact. Walker is giving a very common but extremely naïve argument because he perceives 892 as tiny against the population. Intuition, though, is not always accurate. From a sample size calculator, a sample size of 895 on 160,000 gives you a confidence level of 99% and a confidence interval of 4.3. That is exceptionally good.

What this means is that we can be 99% sure that the survey is correct within a margin of error of about 4%. Thus we can be pretty much certain that the above survey holds for the population of Conservative Party members.
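Online sample size calculators typically use the standard formula for estimating a proportion, with a finite population correction. A sketch of that calculation (assuming, conservatively, p = 0.5, which maximises the required sample) reproduces the figures above:

```python
import math

def required_sample_size(N, e, z, p=0.5):
    """Sample size needed to estimate a proportion with margin of error e
    at confidence level z, applying the finite population correction for
    a population of size N. p = 0.5 is the worst-case assumption."""
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)       # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / N))    # finite population correction

# 99% confidence (z ~= 2.576), 4.3-point margin, population of 160,000:
print(required_sample_size(160_000, 0.043, 2.576))  # -> 893
```

That lands within a single respondent of the survey's 892, which is exactly the point: for a population of 160,000, roughly 900 randomly sampled members is enough for 99% confidence.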

But it gets better than that:

A good maximum sample size is usually around 10% of the population, as long as this does not exceed 1000. For example, in a population of 5000, 10% would be 500. In a population of 200,000, 10% would be 20,000. This exceeds 1000, so in this case the maximum would be 1000.

Even in a population of 200,000, sampling 1000 people will normally give a fairly accurate result. Sampling more than 1000 people won’t add much to the accuracy given the extra time and money it would cost.

For the case in hand, then, the population size is 160,000, so one might think that taking a sample of 10% of that population would be good. In other words, we would need to ask 16,000 people what they thought. However, this exceeds the 1000 maximum advised above, because the money and effort it would take to gather responses beyond 1000 randomised people in the target population would not justify the tiny improvement in accuracy. It turns out that, in our case, 892 people is easily sufficient as a sample size to give us a really accurate picture.

To add to this a little bit:
  1. Size of the population
    Here you have to enter the size of the group that has to be represented by the sample. If you conduct an employee survey for instance, your population would be the total staff.  Once the population exceeds 20,000, your sample size will not change very much anymore.

Once you hit that benchmark value of 20,000, further growth in the population has little to no effect on the required sample size.
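You can see this plateau directly by running the standard sample-size formula (with the finite population correction, worst-case p = 0.5) across increasing population sizes. This is a sketch; the function name is mine:

```python
import math

def required_sample_size(N, e=0.05, z=1.96, p=0.5):
    """Sample size for a proportion (margin e, confidence z),
    with the finite population correction for population N."""
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)
    return math.ceil(n0 / (1 + (n0 - 1) / N))

# At 95% confidence and a 5-point margin, the required sample
# flattens out quickly as the population grows:
for N in (1_000, 5_000, 20_000, 160_000, 1_000_000):
    print(N, required_sample_size(N))
```

The required sample climbs from roughly 280 for a population of 1,000 to under 400 for a population of a million: past about 20,000 people, the population size barely matters.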

Of course, the interlocutor with whom I was debating didn’t take kindly to being shown he was wrong. In fact, the whole argument became derailed: instead of addressing my points in any way at all, he retreated to his favourite topic, conflating such scientistic social science with an attack on humanity and free will. One cannot lose this cherished notion of free will, so everything else must be wrong instead.

I’m not wrong precisely because with human beings you can’t extrapolate in this way because of the element of freedom that can’t be accounted for in an equation. That element would be lacking in any other animal…. The argument is really about the nature of the subject being studied.
Oh. Okay. All surveys are irrevocably wrong all of the time and you can only infer accurate conclusions of what a population think by asking each and every single member. Right. Gotcha.

There really is no argument. If you want to know what people think, but cannot pragmatically or financially ask every single one, then you design a survey that gives you accurate results. But he wants an argument because he doesn’t like the result. Rather than deal with the result (which, incidentally, will tell you at the very least that of those 892 people asked, those percentages pertain), he would rather, out of hand and without doing any research on the matter, simply decry the validity of the survey. This is wonderful cognitive dissonance.

When facts contravene his cherished notion of free will, then facts will always go. They always have for him. It is his dogma.
