# Statistics and the Synoptic Problem

Statistics tell us… well, even though they are numbers, statistics do not automatically provide answers to questions.

And 73.2% of all statistics are made up on the spot – as that one was.

But sometimes statistics can tell us something important. They can quantify what otherwise may seem vague and merely impressionistic.

I regularly cite statistics in talking about the differences between the Gospel of John and the Synoptics. Comparing how often “kingdom,” “father,” and “I” occur (as James D. G. Dunn does in The Evidence for Jesus) shows that it is not just a subjective impression that the style and content of John is significantly different from the other Gospels.

And statistics are striking when it comes to comparing the Pastoral Epistles with other letters attributed to Paul (see the charts in P. N. Harrison's book The Problem of the Pastoral Epistles).

So what about the Synoptic problem? Can statistics help us there?

On Ian's blog Irreducible Complexity, he had a post recently that addressed this very subject. Some work done by Dave Gentile (someone with an interest in statistics and the Synoptic problem as a hobby) inspired Ian to do some work of his own on the topic. Both analyses provide moderate statistical support for Markan priority.

And what about Q? That's what you really want to know, isn't it?

Both Gentile and Ian looked into that too, and the results are…

inconclusive.

Ian wrote in response to this state of affairs:

So one of those (very common) statistical experiments where the results tell you nothing of interest. Which is a shame.

I came to the conclusion that the decisive arguments were likely to arise out of close analysis of textual patterns…

And he goes on to mention Mark Goodacre's work as one example of that. I would love to see further statistical work done on this by academics. Has anyone tried to determine whether Luke's or Matthew's Q material looks more like the other author's material than like that author's own style and vocabulary? That would provide significant support for one of the alternatives to the Q hypothesis. Or is that what Ian and Dave Gentile did?

Perhaps further statistical study will help. But don't count on it.

• Gary

Lots of info. Very interesting. I’m not an expert, but something quickly jumped out at me, probably because I like gnostic texts.
“Matthew then is a later conservative response, perhaps in reaction to the development of Gnosticism. For example, the gospel of Matthew seems interested in emphasizing the validity of the Jewish scripture.”
This is the first significant motivation I read about. Much better reason than someone didn’t like overused words, or copyists tried to change texts, inadvertently, or with minor purpose. With all the “against heresies” texts produced by church fathers, I’d say there should have been major motivations to generate large texts supporting their opinions, especially since none were probable eye-witness docs produced in 33AD. So to me, motivations must be evaluated. Not just, gee, I want to sit down and write my version of what might have happened. I think I’ll make a few mods here and there. There has to be a real major motivation to generate “books”, even in the present day. And financial gain for texts didn’t exist 2000 years ago.

• James Dowden

I haven’t had a chance to look at what’s already been done yet, but here are my thoughts.

We know that some Matthaeanisms (most celebratedly “brood of vipers”) have crept into Luke in the double tradition material (although even here a large part of the problem with getting statistically significant results is how thorough we know Luke’s redaction potentially is from his treatment of Mark). It would be interesting to see if there were any words or phrases that could be held to be characteristic of Luke in Matthew’s version of the same material. Harnack’s (Sayings of Jesus, p38f) observations on Lucan redaction might provide some useful subjects for statistical testing, especially:
- Use of more refined terms where there were common vulgar equivalents
- Frequent use of compound verbs
- Use of relative clauses, rather than stringing along conjunctions
- Relative frequency of kai and de
- Use of a more classical range of verb tenses (especially the imperfect and participles)
- Frequency of redundant pronouns
- Frequency of ean
- Frequency of egeneto + finite verb (although I have a good idea of the answer on this one, in round figures)
- Frequency of hn + participle

All these could be good ways of statistically testing whether there’s anything unusual about double tradition material in Matthew. My suspicion is very much that this material looks no different from the rest of Matthew, and so those who hold that Matthew and Q are substantially identical (or even completely identical, which becomes equivalent to the Farrer hypothesis) will claim victory, whilst holders of the two-source hypothesis will claim thorough Matthaean redaction of double tradition material.

I’d also love to see a statistical test of Luke (at least, chapters 3-24) and Acts. My instinct there is that, were it not for the opening verses, the Greek is so clearly different that no-one would assert common authorship.

• http://irrco.wordpress.com/ Ian

Good list. Of these features, these are trivial to check with my current tagging system, and running the results, it doesn’t look good (though I haven’t done all the stats, I can tell by the raw numbers the confidence will be too low):

- compound verbs
- verb tense distribution
- egeneto + finite
- hn + participle
- kai and de
- ean

Of these, the first two were part of my previous test. The problem is that the counts get so low, and we’re probably talking about only tendencies to use one form or another, and even if Q or Farrer is dead right, there are likely to have been some other sources (literary or otherwise) at work, so the signal is hard to track. So you’re right James, on these counts, the double-tradition material is not significantly different from Matt.

As for the Luke/Acts test, that is much harder. Testing two things is hard without a third to compare against. This was the issue with the Isaiah paper I referred to in my post. Comparing against a known different text is not a good control, imho. From a quick run here, again, on the same criteria I used for the synoptics, Acts is a little more similar to Luke-only than to Matt-only. But again, hardly an earth-shattering result! Happy to go into this a bit more though, if you want.

I can basically run the data with one command if you give me three sets containing any number of verse ranges: A, B and A+B.

If I get a moment, I’ll code the detailed statistical tests in the program so it returns all the data ready processed, that would be even faster to try stuff out.

• James Dowden

Thanks for looking at those features — in a way it’s satisfying when statistics back up the general feel one gets from reading a text. Accepting Q’s existence for a moment, the Q-M coherence poses an interesting challenge to Kloppenborg’s Lucan Order Axiom that underlies the IQP’s (re)construction: he’s essentially asking us to believe that an author who preserved less of a text preserved more of its order. Maybe it’s time for an edition of Q in Matthaean order.

Ah, yes, *that* Isaiah paper. IIRC, the main issues beside the difficulty in finding a control with that were:
1) a horrendously simplistic single split after chapter 39, resulting in comparing one part with multiple authors (over a demonstrably large timespan) with another with at least two (partially overlapping the timespan of the first) — presumably this was done in search of large enough sample sizes, but it’s methodological suicide;
2) a failure to fully take into account the way that later prophetic authors riffed off the themes and words of earlier ones.

(On a total tangent, my favourite argument for Deutero-Isaiah’s separation is that the overlap of 2 Chronicles and Ezra refers to it as Jeremiah. I quite like the idea of the Chronicler’s having an accretion on an unusual scroll, even though it is completely unprovable.)

So for Acts, I can see the third leg is going to be a pain. Luke-Acts-Hebrews might be an interesting test, but I don’t see Hebrews plus one of the others cohering against the third somehow (which is essentially the point of using a known third text). Otherwise it would be all too easy ending up on a wild goose chase around the Apostolic Fathers, expending huge amounts of time proving nothing more than that, say, Polycarp, Hermas, and Luke were all different people.

• http://www.patheos.com/blogs/exploringourmatrix/ James F. McGrath

Significant Matthew-Q agreement could also be compatible with MacDonald’s recent proposal that what we think of as Q is in fact a source which led Papias to conclude that there was an original Hebrew Matthew translated differently by different people (I reviewed his book Two Shipwrecked Gospels here on the blog a while back).

• http://outofthedepths.blogspot.com/ steve

I’m an amateur but when I read the bit from Papias several decades ago it occurred to me that possibly Matthew was the author of these logia which correspond in some degree to Q and that later, when what we know as Matthew incorporated them, it, the more recent work, kept the same attribution.

• James Dowden

Yes, MacDonald is definitely on my to-read list.

• http://irrco.wordpress.com/ Ian

To answer the question in the post, yes that’s basically what we did. Dave looked at single word vocab distributions, I looked at a couple of phrasal indicators including how verb tense and mood were used. There are probably many other possible indicators to look at, but I wouldn’t be too hopeful, and you have to be careful of expected outliers (so if you sample 20 different criteria, you’re likely to get one that has a 95% confidence, just by chance! – cf http://xkcd.com/882/).
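The point about expected outliers can be put in numbers. The following is a minimal sketch, assuming the criteria are statistically independent (which stylistic indicators drawn from the same text usually are not); the function names are made up for illustration and are not part of any tool mentioned here.

```python
def prob_at_least_one_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_tests` independent tests clears
    the significance threshold `alpha` purely by chance."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the overall false-positive rate
    at or below `alpha` (the Bonferroni correction, a conservative choice)."""
    return alpha / num_tests


if __name__ == "__main__":
    # 20 criteria, each tested at the 95% confidence level.
    print(prob_at_least_one_false_positive(20))
    print(bonferroni_threshold(20))
```

For 20 criteria at the 95% level, the chance of at least one spurious "hit" comes to roughly 64%, which is why a single significant indicator among many should not be read as a result.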

The basic idea is to split the material in 3: that unique to A, unique to B, and shared by A and B. Then you tag that material for the features you are interested in, and turn it into three frequency tables, one for each group of material. Then you can check if the frequency distribution of the common material is more similar to A or B. If A and B are quite different, but A+B is very similar to A, then it is evidence that A wrote the A+B material and B copied it. If A+B is dissimilar to both A and B, then it is evidence of a common source. To get a result you have to show that either pattern is very unlikely to have happened by chance.
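As a rough illustration of that three-way split, here is a minimal sketch. It assumes each group of material has already been reduced to a flat list of feature tags (for example verb tenses), and it uses total variation distance as the similarity measure. That measure is an assumption chosen for simplicity, not necessarily what either analysis used, and the tag lists below are invented toy data.

```python
from collections import Counter


def relative_frequencies(tags):
    """Turn a non-empty list of feature tags into a relative frequency table."""
    if not tags:
        raise ValueError("need at least one tag")
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def total_variation(p, q):
    """Distance between two frequency tables: 0 = identical, 1 = disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def compare_shared(a_only, b_only, shared):
    """Compare the shared (A+B) material with the material unique to A and to B."""
    fa = relative_frequencies(a_only)
    fb = relative_frequencies(b_only)
    fs = relative_frequencies(shared)
    return {
        "A_vs_B": total_variation(fa, fb),
        "shared_vs_A": total_variation(fs, fa),
        "shared_vs_B": total_variation(fs, fb),
    }


if __name__ == "__main__":
    # Invented toy data: tags for material unique to A, unique to B, and shared.
    a_only = ["aorist"] * 6 + ["imperfect"] * 4
    b_only = ["aorist"] * 3 + ["imperfect"] * 7
    shared = ["aorist"] * 6 + ["imperfect"] * 4
    print(compare_shared(a_only, b_only, shared))
```

In the toy data the shared material matches A exactly and differs from B as much as A does, which is the pattern described above as pointing to A as the source. The distances alone say nothing about whether such a pattern could have arisen by chance.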

This can fail in a few ways: 1) A and B material have few distinctive patterns among the indicators you are looking for, 2) A+B can be similar to A in some places and B in others, 3) there can be too small a sample to get beyond the ‘more likely than chance’ criterion, 4) the result can be somewhere between being very similar to A material, and being dissimilar to both, where you get very little signal.

And that’s basically what happened in both sets of stats where A and B are Matt and Luke: the result wasn’t stunningly clear, and so as an intermediate result, we rely on there being lots of data to clear the ‘more likely than chance’ hurdle. And there simply isn’t enough. I got a fractional win for Farrer in my results, but at the 0.3 level, so there’s a 30% chance you could get that result by picking random words out of a hat! Way, way too high to even be at the level of saying “it favours this hypothesis.”
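One way to make the "words out of a hat" idea concrete is a label-permutation test. This is a sketch under stated assumptions, not a reconstruction of the test actually run: it pools the A-only and B-only tags, reshuffles them into two fake groups of the original sizes, and counts how often the shuffled data leans toward one side at least as strongly as the real data. The function names, the distance measure and the default number of rounds are all illustrative choices.

```python
import random
from collections import Counter


def rel_freq(tags):
    """Relative frequency table for a non-empty list of tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def tv(p, q):
    """Total variation distance between two frequency tables."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def lean_toward_a(a, b, shared):
    """Positive when the shared material is closer to A than to B."""
    fs = rel_freq(shared)
    return tv(fs, rel_freq(b)) - tv(fs, rel_freq(a))


def permutation_p_value(a, b, shared, rounds=2000, seed=0):
    """Share of random relabellings that lean at least as far as the real data."""
    rng = random.Random(seed)
    observed = abs(lean_toward_a(a, b, shared))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        fake_a, fake_b = pooled[:len(a)], pooled[len(a):]
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(lean_toward_a(fake_a, fake_b, shared)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (rounds + 1)
```

A value around 0.3 from a test of this kind would mean that roughly three in ten random relabellings look as lopsided as the real data. Note that shuffling individual tags treats words as independent draws, which real sentences are not, so a test like this is if anything too optimistic about significance.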

The intermediate result could be caused by #2, above. But you couldn’t tell, because if you split the material down further, and analysed it in chunks, then there’d be even less data, so the confidence would be lower still.

But as James points out down below, the really interesting questions that require language and historical expertise start from understanding that, for example, a particular analogy is much more likely to be an indicator of idiomatic use than treating all words as one big bag of data. To do this work really well, it would be good to have someone (or many people) go through the Matt, Luke and double tradition material and tag it and weight it more richly (current tagging gives root-word and detailed part of speech data per word — basically the ‘parsed’ view in Logos). Then the algorithm is more likely to pick up on subtler clues. The downside would be that the results would then be open to accusation of subjectivity: we could argue about the weights and the tags given.

So it is weak, but the approach we took was a quick test. If it had come up with a clear result, that would have been a good indicator – it is unlikely to give a strong false positive. But its negative result tells you very little, it doesn’t exclude any hypothesis.

It would be interesting to do more work. I’m happy to help with the math side, if anyone is interested in doing the hard linguistic graft.

• T. Webb

Dr. McGrath, I’ve studied statistics some (for statistical quality control and quality engineering, as well as design of experiments (DOE)), and I’m always very wary of the use of statistics as “proof” of something, because it is very, very easy to run the numbers through some sort of analysis that winds up proving the position you want to prove (which DOE, as you may be aware, is designed to prevent). For example, in the pastoral epistles, there is far too small of a sample size of the words of Paul in the hauptbriefe to provide an adequate sample to compare the pastoral epistles, not even considering issues such as audience, purpose, etc.

Kind of like a little Greek can be dangerous, so can a little stats. Used correctly, it can be valuable, used indiscriminately, well…
