Algorithmythicism at #AARSBL19

February 18, 2020
Continuing my recap of the Digital Humanities session I chaired in November. Let me begin with links to three updates on Claire Clivaz’s Mark 16 project (highlighted in my first post from this session). The most pressing is probably the one with news and upcoming events.
Daniel Stökl Ben Ezra of the École Pratique des Hautes Études and Hayim Lapin of the University of Maryland – College Park presented on “Automatic Transcriptions of Medieval Hebrew Manuscripts and Crowdsourcing Their Corrections.” They mentioned that it is unfortunate that SBL does not allow submissions listing authors who are not present; the natural sciences handle this differently, and astronomers have been living in that world for half a century. When it comes to the digital rendering of Hebrew texts, we need data points at the level of the stroke, the letter, and so on.
(See Patrick Sahle’s text wheel.) This project sits at the intersection of several other projects: eRabbinica, Sofer Mahir (automatic transcription of Tannaitic treatises), Tikkoun Sofrim (automatic transcription of Tanhuma), BiblIA, and Scripta numerique. Their approach allows fast transcription of huge amounts of data. They were originally at 3 errors per 100 characters, and are now at 1.8. Both computers and humans can introduce extra spaces and other unneeded elements, because not knowing the language means not knowing what is significant. The computer can learn to recognize letters extended to fill out the line! Some alphabets have curved, straight, and oblique lines. The initial OCR had 20,000 mistakes, and it turns out that the general public can be helpful in eliminating them. Questions I had included: Can the correctors help the software learn to do better? Is there a risk that “correctors” who do not know the language will make the software worse at recognition? Do large numbers of participants in such a project offset the negative impact of any individual who is incompetent or malicious?
The presenters doubted this could be done with Latin scripts in Israel. The number of users grew to 373, and the number of lines checked by the most active users was 11,552. The hyperengaged are 10-20% of the total, and they do 80-90% of the work. Almost 90,000 transcribed lines resulted from this project. They noted that there were fewer users on the sabbath; news coverage caused peaks, and system malfunctions led to low points in activity. The quality of crowd corrections was quite good for experienced users: the character error ratio was 0.5, typically involving special characters, and users improved over time. They mentioned CollateX. A text service must provide instructions, both machine- and human-readable, about how it should be cited. The project is aiming to be open source. Another question I had was whether users transcribe accurately even when a text is grammatically wrong.
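For readers wondering what a figure like “1.8 errors per 100 characters” or a character error ratio means in practice, here is a minimal sketch of how such a rate can be computed: the edit distance between the automatic transcription and a verified reference, divided by the length of the reference. This is my own illustration of the metric, not the project’s code, and the example numbers are invented.

```python
# Minimal illustration (not the project's actual code) of a character
# error rate: edit distance between the machine transcription and a
# verified reference transcription, divided by the reference length.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def character_error_rate(hypothesis: str, reference: str) -> float:
    return edit_distance(hypothesis, reference) / max(len(reference), 1)

# Hypothetical example: 2 errors in a 100-character line gives a rate of
# 0.02, i.e. "2 errors per 100 characters."
```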
A. Marshall King V of the University of Notre Dame presented a test case on Deuteronomy exploring big data, intertextuality, and the Bible. This is a personal project with no funding, working on tools to investigate shared citation. Existing sources such as Biblia Patristica and Biblindex do what they do well, but some questions are not answered by the configurations of the data provided there. For instance, there is no help with weighting, i.e. evaluating which verses have been cited most in a given time period. Other interesting questions that digitized texts ought to help answer but currently do not: Which groups cite particular verses most? Which verses are cited by the largest number of authors? Which verse is most cited? Which of the ten commandments is most cited? It takes “years of analysis for a day of synthesis.” King wondered why it has taken so long for network theory to be applied to this, specifically Willem Vorster’s work on the network of traces.
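To make the “weighting” question concrete, here is a small sketch of the kind of aggregation King had in mind: counting which verses are cited most often in a given period, and which are cited by the largest number of distinct authors. The citation records below are invented placeholders, not data from Biblia Patristica or Biblindex.

```python
# Hypothetical citation records: (verse, author, period). Invented data,
# purely to illustrate the kind of weighting discussed above.
from collections import Counter

citations = [
    ("Deut 5:21", "Author A", "2nd century"),
    ("Deut 5:21", "Author B", "2nd century"),
    ("Deut 5:16", "Author A", "3rd century"),
    ("Deut 5:21", "Author C", "3rd century"),
]

# Which verses are cited most often in a given period?
second_century = Counter(v for v, _, p in citations if p == "2nd century")
print(second_century.most_common())

# Which verses are cited by the largest number of distinct authors?
authors_per_verse = {}
for verse, author, _ in citations:
    authors_per_verse.setdefault(verse, set()).add(author)
by_breadth = sorted(authors_per_verse.items(),
                    key=lambda kv: len(kv[1]), reverse=True)
print([(verse, len(authors)) for verse, authors in by_breadth])
```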
This was where the question occurred to me: Could an AI calculate a probability that something is an allusion to an earlier text?
The raw frequency with which commandments are cited by later authors is fascinating. Coveting is the most mentioned; honoring parents is cited by the most authors. The parting of the ways and the different ways of delineating the commandments affect things. Searching for citations of Deut. 32 yields very different results among Jewish and Christian authors. This project relied on existing lists of citations, but in theory one might detect citations automatically.
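A deliberately crude version of such automatic detection might simply score the proportion of word n-grams that a later passage shares with a candidate source. A serious tool would need lemmatization, weighting of rare phrases, and calibration into an actual probability, but the basic idea can be sketched as follows (my own illustration, not King’s method, with invented English strings):

```python
# Crude sketch of automatic citation/allusion scoring: the proportion of
# word trigrams in a candidate passage that also occur in a source text.
# A real tool would lemmatize, weight rare n-grams more heavily, and
# calibrate the score into a probability.

def ngrams(text: str, n: int = 3) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def reuse_score(candidate: str, source: str, n: int = 3) -> float:
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    return len(cand & ngrams(source, n)) / len(cand)

# Toy example with invented strings:
source = "you shall not covet your neighbor's house"
candidate = "he reminds them that they shall not covet your neighbor's goods"
print(reuse_score(candidate, source))  # fraction of shared trigrams
```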
Benjamin White from Clemson University presented on the idea of “dirty texts” and how it relates to stylometric analysis. Stylometry, for those not familiar with it, is the attempt to quantify the writing style of an author such as Paul, with a view to determining the authenticity or actual authorship of letters. The previous year White tackled whether named coauthors might have impacted the stylome. It turned out that there is no reason to let the question of multiple authorship hang over stylometric analysis: including Timothy doesn’t change the stylome. He talked about the work Paul Duwalla of Duquesne University has done on canonization and the idea of a pure text, and what Maciej Eder has done on the “dirty corpus.” The work of the Computational Stylistics Group was mentioned. In this year’s presentation White focused on the clustering of texts and signal interference through the scribal transmission process. This represents a form of metacriticism of the use of stylometric analysis to determine authenticity. We don’t always consider the way that using different manuscripts, or an eclectic critical edition, could potentially produce different results in a study like this. On the other hand, when classical speeches are analyzed, even random replacement of 40% of the letters did not prevent 70% accuracy from being achieved, suggesting that this computer-assisted method should still work in principle, and work better than human judgment.
In the case of Paul’s letters, however, one also has to account for the very different results that have emerged from past stylometric analyses of the Pauline epistles by computer scientists, mathematicians, and NT scholars. Ironically, the consensus of scholars, that there is a corpus of seven letters that is the most likely to be authentic, has the least stylistic support! Some of those who undertook studies in the past used the Textus Receptus, some used Nestle-Aland, and White wondered whether this could be the cause of the divergences. (P46 could not be included because it is incomplete.) The Stylo package lets you run a variety of tests at once and then provides a depiction of the average result. One can choose the increments used; the most-frequent-word vectors ranged from 10 to 50 words, avoiding the accusation of cherry-picking the results you like from among different outcomes. White showed us his Codex Sinaiticus cluster analysis, talking about a “Bootstrap Consensus Tree.” Comparing the results from Sinaiticus vs. Family 35, we could clearly see the cluster differences represented visually. 1 Timothy and Titus align differently in the two results! Word frequencies change between the manuscripts, but sometimes words merely flip places; as words become more infrequent, they skip around in the ranking a lot more. White focused a lot on function words, since it is necessary to keep the content and subject of a particular text from determining the result: the same author often wrote about many different subjects, by definition using different vocabulary when doing so.
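For readers who have never run a stylometric test, the core move behind tools like stylo is easy to sketch: represent each text by the relative frequencies of its most frequent words (largely function words), z-score those frequencies across the corpus, and compare texts with a distance measure such as Burrows’s Delta. The toy sketch below is in plain Python rather than the R stylo package White used, and the “texts” are placeholders rather than the Pauline corpus.

```python
# Bare-bones sketch of most-frequent-word (MFW) stylometry: relative
# frequencies of the top-N words, z-scored across the corpus, compared
# with Burrows's Delta (mean absolute difference of z-scores).
# The texts are placeholders, not the Pauline corpus.
from collections import Counter
from statistics import mean, pstdev

texts = {
    "text_A": "in the beginning was the word and the word was with god",
    "text_B": "and god said let there be light and there was light",
    "text_C": "the word became flesh and dwelt among us and we saw",
}

def rel_freqs(text: str) -> dict:
    words = text.lower().split()
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}

freqs = {name: rel_freqs(t) for name, t in texts.items()}

# Choose the N most frequent words across the whole corpus.
N = 10
corpus_counts = Counter(w for t in texts.values() for w in t.lower().split())
mfw = [w for w, _ in corpus_counts.most_common(N)]

# z-score each MFW across the texts.
def zscores(word: str) -> dict:
    vals = [freqs[name].get(word, 0.0) for name in texts]
    mu, sigma = mean(vals), pstdev(vals)
    return {name: 0.0 if sigma == 0 else (freqs[name].get(word, 0.0) - mu) / sigma
            for name in texts}

z = {w: zscores(w) for w in mfw}

# Burrows's Delta between two texts: mean absolute difference of z-scores.
def delta(a: str, b: str) -> float:
    return mean(abs(z[w][a] - z[w][b]) for w in mfw)

for a in texts:
    for b in texts:
        if a < b:
            print(a, b, round(delta(a, b), 3))
```

A clustering tool then groups together the texts with the smallest pairwise distances, which is essentially what the dendrograms and bootstrap consensus trees visualize.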
Looking at weighting tables and network analysis of connections, similarities, and degrees of difference, a key question is which texts are most centrally related to the group. Interestingly, 2 Timothy emerged as the middle of the network, then Philippians, then 2 Corinthians. If one adds other early Christian literature, similar patterns emerge as the network expands.
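To illustrate what “the middle of the network” means here: one can turn pairwise stylistic distances into a weighted graph and ask which node is most central. The sketch below uses the networkx library with invented distance values (chosen only so that 2 Timothy comes out on top, echoing White’s result), not his actual data.

```python
# Illustration only: turn pairwise stylistic distances (invented numbers,
# not White's results) into a weighted graph and ask which text sits most
# centrally. Requires the networkx library.
import networkx as nx

distances = {
    ("2 Timothy", "Philippians"): 0.4,
    ("2 Timothy", "2 Corinthians"): 0.5,
    ("Philippians", "2 Corinthians"): 0.6,
    ("2 Timothy", "Titus"): 0.7,
    ("Philippians", "Titus"): 0.9,
}

G = nx.Graph()
for (a, b), d in distances.items():
    G.add_edge(a, b, weight=d)

# Closeness centrality with edge weights treated as distances:
# the text with the highest value is the "middle" of the network.
centrality = nx.closeness_centrality(G, distance="weight")
print(sorted(centrality.items(), key=lambda kv: kv[1], reverse=True))
```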
The key takeaway result is that the Pauline epistles are not especially dirty. They fall within the margin of error for the industry standard for stylometric analysis, regardless of which manuscripts are used. The study has nonetheless helped make clear what has led to past discrepancies. Hugh Houghton said he would have preferred that a modern critical edition be used because of the scribal tendency towards harmonization. Another key point to take away is that to get 80% correct attribution, you need a minimum number of words (2000), and that cuts out half the Pauline epistles. And so, on the one hand, excuses sometimes offered for ignoring stylometric analysis are invalid. On the other hand, the method applied to much of the Pauline corpus is by definition invalid.
The connections of this part of the session with mythicism include not only evaluating authenticity, something that mythicists handle in sweeping generalizations and other ham-fisted ways, but also the use of technology to spot intertextual connections. My evaluation of Thomas Brodie’s work, in which he argues that the Gospels were created by taking earlier texts and transforming them in ways that can be, and are supposed to be, detected, is that it is intertextuality and parallelomania run amok. A machine could find even more parallels than I did in my article on this topic. And so, on the one hand, a machine might be able to quantify the differences and similarities, with respect not only to vocabulary but also structure, in ways that might be useful. But it would also find parallels that make more sense to attribute to the fact that human vocabulary involves a limited number of words, used in predictable ways by us all, so that we sometimes converge on the same word choices in ways that are unpredictable and even striking but in no reasonable scenario due to direct borrowing. The use of computer technology in this area has great promise, as long as we remember what we need to remember about machine learning and algorithmic processes in general, namely that they offer us information which it is still up to us to use wisely. At least at present, our machines have intelligence but do not possess wisdom. Thankfully some humans do.
In an exciting coincidence, after I wrote this post but before it was scheduled to appear, an article appeared on precisely the topic I have been discussing: “On the Feasibility of Automated Detection of Allusive Text Reuse.”
And of course, don’t miss my earlier post about this session at SBL 2019 if you haven’t already read it:

#DH @ #AARSBL19

