I am (on occasion) asked about design and ideas surrounding design. Some questions are in areas where I have no relevant training and so have only a lay understanding of the issues.
Fortunately, I can point to friends who help me understand the ideas around design inferences. The following is a guest post by Dr. Eric Holloway.
Dr. Eric Holloway received a solid grounding in classical education at the Torrey Honors Institute at Biola University. Eric continued schooling to complete an MSc in Computer Science at the Air Force Institute of Technology and a PhD in Computer Engineering at Baylor University.
When I first began to look into intelligent design (ID) theory while I was considering becoming an atheist, I was struck by Dr. Bill Dembski’s claim that ID could be demonstrated mathematically through information theory. A number of authors who were experts in computer science and information theory disagreed with Dembski’s argument. They said he did not provide enough details to make his argument coherent, and that he was making claims that were at odds with established information theory. I pressed a number of them in online discussions, including Dr. Jeffrey Shallit, Dr. Tom English, Dr. Joe Felsenstein, and Dr. Joshua Swamidass, and read a number of their articles, but I have not been able to discover a precise reason why they think Dembski is wrong. Ironically, I found they actually tended to agree with Dembski when working within their respective realms of expertise. For example, Dr. Shallit considered an idea in his rebuttal which is very similar to the ID concept of “algorithmic specified complexity”. Their criticism tended to come when they addressed Dembski’s claims outside their realms of expertise.
To better understand intelligent design’s relationship to information theory, and hopefully get to the root of the controversy, I spent two and a half years studying information theory and associated topics during a PhD with one of Dembski’s co-authors, Dr. Robert Marks. I expected to get some clarity on the theorems that would contradict Dembski’s argument. Instead, I found the opposite.
The two primary approaches to information theory are Shannon information and Kolmogorov complexity. The canonical example is the amount of information in a sequence of fair coin flips. Shannon’s theory says there is nothing distinctive about a sequence that is all heads compared to a garbled sequence in which heads and tails are more evenly distributed. Both have an equal probability and thus equal information content. On the other hand, Kolmogorov’s theory distinguishes between the two sequences based on their description lengths. The sequence that is all heads has a much shorter description length than the evenly distributed garbled sequence. The latter sequence is considered a typical random sequence in Kolmogorov’s theory.
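The contrast can be illustrated with a minimal Python sketch. Kolmogorov complexity itself is uncomputable, so the sketch uses the length of a zlib-compressed encoding as a rough upper-bound stand-in for description length. That substitution, the sequence length of 10,000 flips, and the helper names `shannon_bits` and `compressed_bits` are assumptions made for illustration and do not come from the information theory literature.

```python
import random
import zlib

def shannon_bits(n_flips: int) -> float:
    """Surprisal, in bits, of any one specific sequence of n fair coin flips."""
    return float(n_flips)  # each fair flip contributes -log2(1/2) = 1 bit

def compressed_bits(flips: str) -> int:
    """Length in bits of the zlib-compressed sequence (a crude proxy for description length)."""
    return 8 * len(zlib.compress(flips.encode("ascii"), 9))

n = 10_000
rng = random.Random(0)  # fixed seed so the run is repeatable
all_heads = "H" * n
garbled = "".join(rng.choice("HT") for _ in range(n))

# Shannon: both specific sequences are equally probable under a fair coin.
print(shannon_bits(len(all_heads)), shannon_bits(len(garbled)))

# Compression proxy: the all-heads sequence has a far shorter description.
print(compressed_bits(all_heads), compressed_bits(garbled))
```

The first line of output shows equal surprisal for the two sequences, while the second shows the all-heads sequence compressing to a small fraction of the size of the garbled one.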
Kolmogorov’s approach to information theory is motivated by Laplace’s observation that we do not assign equal probability to all patterns:
“if heads comes up a hundred times in a row, then this appears to us extraordinary, because the almost infinite number of combinations that can arise in a hundred throws are divided in regular sequences, or those in which we observe a rule that is easy to grasp, and in irregular sequences, that are incomparably more numerous.” (Laplace, 1951: 16–7)
Highly compressible patterns are rarer than incompressible patterns in a sequence of fair coin flips, and high enough compressibility means another explanation is more plausible than a fair coin.
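The rarity of compressible sequences follows from a standard counting argument, sketched here with notation that is assumed for illustration: K(x) is the Kolmogorov complexity of an n-bit string x, and k is the number of bits saved by compression. There are fewer than 2^(n-k) binary descriptions shorter than n - k bits, and each describes at most one string.

```latex
\[
\bigl|\{\, x \in \{0,1\}^n : K(x) < n - k \,\}\bigr|
\;\le\; \sum_{i=0}^{n-k-1} 2^{i} \;=\; 2^{\,n-k} - 1 ,
\qquad\text{so}\qquad
\Pr_{x \sim \text{fair coin}}\bigl[\, K(x) < n - k \,\bigr] \;<\; 2^{-k} .
\]
```

For example, fewer than one in a million sequences of fair coin flips can be compressed by 20 bits or more, since 2^(-20) is roughly 9.5 × 10^(-7).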
What I found interesting is that this is the same line of reasoning applied by Dembski when writing “The Explanatory Filter”, and it was motivated by similar considerations of how to distinguish a typical random sequence from non-random sequences after the fact. This is in contrast to standard Fisherian hypothesis testing, which requires that any hypotheses to be tested must be stated before the experiment is performed. Fisher did not provide a way for patterns to be detected after the fact. However, Kolmogorov’s theory of information shows that we can detect patterns after the fact, because concisely describable sequences are rarer than sequences requiring lengthy descriptions. Ray Solomonoff used this fact in combination with Bayes’ theorem to derive a mathematical formalization of Occam’s razor.
Additionally, I found Dembski’s key indicator of intelligent design, “complex specified information (CSI)”, is a more refined form of the information theory concept “mutual information,” with the additional constraint that the specification random variable is independent from the described event. This additional constraint results in the second keystone of intelligent design theory: the conservation of information.
Dembski proved that searching for a good search algorithm is no easier than performing the search for the primary target in the first place. The implication is that there is no shortcut for naturalistic processes to produce information from chaos, or to increase the amount of existing information, and thus a natural process such as the various versions of evolution cannot be said to create information.
There is a similar theorem in computer science called the “no free lunch theorem (NFLT),” which states that all search algorithms perform identically when averaged across all possible problems. I initially thought that is where ID’s similarity to established theory would end. Instead, I discovered there were a couple of conservation theorems similar to the NFLT in information theory.
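The flavor of the NFLT can be seen in a toy case. The sketch below is an illustration under stated assumptions rather than the general theorem: it considers only fixed (non-adaptive) query orders over a four-point search space, and it enumerates every 0/1-valued function on that space. The names `queries_to_find_one`, `all_functions`, and `averages` are hypothetical.

```python
from itertools import permutations, product

def queries_to_find_one(order, f):
    """Count evaluations a fixed query order needs before it hits a point where f is 1."""
    for count, x in enumerate(order, start=1):
        if f[x] == 1:
            return count
    return len(order)  # no point scored 1, so every point was evaluated

points = range(4)
# Every possible problem: all 16 functions f: {0, 1, 2, 3} -> {0, 1}.
all_functions = list(product((0, 1), repeat=len(points)))

# Average cost of each of the 24 query orders across all problems.
averages = {
    order: sum(queries_to_find_one(order, f) for f in all_functions) / len(all_functions)
    for order in permutations(points)
}

print(set(averages.values()))
```

Every query order has the same average cost once all problems are weighted equally, so no order is better than another without some prior knowledge of the problem.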
In Shannon’s information theory there is the data processing inequality, which states that processing data does not increase information regarding its origin any more than was already contained in the data. In Kolmogorov’s information theory there is Leonid Levin’s law of independence preservation, which makes a similar statement that no sequence of deterministic or random processing can increase the algorithmic mutual information between independently specified bitstrings. In addition, there are a number of variations on these conservation laws and related quantities, such as the Kullback-Leibler distance, that show determinism and randomness are incapable of creating CSI.
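The data processing inequality can be checked numerically on a small example. The sketch below assumes a Markov chain X → Y → Z in which X is a fair bit, Y is X passed through a binary symmetric channel, and Z is Y passed through a second such channel. The flip probabilities of 0.10 and 0.20 and the helper names `mutual_information` and `through_channel` are assumptions chosen for illustration.

```python
import math

def mutual_information(joint):
    """I(A;B) in bits, from a joint distribution given as {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0
    )

def through_channel(joint, flip_prob):
    """Pass the second variable through a binary symmetric channel, keeping the first."""
    out = {}
    for (a, b), p in joint.items():
        for c in (0, 1):
            q = flip_prob if c != b else 1.0 - flip_prob
            out[(a, c)] = out.get((a, c), 0.0) + p * q
    return out

joint_xx = {(0, 0): 0.5, (1, 1): 0.5}        # X is a fair bit, paired with itself
joint_xy = through_channel(joint_xx, 0.10)   # Y: X with 10% of bits flipped
joint_xz = through_channel(joint_xy, 0.20)   # Z: Y with a further 20% flipped

i_xy = mutual_information(joint_xy)
i_xz = mutual_information(joint_xz)
print(i_xy, i_xz)
```

The second processing step can only lose information about X, so `i_xz` comes out no larger than `i_xy`.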
I also found the standard literature supported other hypotheses relevant to ID. A common criticism is that ID is silent about the nature of the designer. Since the conservation laws of information theory show deterministic and random processes cannot create mutual information with an independent target variable, the information that does exist must have been produced by some cause that is neither deterministic nor random, nor a combination thereof. In computer science we learn that all computation is some combination of deterministic and random processing, so the same information conservation laws apply to computation. Computer science also describes a cause that can violate the information conservation laws, and this cause is known as a halting oracle. The field of hypercomputation studies halting oracles and related concepts. While by definition a halting oracle cannot be defined mechanistically, its behavior is empirically distinguishable from computation. Gregory Chaitin, one of the inventors of algorithmic information theory, uses a halting oracle as the source of creativity in his mathematical model of evolution.
Another area of criticism is lack of practical application. One straightforward application is that since intelligence can create information, and computation cannot, human interaction will improve computational performance. Addressing this observation, there is a growing field known as “human computation” which investigates whether human-in-the-loop computation is more effective than a purely computational approach. It turns out the answer is yes. There are numerous tasks that humans find trivial but that are extremely difficult or impossible for algorithms to perform. This phenomenon is known as Moravec’s paradox. Combining human and computational approaches allows the computers to “amplify” the humans’ capabilities, and vice versa. The big tech companies such as Microsoft, Google, Facebook and Amazon all use forms of human computation to power their search and recommendation algorithms. Incidentally, a number of artificial intelligence companies have been caught faking their AI with human workers posing as bots.
After these years of study, I have found that, rather than demonstrating that Dembski’s theory is at odds with established information theory, my studies show it is very much in line with well-known theorems. This left me with the question as to why there was so much controversy around Dembski’s theory in the first place. I have still not been able to answer this question, but whatever the cause of the controversy, it is not lack of theoretical and practical justification.
Thanks to David Nemati for the careful proofreading of the original manuscript.
Laplace, P. S. (1951). A philosophical essay on probabilities. Translated from the 6th French edition by Frederick Wilson Truscott and Frederick Lincoln Emory.