Convergent Evolution and Front-loading

There are many examples of convergent evolution among different life forms. Convergent evolution is simply the independent acquisition of some biological feature among different lineages. For example, both birds and bats have wings (this is an elementary example, of course), but this similarity is not due to common ancestry but instead it is the result of convergent evolution.

So how does convergent evolution relate to front-loading? One of the criticisms of the front-loading hypothesis is that you can’t design a genome in a unicellular organism to evolve specific organs, tissues, biochemical systems, etc., several billion years in the future. But convergent evolution neatly answers this criticism.

A classic example of convergent evolution is the eye in the octopus and the mammalian eye. They are extremely similar, structurally speaking (see the figure, below).

Figure. Source: Ogura et al., 2004.

The human eye and octopus eye both have the following tissues:

1. Eyelids.

2. Cornea.

3. Pupil.

4. Iris.

5. Ciliary muscle.

6. Lens.

7. Retina.

8. Optic nerve.

Furthermore, the arrangement of these parts is practically the same. Cool! And these two systems have arisen independently, through convergent evolution (i.e., they are not related through common descent; see Ogura et al., 2004). This means that these two organs have evolved as a result of the initial state of the last common ancestor of mammals and octopuses. In short, the convergent evolution of these two organs demonstrates that a genome can be programmed to evolve toward a given objective. If we replayed the “tape of life” (to borrow from Stephen Jay Gould), human-like eyes would probably appear on the scene once again. In other words, the same system keeps popping up again and again. And this is evidence that a given objective can be front-loaded, starting with a specified initial state. The eye is a beautiful example of convergent evolution, wherein 8 separate “parts” independently came together in the same arrangement to produce the function of vision.

Are there examples of convergent evolution in biochemical systems? If so, this would provide evidence that not only can organs be front-loaded, but so too can biochemical systems. More on this later.


Atsushi Ogura, Kazuho Ikeo, Takashi Gojobori. Comparative Analysis of Gene Expression for Convergent Evolution of Camera Eye Between Octopus and Human. Genome Research, 14: 1555-1561 (2004).

Revisiting the Type III secretion system

Surprisingly, there has been little discussion of a paper recently published in PLOS Genetics. The paper is titled “The Non-Flagellar Type III Secretion System Evolved from the Bacterial Flagellum and Diversified into Host-Cell Adapted Systems.” What’s the big deal about this paper?

A number of ID critics involved in the origins debate have long believed that the bacterial flagellum evolved from the type III secretion system, or that the two systems share a common ancestral system that looked like an export system. For example, the Wikipedia article on the “Evolution of flagella” maintains that:

“All currently known nonflagellar Type III transport systems serve the function of injecting toxin into eukaryotic cells. It is hypothesised that the flagellum evolved from the type three secretory system. For example, the bubonic plague bacterium Yersinia pestis has an organelle assembly very similar to a complex flagellum, except that is missing only a few flagellar mechanisms and functions, such as a needle to inject toxins into other cells. It is also a possibility that the flagellum could have evolved from a currently undiscovered system with similar flagellar traits or a currently extinct organelle/organism.”

Back in the late 90s and early 2000s, you’d see ID critics arguing that the type III secretion system (TTSS) is a plausible precursor to the bacterial flagellum. At the time, some of the literature hinted towards another possibility: the TTSS may have evolved from the bacterial flagellum. This, then, was the response from ID proponents. When Gophna et al. (2003) published their phylogenies which suggested that the two systems shared a common ancestor, the debate took a different turn. It was now argued that the two systems had a common ancestor in the form of a protein export system. A poster on The Panda’s Thumb had this to say about the Gophna et al. paper:

“Clue for the IDers: The TTSS “appearing before the flagellum” is not the only scenario that shatters your flawed and oversimplified arguments about the “designed” origins of the flagellum.”

And in 2006, Nick Matzke said:

“Back in 2003 I was about 55-45 for the idea that the flagellum came first, but Pallen’s parsimony argument and a few additional small points have moved me about 60-40 for the sister groups idea. There are several specific lines of investigation that could clear this up immensely.”

(Note: Mark Pallen’s parsimony argument can be found here: Bioinformatics, genomics and evolution of non-flagellar type-III secretion systems: a Darwinian perspective, 2005)

Since 2003, the debate over the origin of the TTSS has been pretty split, with about half the papers saying that the TTSS descended from the flagellum, and the other half saying that the two systems share a common ancestor.

However, the recently published research provides strong evidence in favor of the hypothesis that the TTSS is derived from the bacterial flagellum. Let’s take a look at this a bit closer.

Here’s a quote from the paper:

“A decade ago, several studies indicated one single phylogenetic split between the flagellum and the NF-T3SS [75], [79], [81], [82]. This is compatible with three different evolutionary scenarios. The two elements might have independent origins from an ancestral system, or one system might have adapted pre-existing structures from the other system for a new function [61], a process referred to as “exaptation” [83]. Understanding the details of the exaptation process requires an understanding of the direction of the evolutionary events. Current sequence databanks cover a much larger fraction of the prokaryotic world than ten years ago. Phylogenetic methods for dealing with multi-protein complexes have also been improved [84], [85], but these newer approaches have not yet been applied to infer the evolutionary history of T3SSs. The ongoing explosion of partially assembled genomes and metagenomes would also benefit from new tools for the detection and analysis of T3SSs from partial data. We have therefore produced such tools and applied them to genome data in order to determine the evolutionary origins and patterns of diversification of T3SSs.”

The phylogenetic analyses of the TTSSs were done in this manner:

1. Maximum-likelihood phylogenetic reconstruction of the ATPase protein that is shared by the flagellum, TTSS, and F- and V-ATPases reveals that the TTSS nests within the flagellar clade. In this phylogeny, the F- and V-ATPases were used as the outgroup, thereby giving the tree a root. Moreover, this tree topology has strong bootstrap support (84%), in contrast to the two alternative topologies, which received only 7% and 9% bootstrap support.

2. This result was further supported by a phylogeny built from a larger dataset that included all curated systems.
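
To make the bootstrap figures concrete: bootstrap support for a topology is the percentage of resampled datasets whose inferred tree recovers that topology. The sketch below is only an illustration of that bookkeeping, not the authors' pipeline. The function name and the topology labels are placeholders I made up, and the replicate counts are chosen simply to mirror the 84% / 7% / 9% split reported above.

```python
from collections import Counter

def bootstrap_support(replicate_topologies):
    """Return the percentage of bootstrap replicates that recovered each topology."""
    counts = Counter(replicate_topologies)
    total = len(replicate_topologies)
    return {topology: 100 * n / total for topology, n in counts.items()}

# Hypothetical replicate outcomes (labels are illustrative, not taken from the paper)
replicates = (
    ["TTSS nested within flagellar clade"] * 84
    + ["alternative topology A"] * 7
    + ["alternative topology B"] * 9
)
support = bootstrap_support(replicates)
for topology, percent in support.items():
    print(f"{topology}: {percent:.0f}%")
```

In a real analysis each replicate label would come from a tree inferred on a resampled alignment; here the labels are typed in by hand.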

Also, the study confirmed that the flagellar system has a much broader taxonomic distribution than the TTSS, again suggesting that the flagellar system is more ancient. The researchers also probed into the origin of the TTSS secretin, and found that TTSSs have acquired secretins independently at least 3 times.

Other comments

Some points in this paper that I would like to see improved in future studies:

1. The phylogeny that demonstrated that the TTSS is derived from the flagellum was only built from one protein group: the flagellar, TTSS, and F- and V- ATPases. It should be noted, however, that the TTSS ATPase, SctN, is the most highly conserved TTSS protein (Gophna et al., 2003). Because of the high level of sequence conservation in SctN, and its flagellar homolog, FliI, this protein should be expected to best portray the actual phylogenetic relationship of the flagellum and TTSS.

2. There is no mention of how the results offer a good response to Pallen’s parsimony argument (note: for those interested in the parsimony argument, I would urge you to look it up in Pallen et al.’s paper; in a future post, I may write up a critique of that parsimony argument based on this paper).

This paper is a substantial improvement over Gophna et al., 2003:

1. The study by Gophna et al., which concluded that the TTSS is ancient and did not descend directly from the flagellar system, used four different proteins to construct four different phylogenies. However, their trees were unrooted, so they cannot by themselves show the direction of evolutionary change. In contrast, Abby and Rocha rooted their phylogeny with an outgroup.

2. The bootstrap support for Gophna et al.’s trees is weak, with the exception of one tree. The Abby and Rocha phylogeny has strong bootstrap support.

3. For phylogeny reconstruction, Gophna et al. used neighbor-joining, which is a distance-based method. Distance-based methods are among the least accurate phylogeny estimation methods, and can lead to incorrect results when lineages diverge at different rates (and it has been suggested that TTSS proteins have diverged faster relative to flagellar proteins). Meanwhile, Abby and Rocha employed maximum-likelihood, which is one of the most accurate phylogeny estimation methods out there.

In conclusion, this study significantly strengthens the hypothesis that the TTSS is derived from the flagellar system. Although we can still expect some of the non-specialists to continue arguing that the TTSS and flagellar system share a common, non-flagellar ancestor, anyone acquainted with the literature should not make such an argument. There is one more important point to make here. If the TTSS descended from the flagellum, then about half of the flagellar proteins lack any precursor homologs. Finally, since the TTSS does not share a common, non-flagellar ancestor with the bacterial flagellum, the TTSS is not evidence that the flagellar system evolved from a primitive secretion system.



1. Abby SS, Rocha EPC (2012) The Non-Flagellar Type III Secretion System Evolved from the Bacterial Flagellum and Diversified into Host-Cell Adapted Systems. PLoS Genet 8(9): e1002983. doi:10.1371/journal.pgen.1002983.

2. Gophna, U., Ron, E., Graur, D., 2003. Bacterial type III secretion systems are ancient and evolved by multiple horizontal-transfer events. Gene 312, 151-163.

3. Pallen MJ, Beatson SA, Bailey CM., 2005. Bioinformatics, genomics and evolution of non-flagellar type-III secretion systems: a Darwinian perspective. FEMS Microbiol Rev. 29(2):201-29.

*Note: technically, I should be using the term “NF-T3SS” to indicate which TTSS I’m referring to. In this post, though, read “TTSS” as synonymous with “NF-T3SS.”

Towards a Hypothesis of Molecular Design

Though scholars in the ID movement have continually argued that Darwinian evolutionary processes are insufficient to account for the biochemical complexity that is at the heart of life, relatively little effort has been devoted to developing a testable hypothesis of intelligent design. Yet if intelligent design is to make any significant progress in academia, and if it is to lead to fruitful research, then a positive intelligent design hypothesis is sorely needed. In a previous essay, a testable hypothesis on the engineering of molecular machines was briefly described. Here, I expand on that model of biological intelligent design and discuss several predictions that necessarily follow from the hypothesis. The hypothesis will be termed “molecular design.” The molecular design hypothesis proposes that “the components of molecular machines were engineered through the strategy of rational design, similar to the method humans use to design proteins.” This, however, is only a cursory summary. More specifically, the central thesis of molecular design is that the first biological cells were contrived by engineers with intelligence analogous to our own, and that the protein machinery of these cells was designed by methods similar to our present techniques for protein design. Thus, it suggests that the method or mechanism of the intelligent design of the biochemical complexity of life was essentially rational design and directed evolution. Bear in mind that directed evolution is a very specific method of protein design, and is not synonymous with the step-by-step evolution of molecular machines.

This essay will begin with an overview of rational design and directed evolution in protein design by our species. By exploring these two mechanisms of protein engineering (and more broadly, the engineering of molecular machines), predictions of the molecular design hypothesis can be more fully fleshed out.

Methods in Protein Design: Rational Design and Directed Evolution

Rational design involves either modifying an existing protein sequence in a determined, specified way or designing an entirely novel protein. The latter approach will be discussed later. Below is a figure (Figure 1) that depicts the general procedure behind the rational design of proteins. In rational design, knowledge of the structure and function of the protein is critical. By analyzing the structure and function of the protein, one can predict the effects of changing certain amino acids. For example, changing key amino acids might enable a receptor to bind more tightly with its ligand, increasing functionality.

Figure 1. The basic process behind the rational design of proteins.

Using site-directed mutagenesis, specific amino acid residues can be mutated in the desired way. The process of site-directed mutagenesis is probably familiar to all biology majors, so I am only including a figure (Figure 2) to illustrate how site-directed mutagenesis works.

Figure 2. Site-directed mutagenesis.

Protein sequences can also be designed de novo. That is, a novel protein can be designed “from scratch” instead of by modifying an existing protein. Synthetic DNA sequences are ultimately cloned and expressed, leading to the translation of the de novo protein sequence. There are numerous examples of protein sequences that have been designed de novo (see, e.g., Kuhlman et al., 2003; Fisher et al., 2011). Importantly, protein sequences designed de novo have no homologous counterparts.

Under the heading of “rational design” comes the method of “blob-level” protein design (see Figure 3). As stated in the figure, the main idea behind “blob-level” protein design is “to combine protein units of defined function (domains) to engineer a fusion protein with novel functionality.”

Figure 3. “Blob-level” protein design.

Unlike the rational design technique, directed evolution emphasizes selection to a higher degree. Very simply, a gene sequence encoding a given protein is randomly mutated (through error-prone PCR, for example), which results in a library containing many variants of the sequence. Selection is then utilized to select the sequences which possess the desired function. Next, the selected sequences are amplified through PCR, and the process is repeated as many times as necessary.
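
The mutate-select-amplify loop just described can be sketched in a few lines of Python. This is a toy model under loud assumptions: the "gene" is a short DNA string, random base replacement stands in for error-prone PCR, and "desired function" is scored as the number of positions matching a target sequence. Real selections measure activity and have no known target, so the fitness function, the parameter values, and all names here are hypothetical.

```python
import random

ALPHABET = "ACGT"

def mutate(seq, rate, rng):
    """Replace each base with a random base at probability `rate` (stand-in for error-prone PCR)."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else base for base in seq)

def fitness(seq, target):
    """Toy selection criterion (an assumption): count of positions matching the target."""
    return sum(a == b for a, b in zip(seq, target))

def directed_evolution(start, target, rounds=30, library_size=200, keep=10, rate=0.05, seed=0):
    """Run repeated rounds of diversification and selection; return the best variant and score history."""
    rng = random.Random(seed)
    parents = [start]
    history = [fitness(start, target)]
    for _ in range(rounds):
        # Diversify: build a library of mutated copies (parents are kept, so the best score never drops)
        library = parents + [mutate(rng.choice(parents), rate, rng) for _ in range(library_size)]
        # Select: keep only the top-scoring variants as templates for the next round
        library.sort(key=lambda s: fitness(s, target), reverse=True)
        parents = library[:keep]
        history.append(fitness(parents[0], target))
    return parents[0], history

best, history = directed_evolution("A" * 20, "ACGT" * 5)
print(best, history[0], history[-1])
```

Note that the search is guided only by selection on random variants; nothing in the loop uses knowledge of structure, which is the contrast with rational design drawn above.
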

The principal methods behind protein engineering, described above, are employed for the design of single protein molecules. But what about molecular machines, which are composed of several (often dozens of) protein parts? How can such multi-component machines be engineered? Designing a protein machine would begin, of course, with planning the arrangement of protein parts (and the kinds of proteins that would be needed) such that function is produced. After this has been accomplished, the following steps would be carried out:

a. The inner, “core” proteins of the machine would be engineered first (through rational design and directed evolution). De novo design of the first core protein would be followed by the design of a protein that could tightly bind to it. Alternatively, a protein from another machine would be borrowed so that de novo design would not be necessary.

b. More and more proteins would be designed and added to the initial core proteins.

c. Once the genes encoding the necessary proteins have been designed, genes regulating the assembly of the machine would be engineered.

After the machine is designed, it can be modified and changed to produce a machine with a different function. These methods, then, are the major techniques behind protein engineering. Several salient points emerge that are worth mentioning:
• In rational design of molecular machines, protein parts are often borrowed from other systems (and modified as necessary to produce optimal functionality). This would result in statistically significant levels of sequence similarity between the components of the machine and other proteins that are not part of the machine.
• Proteins that are designed de novo share no statistically significant sequence similarity with other proteins (in general).
• When it comes to designing molecular machines, there is no step-by-step, cumulative evolution, wherein every step offers a selective advantage. Instead, the components of the machine are integrated at approximately the same time. Neither millions of years nor even decades are needed to engineer a molecular machine with dozens of components, because the engineering approach has foresight.
• Individual parts of proteins (domains) can be swapped around to design specific functions.

Tests of the Molecular Design Hypothesis

What predictions naturally follow from the molecular design hypothesis? Let us suppose that we find a biological machine X, and that some of its parts are similar to those of a machine Y. Under current theory, this similarity is attributed to shared ancestry. Yet it is this very similarity that can act as a “springboard” for testing the molecular design hypothesis. The bacterial flagellum is probably the most familiar icon of the intelligent design debate, so it will be used as an example in the predictions discussed below.

Prediction 1: Molecular clock analyses of protein sequences of the components of the machine should reveal a specific pattern. Under the Darwinian model, the evolution of a machine like the flagellum proceeds through the stepwise co-option of parts. For example, the flagellar ATPase would be borrowed from cellular F-ATPases and integrated with a primitive membrane pore. Next, after the evolution of gated proteins, etc., proton channels (proto-ExbB/proto-ExbD) would be co-opted to form the flagellar motor proteins MotA and MotB, respectively. A proto-MgtE copy would then associate with the flagellar motor, resulting in a FliG protein. Thus, under the Darwinian model of the origin of the flagellum, molecular clock analyses of the flagellum-specific ATPase and the F-ATPase, MotAB and ExbBD, and FliG and MgtE protein sequences should show divergence times in the following chronology: the split between the flagellum-specific ATPase and the F-ATPase –> the MotAB/ExbBD divergence –> and the FliG/MgtE split, where the arrows denote the passing of time.

In contrast, the molecular design hypothesis predicts that the divergence times of the machine parts and their “homologs” should follow a different pattern. One might expect that the divergence times should be approximately equal, indicating that the parts of the machine originated simultaneously, but this would ignore the fact that modification of protein parts would often be necessary. As a simple example, it is clear that simply plugging the ExbB/ExbD proteins into the flagellum for motor purposes would not be optimal. ExbB and ExbD interact with a membrane protein known as TonB, so ExbB and ExbD have segments unrelated to flagellar function. If these segments were not altered in the right ways, then they could interfere with the function of the flagellum. This engineered modification of “homologous” components must be taken into account. As I explained earlier:

“…this means that we cannot logically predict – from a design perspective – that molecular clocks will demonstrate that all machine components originated at about the same time. This is because if some proteins are modified more substantially than others, it would confuse the molecular clock. Protein components that have undergone more drastic modifications will have the appearance of being more ancient (using a molecular clock), while proteins that are only slightly changed will appear to have originated more recently. In particular, the design hypothesis predicts that molecular clocks will show that proteins with rapid substitution rates will have a later origin, while proteins with slow substitution rates will have an early origin. We can summarize this prediction in this manner: in general, the slower the substitution rate, the more ancient the protein will appear to be. If a protein has a slow substitution rate, then any modifications to the sequence of that protein will give the appearance of a large amount of time passing by. In contrast, even fairly extensive modifications to a protein with a rapid substitution rate will not significantly affect the molecular clock. To further refine this prediction, we can take into account the amount of modification that would be needed for a given protein.” (Included here is the figure that was provided in the previous essay)

Figure 4. A summary of the prediction of the molecular design hypothesis.

How can we make an estimate of the amount of engineered modification needed for any given component? This can be accomplished by taking a holistic approach to the protein parts and systems in question. That is, by analyzing the functions, structures, and interactions of the proteins and systems, one can infer approximately how much modification would be needed to incorporate a borrowed part into an engineered machine. A close inspection of the flagellar rod proteins, for example (which are homologous to each other), reveals that only a minimal amount of engineering would be needed to modify one rod protein into another. Their functions and interactions are similar, as is their cellular localization. Relative to the flagellar rod proteins, a great deal of modification would be needed to turn an MgtE copy into FliG. MgtE is a magnesium transporter and is not part of a multi-component protein system, while FliG interacts with the flagellar MS-ring and motor, as well as with FliM/N.

Figure 4 summarizes the prediction of the molecular design hypothesis discussed above.

Prediction 2: Molecular clock analyses of synonymous sites in the proteins under consideration should demonstrate approximately equal divergence times. While engineers might find it necessary to modify a protein part borrowed from another system, this modification would take place at the amino acid level. However, molecular clock analyses are not restricted to amino acid sequences. Clock analyses can also be conducted on the synonymous sites of two different gene sequences. Since the synonymous sites would not be affected by any engineered modifications (given that the engineered changes would be made at the amino acid level), the divergence times of machine parts as estimated from synonymous sites should be approximately equal. Furthermore, clock analyses carried out on the basis of synonymous sites should be in disagreement with clock analyses performed on non-synonymous sites.

An example may be cited here. Suppose the bacterial flagellum was engineered. As such, FliG was adapted from MgtE and integrated into the flagellar system, and the flagellar ATPase was borrowed (and modified as necessary) from the F-ATPase. Through molecular clock analyses, the divergence times of FliG/MgtE and the flagellum-specific ATPase/F-ATPase could be calculated. The divergence times would be expected to match the prediction described in “Prediction 1.” However, keep in mind that, in reality, these parts are being engineered into the flagellum at approximately the same time. So if we could find a molecular clock method that is independent of functional requirements, it would be possible to determine if these parts truly did originate at the same time. Fortunately, a molecular clock method based on synonymous sites provides such a method. Since the synonymous sites are not modified (unless their source organisms are significantly different) by any engineering methods, clock analyses of synonymous sites should give the “actual” divergence times of these proteins, and those divergence times should be nearly equal.

In the example above, then, all synonymous sites of the MgtE and FliG genes would be taken into consideration, and the synonymous substitution rate calculated. Next, the number of synonymous substitutions between MgtE and FliG would be determined, and thereby the divergence times of the two proteins could be established. This same procedure would be employed for the flagellar ATPase and the F-ATPase. The molecular design hypothesis predicts that the calculated divergence times of these two pairs of proteins – based on synonymous sites – should be approximately the same. Treatments of the methods and techniques behind molecular clock analyses will be found in molecular evolution and bioinformatics textbooks; these should be consulted if the reader is interested in a further understanding of the subject.
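
As a rough sketch of the arithmetic behind this procedure: if K is the number of synonymous substitutions per synonymous site separating two genes and r is the synonymous substitution rate per site per year, the standard clock estimate is T = K / (2r), since substitutions accumulate along both lineages after the split. The code below assumes the proportion of differing synonymous sites has already been measured, and it uses the Jukes-Cantor correction as a simple stand-in for the codon-based methods found in the textbooks. The function names and every number in the usage lines are hypothetical, not measurements of MgtE, FliG, or the ATPases.

```python
import math

def jukes_cantor_distance(p):
    """Correct an observed proportion of differing sites p for multiple hits (Jukes-Cantor model)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def divergence_time(k, rate):
    """Clock estimate T = K / (2r), with K in substitutions per site and r per site per year."""
    return k / (2 * rate)

def similar_divergence(t1, t2, rel_tol=0.1):
    """Check whether two divergence times agree within a chosen relative tolerance (an assumption)."""
    return math.isclose(t1, t2, rel_tol=rel_tol)

# Hypothetical inputs, for illustration only
rate = 1e-9                      # assumed synonymous rate, per site per year
t_pair_one = divergence_time(jukes_cantor_distance(0.30), rate)
t_pair_two = divergence_time(jukes_cantor_distance(0.31), rate)
print(t_pair_one, t_pair_two, similar_divergence(t_pair_one, t_pair_two))
```

Under Prediction 2, the comparison in the last line is the one of interest: the two protein pairs should give approximately the same synonymous-site divergence time, with the tolerance being a judgment call.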

I will now describe two predictions of the molecular design hypothesis that arise from special cases.

Prediction 3: Fusion proteins. Fusion proteins are proteins that are composed of two or more fused proteins – proteins that originally functioned independently. Recombinant technology is widely used to create fusion proteins, and fusion proteins can also arise through random mutations. If a fusion protein is a component in a biological machine, this provides an opportunity to test the molecular design hypothesis. In Figure 5, protein C is a protein component in a molecular machine hypothesized to have been engineered. Protein C is a fusion protein: the red portion is similar to protein A, and the green portion is similar to protein B. To engineer protein C, these two proteins (A and B) must be fused together. But this is not always the whole picture. For the fusion protein C to function in the context of the engineered machine, modification to proteins A and B would be done. In other words, the protein sequence of protein A would have to be tweaked in just the right way so that it fits nicely with the rest of the machine, and the same applies to protein B. There is one more element to this, however. It is not likely that both proteins A and B would have to be tweaked to the same degree, since these are, after all, proteins with different functions. So the sequence of protein A, for example, is modified more substantially than protein B. The two proteins are then fused using recombinant DNA technology, and the resulting protein is integrated into the engineered machine. The molecular clock starts ticking for each of the two domains in the fusion protein (the red and green portions). This is where the prediction stems from. Based on the molecular design hypothesis, I would predict that molecular clock analyses of the different parts of the fusion protein (that is, molecular clock analyses of each domain in the fusion protein with its homologous counterpart) should yield different divergence times. 

The logic of this prediction is simple: (a) the parts A and B were modified to different degrees, skewing the molecular clock and making one appear more ancient; (b) the molecular clock then starts ticking (once they have been fused and the machine has been deployed in the “wild”), with the fusion protein accumulating substitutions. Since the original modification to the proteins A and B skewed the molecular clock, the molecular design hypothesis predicts that in general, molecular clock analyses of the parts of a fusion protein should show one of the parts to be more ancient than the other.

Figure 5. A fusion protein engineered from two different proteins. The asterisks represent the amount of modification that has been done to each protein part, as well as the subsequent accumulation of substitutions.

Here it should be emphasized that the Darwinian theory of the origin of molecular machines leads to a different prediction. Within the evolutionary framework, proteins A and B simply fuse, and at that moment the molecular clock starts ticking for the novel fusion protein. The prediction would be that the different parts of the fusion protein diverged at the same time from their respective homologs.

Prediction 4: Protein domains. This prediction concerns duplicate protein domains that carry out the same function in the same protein. Suppose we have a protein component, A, which consists of the domains B, B, and C. The two B domains serve the same function in protein A, and are homologous to a domain in another protein (this “ancestral” domain will be termed B1). Since the two B domains function in the same way, an equal amount of modification would be done to them (if any at all). From an engineering perspective, the two domains would be placed in protein A at the same time. They would subsequently diverge. When compared to B1, then, they should be genetically equidistant. Therefore, molecular design predicts that domains with the same function in a protein should be equidistant from their “ancestral” domain.
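
A minimal sketch of how the equidistance check might look, assuming the two B domains and B1 have already been aligned to equal length. It uses the uncorrected p-distance (the fraction of differing positions) as the simplest possible measure; the sequences, the tolerance, and the function names are all made up for illustration.

```python
def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def equidistant(domain_1, domain_2, ancestral, tolerance=0.05):
    """True if both domains are about equally distant from the 'ancestral' domain."""
    return abs(p_distance(domain_1, ancestral) - p_distance(domain_2, ancestral)) <= tolerance

# Toy aligned sequences (hypothetical, not real protein domains)
b1      = "MKTAYIAKQR"
b_copy1 = "MKTAYLAKQR"   # one difference from b1
b_copy2 = "MRTAYIAKQR"   # one difference from b1, at another position
print(p_distance(b_copy1, b1), p_distance(b_copy2, b1), equidistant(b_copy1, b_copy2, b1))
```

A real test would use a corrected distance and some measure of statistical uncertainty rather than a fixed tolerance, but the comparison being made is the same.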

Again, this is not predicted if a molecular machine is the product of evolutionary processes. Duplicate domains need not arise simultaneously. Instead, a domain can be incorporated in a protein, and only later a second domain of the same type might be integrated with the protein. It is true, of course, that duplicate domains can evolve at the same time. But this is not a prediction of current theory. Current theory would be able to explain the above observation, but it would not predict it.


I have endeavored to describe the molecular design hypothesis in greater depth than in the previous essay. Several predictions of the hypothesis were delineated; there are undoubtedly more that were not discussed here. The next step from here is to actually test the hypothesis. This can be accomplished using standard bioinformatics techniques. By determining whether these predictions are met, it will be possible to detect artificiality (or the lack thereof) in molecular machines. Having said this, I wish to strongly encourage intelligent design proponents to do more than simply critique the current view on biological origins. A mechanistic model of intelligent design is what is needed, and this is what I have attempted to outline.



Kuhlman, B., Dantas, G., Ireton, G.C., Varani, G., Stoddard, B.L., Baker, D., 2003. Design of a novel globular protein fold with atomic-level accuracy. Science 302(5649), 1364-8.

Fisher, M.A., McKinley, K.L., Bradley, L.H., Viola, S.R., Hecht, M.H., 2011. De novo designed proteins from a library of artificial sequences function in Escherichia coli and enable cell growth. PLoS One 6(1), e15364. doi: 10.1371/journal.pone.0015364.

Nature’s Engines and Engineering


In the past few decades, extensive biochemical research has revealed the cell to possess a remarkable array of molecular machines, from flagella to replisomes to ATP synthases. These machines are machines in a very real sense: they are composed of discrete protein components that interact with each other because of some input (e.g., energy in the form of ATP), thereby producing biological function. Indeed, the only real difference between molecular machines in the cell and man-made machines is that the former self-assemble.

How did these biological molecular machines originate? It is thought by most biologists that these machines evolved through a Darwinian pathway, with precursor protein components being co-opted into new roles and associating with other proteins, gradually adding to the complexity of the system. Gene duplication, scaffolding, and other mechanisms would also play a pivotal role in the origin of these machines. Yet we can take another approach to this and hypothesize that some of the molecular machines in the cell are the products of engineering, and thus planning is behind their origin. This is the position of many intelligent design (ID) proponents, such as Michael Behe. Unfortunately, however, the ID community, for the most part, has contented itself with merely attacking the Darwinian explanation instead of developing a working design hypothesis. In short, although we see much material on why molecular machines could not have plausibly evolved, we see precious little on how they could have been designed. This is a potentially fatal flaw in the ID movement, because if it is to convince the scientific community at large that certain biological systems were engineered, then a testable design hypothesis is needed. This, in turn, would allow predictions to be made, and the model could thereby be falsified or confirmed. Formulating a novel design hypothesis is not easy, for it involves looking at current data in a new light, and one that would generate testable predictions. Nevertheless, in this short essay I have endeavored to lay out my ideas for a working design hypothesis on the engineering of molecular machines.

Background Considerations

The intelligent design/evolution discussion has somewhat ignored the historical nature of biological origins. By this I mean that ID proponents have focused on demonstrating that biological system X could not have evolved through Darwinian mechanisms, instead of asking the simple question: did biological system X actually evolve or was it intelligently designed? In other words, the discussion over biological origins has essentially become a question of plausibility, rather than a question of what actually happened. Biological system X could plausibly evolve but this does not mean that it did. The human mind is quite capable of imagining very creative non-teleological scenarios for the origin of any biological system, and we have to take this into account when considering the origin of a given biological system. A statement of plausibility says little about what actually happened in the history of a system, and thus independent evidence is needed to support any conclusion, be it non-teleological or teleological. We need to emphasize the historical nature of biological origins and instead of endlessly arguing over the plausibility (or lack thereof) of evolutionary mechanisms, we should try to determine what actually happened in the past.

There is one more point I wish to discuss before moving on. Intelligent design of molecular machines can be accomplished through direct design and through indirect design, which is front-loading. If a molecular machine is front-loaded, then it has a planned origin, but the design is indirect in that evolution is used to carry out the design objective. On the other hand, if a molecular machine is designed through direct engineering – e.g., through the de novo design of protein molecules – then we have an example of direct design. The hypothesis I will describe here is one of direct design.

The Design Hypothesis

As stated previously, biological machines are assembled from protein components. I propose that the components of molecular machines were engineered through the strategy of rational design, similar to the method humans use to design proteins. This, then, is the mechanism behind the construction of molecular machines (in this essay, whenever the term “design hypothesis” is used, I am referring exclusively to the above concept). Naturally, at the fundamental level, protein design is carried out through the intelligent manipulation of DNA sequences. Our current technology already allows the design of novel protein folds using computational methods (see, e.g., [1]). A key aspect of protein design is the modification of an already existing structure/sequence – if a protein structure can be modified such that it possesses a new function, an entirely novel protein fold does not need to be designed. Thus, under the design hypothesis described here, similarities among components in different biological machines are the result of a basic protein structure/sequence being re-used in different contexts. An example may clarify this statement.

Consider the bacterial flagellum. A number of its protein components share significant similarity with non-flagellar proteins. For example, FliG is similar to MgtE [2], a magnesium transporter. Under the Darwinian model, this similarity is attributed to common descent: in the distant past, an MgtE copy was co-opted into the primitive bacterial flagellum and evolved into FliG. However, if the bacterial flagellar components were engineered, then this similarity is the result of MgtE being re-designed into FliG. More specifically, the protein sequence of MgtE would be tweaked in just the right way such that it would acquire the specific properties necessary for functioning in the bacterial flagellum. At first glance, all of this might seem obvious and possibly ad hoc. Yet it is this part of the hypothesis that I think is the most readily testable. To explore why this might be the case, we need to first take a look at things from a Darwinian perspective.

Molecular Clocks and the Evolution of Biological Machines

How can we distinguish actual homology among components from similarities that are the result of re-engineering a basic component for use in different systems? I suggest that the answer lies in molecular clocks. Molecular clocks allow us to estimate the time of divergence between protein/DNA sequences. For example, a molecular clock using cytochrome c sequences indicates that mammals and reptiles diverged approximately 300 million years ago [3], which correlates well with the fossil evidence. To understand how molecular clocks are useful for detecting engineering in molecular machine components, let us return to the bacterial flagellum.
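As a minimal sketch of the arithmetic behind a strict molecular clock, the Python below converts an observed fraction of differing sites into a divergence-time estimate. It assumes a constant substitution rate along both lineages and uses the Poisson correction for multiple substitutions at a site. The function names and the numbers in the usage line are illustrative assumptions, not values taken from the cytochrome c study cited above.

```python
import math


def poisson_distance(fraction_different: float) -> float:
    """Estimate substitutions per site from the observed fraction of differing sites.

    Uses the Poisson correction d = -ln(1 - p), which allows for sites
    that changed more than once.
    """
    if not 0.0 <= fraction_different < 1.0:
        raise ValueError("fraction_different must be in [0, 1)")
    return -math.log(1.0 - fraction_different)


def divergence_time(distance: float, rate_per_site_per_year: float) -> float:
    """Years since two sequences split, under a strict (constant-rate) clock.

    Both lineages accumulate changes after the split, hence the factor of 2.
    """
    if rate_per_site_per_year <= 0.0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate_per_site_per_year)


# Hypothetical inputs chosen only to illustrate the calculation.
example_years = divergence_time(poisson_distance(0.15), 3.0e-10)
```

The key point for what follows is that the estimate depends on the assumed rate: the same number of sequence differences reads as a longer time when the rate is slower.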

In 2003, Nicholas Matzke proposed an evolutionary pathway for the origin of the bacterial flagellum [4]. This model begins with a passive pore being converted to an active pore through the association of the pore with an ATP synthase complex. It proposes that an ATP synthase was co-opted in toto early in the evolution of the flagellum. This was followed by a number of co-option events, such as the co-option of the Tol-Pal system which evolved into the MotAB complex. Next, MgtE was integrated into the evolving flagellar system such that it eventually gave rise to FliG. Naturally, this is only a summary of some of the steps involved in Matzke’s scenario. What we see is that various ATP synthase proteins share similarities with the following flagellar components: FliH (similar to the ATP synthase components AtpFH), FliI (AtpD), and FliJ. Given that Matzke’s scenario involves the in toto co-option of an ATP synthase, from an evolutionary point of view we would expect that FliH, FliI, and FliJ all diverged from their ATP synthase homologs at the same time. We could test this expectation through the use of molecular clocks. Moreover, we would also predict that MotAB diverged from the Tol-Pal components after the divergence of FliHIJ from the ATP synthase proteins. Finally, molecular clocks should show that FliG split from MgtE after the FliHIJ/ATP synthase and Tol-Pal/MotAB divergences. Thus, if molecular clocks confirmed that this specific sequence of events occurred, the evolutionary hypothesis for the origin of the flagellum would be significantly strengthened, and furthermore, the design hypothesis would be considerably weakened. 
This is because the design hypothesis explains the similarities of flagellar parts with non-flagellar components as the result of re-engineering rather than common descent – and if these similarities are indeed the result of re-engineering we would not expect to see the specific sequence of divergence times for flagellar components and their homologs that we would predict under the evolutionary hypothesis. Thus, although we cannot tell the difference between re-engineering and common descent of a particular component – e.g., FliG – when looking at this in a broader context, and taking into consideration the divergence times of different components, we can in fact establish if the similarity among components is most likely the result of common descent. In brief, the evolutionary model for the origin of the flagellum makes a precise prediction regarding the pattern of divergence times for specific flagellar proteins and their non-flagellar counterparts (Figure 1). I suggest that the design hypothesis yields a different prediction, and one that the evolutionary model does not make.

Figure 1. The prediction of the evolutionary hypothesis regarding divergence times of flagellar components from their homologs. The red arrow represents the “flow” of time. We see that the evolutionary hypothesis predicts that FliHIJ originated prior to either MotAB or FliG. MotAB arose after FliHIJ but before FliG. Finally, FliG formed after FliHIJ and MotAB.
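The ordering in Figure 1 can be written as a simple check on a set of clock estimates. The sketch below is only an illustration: the function name, the tolerance parameter, and the idea of passing estimates as a dictionary are my assumptions, and no real divergence estimates are included.

```python
from typing import Mapping


def matches_cooption_order(mya: Mapping[str, float], same_time_tolerance: float) -> bool:
    """Check clock estimates against the order the co-option scenario predicts.

    `mya` maps each flagellar component to its estimated divergence from its
    non-flagellar homolog, in millions of years ago (larger means older).
    """
    atp_derived = [mya["FliH"], mya["FliI"], mya["FliJ"]]
    # FliH, FliI and FliJ should split from their ATP synthase homologs together.
    together = max(atp_derived) - min(atp_derived) <= same_time_tolerance
    # MotAB should split from the Tol-Pal components after the FliHIJ divergence.
    motab_later = mya["MotAB"] < min(atp_derived)
    # FliG should split from MgtE after both earlier events.
    flig_last = mya["FliG"] < mya["MotAB"]
    return together and motab_later and flig_last
```

A set of estimates that returns True would fit the evolutionary scenario, while a set that returns False (for instance, FliG appearing older than FliHIJ) would not.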

For this prediction that stems from the design hypothesis we must look again to molecular clocks, but this time with an interesting twist.

Molecular Clocks and the Design Hypothesis

Contrary to the Darwinian model, the design hypothesis explains the similarities of machine components to other proteins as the result of re-using a protein in different contexts. Now, if molecular machine components were engineered, what would molecular clocks tell us about the divergence times of the machine components and their analogs (note: since homology, by definition, refers to common descent, I am using the term “analog” when dealing with the design hypothesis; i.e., from a design perspective, FliG and MgtE are not homologs, but analogs)? It is important to understand that direct design involves directly engineering the components and assembling the machine, such that all components originate at the same time. This is entirely unlike the evolutionary model, wherein components originate and associate with each other in a step-by-step, gradual pathway over a comparatively long timeframe. At first glance, then, it would seem like the design hypothesis predicts that molecular clocks would show that all machine components arose at approximately the same time. But things are not so simple. It would be rare to re-use a protein in different functions but not modify the protein’s sequence and structure. For example, the amino acid sequence of MgtE would have to be tweaked until it could be integrated into the flagellar system. Without any modifications to MgtE, it is unlikely that it could function properly in the context of the flagellum. FliG is a highly specific protein, interacting with the MS-ring and the MotAB complex. Thus, re-using MgtE in the flagellum would almost certainly require that its sequence be modified. The same is true for MotAB and the Tol-Pal proteins. The Tol-Pal system does not rotate other protein complexes while MotAB is a key player in rotating the flagellar filament. As such, the Tol-Pal proteins would have to be re-engineered before they could be incorporated into the flagellum. 
All of this means that we cannot logically predict – from a design perspective – that molecular clocks will demonstrate that all machine components originated at about the same time. This is because if some proteins are modified more substantially than others, it would confuse the molecular clock. Protein components that have undergone more drastic modifications will have the appearance of being more ancient (using a molecular clock), while proteins that are only slightly changed will appear to have originated more recently. In particular, the design hypothesis predicts that molecular clocks will show that proteins with rapid substitution rates will have a later origin, while proteins with slow substitution rates will have an early origin. We can summarize this prediction in this manner: in general, the slower the substitution rate, the more ancient the protein will appear to be. If a protein has a slow substitution rate, then any modifications to the sequence of that protein will give the appearance of a large amount of time passing by. In contrast, even fairly extensive modifications to a protein with a rapid substitution rate will not significantly affect the molecular clock. To further refine this prediction, we can take into account the amount of modification that would be needed for a given protein (Figure 2).

Figure 2. Summary of the predictions of the design hypothesis described here.

For example, if protein X has a slow substitution rate, and we deduce that its analog would have to be changed significantly before it could function as protein X, then we would predict that molecular clocks will assign it an early origin (keep in mind that, under the design hypothesis, the components of the machine actually originated at the same time). A discussion of how we could determine the amount of re-engineering that would be necessary is beyond the scope of this essay, but such a task could probably be accomplished without much difficulty.
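The prediction can be illustrated with a toy model. It assumes that a strict molecular clock reads engineered sequence changes as if they had accumulated naturally, so they simply add to the apparent age. The additive form, the function name, and all numbers are my own simplifying assumptions for illustration.

```python
def apparent_age(true_age_years: float,
                 engineered_changes_per_site: float,
                 rate_per_site_per_year: float) -> float:
    """Age a strict molecular clock would report for a re-engineered protein.

    Toy model: changes introduced by the engineer are read by the clock as if
    they had accumulated at the natural rate in both lineages, so they add
    engineered_changes_per_site / (2 * rate) years to the true age.
    """
    if rate_per_site_per_year <= 0.0:
        raise ValueError("rate must be positive")
    extra_years = engineered_changes_per_site / (2.0 * rate_per_site_per_year)
    return true_age_years + extra_years


# Hypothetical values: same true origin and same amount of re-engineering,
# but a tenfold difference in substitution rate.
slow_clock = apparent_age(1.0e9, 0.2, 1.0e-10)
fast_clock = apparent_age(1.0e9, 0.2, 1.0e-9)
```

In this sketch the slowly evolving protein appears much older than the rapidly evolving one even though both have the same true origin, which is the pattern the design hypothesis predicts.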


Conclusion

Here, I have discussed a possible mechanism for biological intelligent design, and one that presents us with a falsifiable hypothesis. An important assumption of this hypothesis is that the engineer(s) were rational agents. Without this basic premise, we cannot make any predictions because one could argue that the designers were purposefully trying to deceive us and tampered with any evidence of their involvement. But if we ensure that only rational agents are part of our hypothesis, we can make testable predictions. Naturally, this assumption must go both ways – if we encounter irrationality in a biological system, this must count against the design hypothesis.



References

1. Kuhlman, B., Dantas, G., Ireton, G.C., Varani, G., Stoddard, B.L., Baker, D., 2003. Design of a novel globular protein fold with atomic-level accuracy. Science. 302, 1364-1368.

2. Pallen, M.J., Matzke, N.J., 2006. From the Origin of Species to the origin of bacterial flagella. Nat Rev Microbiol. 4, 784-790.

3. Dickerson, R.E., 1971. Sequence and structure homologies in bacterial and mammalian-type cytochromes. J Mol Biol. 57, 1-15.

4. Matzke, N.J., 2003. Evolution in (Brownian) space: a model for the origin of the bacterial flagellum, TalkDesign.

On the Hippo signaling pathway

Here’s the abstract of a fairly new paper published by Cell Reports (“Premetazoan Origin of the Hippo Signaling Pathway”):

“Nonaggregative multicellularity requires strict control of cell number. The Hippo signaling pathway coordinates cell proliferation and apoptosis and is a central regulator of organ size in animals. Recent studies have shown the presence of key members of the Hippo pathway in nonbilaterian animals, but failed to identify this pathway outside Metazoa. Through comparative analyses of recently sequenced holozoan genomes, we show that Hippo pathway components, such as the kinases Hippo and Warts, the coactivator Yorkie, and the transcription factor Scalloped, were already present in the unicellular ancestors of animals. Remarkably, functional analysis of Hippo components of the amoeboid holozoan Capsaspora owczarzaki, performed in Drosophila melanogaster, demonstrate that the growth-regulatory activity of the Hippo pathway is conserved in this unicellular lineage. Our findings show that the Hippo pathway evolved well before the origin of Metazoa and highlight the importance of Hippo signaling as a key developmental mechanism predating the origin of Metazoa.”

This is interesting, especially from a front-loading perspective. The Hippo signaling pathway is an important developmental mechanism in Metazoa (animals), but all the core components of the Hippo signaling pathway have recently been found in unicellular Holozoa, which include the choanoflagellates.

A figure from the paper “Premetazoan Origin of the Hippo Signaling Pathway.”

Thus, the following Hippo pathway components have been found in unicellular organisms:

1) Hippo (kinase)

2) Warts (kinase)

3) Yorkie (coactivator)

4) Scalloped (transcription factor)

Specifically, these components have been found in Capsaspora owczarzaki. If you take a look at the above figure, you will see that this unicellular lineage is deeper-branching than the choanoflagellates. So, what are the Hippo components doing in unicellular organisms that don’t need them? This really isn’t expected from non-teleological evolution, but we’d expect this from front-loading. What’s very neat, too, is that the researchers discovered that these Hippo pathway components in Capsaspora owczarzaki can actually function in Drosophila. This is quite surprising from a non-telic viewpoint, because there’s no reason why these proteins in unicellular organisms should have the right sequence specificity to function in a very different multi-cellular organism like Drosophila. But it makes sense under the front-loading hypothesis, because we’d predict these proteins (more specifically, their ancestors) to be given a function that would conserve their sequence identity very well, such that when animals did appear on the scene, these components could be easily co-opted into a Metazoan role.

The Bacterial Flagellum and Homology

In this brief analysis, I’m going to discuss the bacterial flagellum and the homology a number of its components share with non-flagellar proteins. The table below lists the flagellar proteins found in the genus Salmonella, their lengths in amino acid residues, and their homologs (if any). Protein lengths were taken from UniProt, and the data on homologs were taken from Pallen and Matzke’s 2006 paper in Nature Reviews Microbiology, “From the Origin of Species to the origin of bacterial flagella.”

Flagellar protein  Length (amino acid residues)  Homology with non-flagellar proteins?
FlgA               219                           Yes (CpaB)
FlgBCFG            138; 134; 251; 260            No; but homology with FlgBCEFGK
FlgD               232                           No
FlgE               403                           No; but homology with FlgBCFGK
FlgH               221                           No
FlgI               367                           No
FlgJ               316                           No
FlgK               553                           No; but homology with FlgBCEFG
FlgL               317                           No; but homology with FliC
FlgM               97                            No
FlgN               140                           No
FlhA               692                           Yes (YscV)
FlhB               383                           Yes (YscU)
FlhDC              113; 192                      Yes (other activators)
FlhE               130                           No
FliA               239                           Yes (RpoD, RpoH, RpoS)
FliB               401                           No
FliC               200                           Yes; homology with FlgL and EspA
FliD               467                           No
FliE               104                           No
FliF               579                           Yes (YscJ)
FliG               331                           Yes (MgtE)
FliH               235                           Yes (YscL; AtpFH)
FliI               456                           Yes (YscN; AtpD; Rho)
FliJ               147                           Yes (YscO)
FliK               405                           Yes (YscP)
FliL               155                           No
FliM               334                           Yes (FliN; YscQ)
FliN               137                           Yes (FliM; YscQ)
FliO               125                           No
FliP               245                           Yes (YscR)
FliQ               89                            Yes (YscS)
FliR               264                           Yes (YscT)
FliS               135                           No
FliT               122                           No
FliZ               183                           No
MotA               295                           Yes (ExbB; TolQ)
MotB               309                           Yes (ExbD; TolR; OmpA)

When these figures are added up, we get a total of 12,322 amino acid residues. Thus, it appears that the Salmonella flagellum is composed of roughly 12,322 amino acid residues. What percent of the Salmonella flagellum, in terms of amino acid residues, has absolutely no known homologs? A total of 3,195 amino acid residues belong to flagellar proteins that have no known homologs. This means that approximately 25.9% of the Salmonella flagellum lacks sequence homology.

Now, you will notice that a number of flagellar proteins only have homologs in the type III secretion system. However, the type III secretion system (TTSS) is not a precursor system to the bacterial flagellum. It probably evolved directly from the flagellar export system (do note that Gophna et al. 2003 offer a dissenting view, but in my humble opinion, the evidence is certainly in favor of the hypothesis that the type III secretion system evolved from flagella). So we can ask the question: what percent of the flagellum lacks homologs or only has homologs in the TTSS, which is not a precursor system to the flagellum? A total of 2,804 amino acid residues only share sequence homology with TTSS components. Adding this to 3,195 gives 5,999. Thus, approximately 48.7% of the Salmonella flagellum has no known homologs in systems that would predate the flagellum.

Finally, we ask the question: what percent of the flagellum has no known homologs in non-flagellar systems? Note that a number of flagellar proteins only share homology with other flagellar proteins and TTSS components. For example, FliM is homologous to FliN and YscQ, and FliN is only homologous to FliM and YscQ. Since YscQ could not be a precursor protein, one of these proteins does not share homology with a precursor protein. If FliM is supposed to be a precursor to FliN, then the homology FliN shares with FliM cannot be evidence that FliM descended through non-teleological evolution.

To arrive at a percent of the flagellum that has no homologs that provide evidence of a non-telic origin of the flagellum, in cases like FliM/FliN we will use the shorter protein. This will allow us to be as fair as possible to the non-telic position. We arrive at a total of 471 amino acid residues. Add this to 5,999 and about 52.5% of the Salmonella flagellum has no homologs that provide evidence of a non-teleological origin.
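The percentages above can be reproduced with a few lines of Python. In this sketch the two category subtotals are recomputed from the table rows, while the overall total (12,322) and the FliM/FliN figure (471) are taken as stated in the text rather than recomputed. The variable and function names are my own.

```python
# Lengths (amino acid residues) copied from the table rows marked "No".
NO_HOMOLOG = {
    "FlgD": 232, "FlgH": 221, "FlgI": 367, "FlgJ": 316, "FlgM": 97,
    "FlgN": 140, "FlhE": 130, "FliB": 401, "FliD": 467, "FliE": 104,
    "FliL": 155, "FliO": 125, "FliS": 135, "FliT": 122, "FliZ": 183,
}
# Rows whose only listed homologs are type III secretion (Ysc) proteins.
TTSS_ONLY = {
    "FlhA": 692, "FlhB": 383, "FliF": 579, "FliJ": 147,
    "FliK": 405, "FliP": 245, "FliQ": 89, "FliR": 264,
}
TOTAL_RESIDUES = 12_322   # total as stated in the text, not recomputed here
FLIM_FLIN_RESIDUES = 471  # FliM/FliN figure as stated in the text


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


no_homolog = sum(NO_HOMOLOG.values())                   # 3,195
no_precursor = no_homolog + sum(TTSS_ONLY.values())     # 5,999
no_nontelic_evidence = no_precursor + FLIM_FLIN_RESIDUES

print(percent(no_homolog, TOTAL_RESIDUES))            # 25.9
print(percent(no_precursor, TOTAL_RESIDUES))          # 48.7
print(percent(no_nontelic_evidence, TOTAL_RESIDUES))  # 52.5
```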


Several flagellar proteins only share structural similarity with other proteins. However, structural similarity can often be the result of convergent evolution – there are only a few thousand different protein folds, contrasted with trillions of different possible amino acid sequences. Further, in some instances, sequence similarity can also be the result of convergent evolution.

From this brief analysis, I found that more than half of the Salmonella flagellum, in terms of amino acid residues, lacks any homologs that provide evidence that it evolved through non-teleological mechanisms. Some of the remaining homologies can hardly be called significant. The flagellar protein FliG shares only about 20% sequence similarity with its only homolog, MgtE. Also, from the angle of intelligent intervention, where the flagellum was designed at the dawn of life, the fairly significant sequence similarity that the remaining proteins do share with non-flagellar proteins could possibly be explained by convergent evolution. I suggest that convergent evolution at the molecular level may be more pervasive than many think.

Is Intelligent Design Dead?

Recently, a number of critics of intelligent design have said that the intelligent design movement is, well, dead. It all started with Jason Rosenhouse’s blog post “Twenty Years After Darwin on Trial, ID is Dead.” Jerry Coyne followed up on his blog, Why Evolution Is True, agreeing with Rosenhouse’s assessment that the intelligent design movement is effectively dead. So, what’s my opinion on this?

First of all, I’d like to point out that the concept of teleology in biology isn’t dead. It’s been around for a very long time, and that concept isn’t exactly the same thing as the “Intelligent Design Movement.”

That said, back in the days when ID first came out (in the 1990s – I wasn’t an advocate of intelligent design at the time, so I’m talking from the standpoint of looking back at the history of the intelligent design movement, not my own personal experience), it seemed like there were a lot of creative, original, inspired ideas going around from the ID side. “We were great,” one could say. We challenged the scientific consensus and proposed research ideas, interesting hypotheses, and explored the various ways that ID could be used to further the advance of the biological sciences. Now, however – and any unbiased observer could see this – many ID proponents are spending their time attacking Darwinian theory, instead of spending their time developing a rigorous intelligent design hypothesis that could truly make robust predictions about the living world. When I look at the posts over at UncommonDescent, it is obvious to me that a change needs to occur within the mainstream ID team. At least 50% of the posts at UncommonDescent aren’t even remotely relevant to ID and biological origins (well, I suppose by some stretch of the imagination they could be just a tiny, tiny bit relevant). Consider the title of one of the posts at UncommonDescent: “Survey results: Only 5.3% of general philosophers of science accept or lean towards theism.” Now what on earth does theism have to do with biological intelligent design? I really don’t know. Does theism have anything to do with the theory of gravity? Not really, and if the mainstream proponents of ID are genuinely interested in developing a rigorous hypothesis of biological origins, then the theistic language will have to be dropped (or at least minimized). I mean, c’mon, UncommonDescent spends a whole bunch of time devoted to attacking atheism, promoting theism, etc. But what does this have to do with the origin of the bacterial flagellum, for example? If we’re truly interested in biological origins, then we don’t need to be sidetracked by the theism/atheism debate, which is a whole other topic.

There is, of course, the other side of the coin. Papers friendly to ID have been published by academic journals, and researchers like Doug Axe and Ann Gauger are doing some pretty cool stuff in the lab. Much of this research is devoted to discovering the limits to random mutation and natural selection, and without a rigorous intelligent design hypothesis this is the most we can expect at this point. Then there are the folks over at Telic Thoughts, who don’t get as sidetracked as UncommonDescent. The folks over there seem to be genuinely interested in biological origins, and I am glad of their often thought-provoking musings over biological origins. So, too, we have Mike Gene and his blog The Design Matrix, and here again is an example of someone sincerely interested in biological origins. Both at Telic Thoughts and The Design Matrix, we don’t get bogged down with the theism/atheism debate, or with “Neuroscientists study how mindfulness meditation helps people overcome temptation to smoke.” I am thankful for the efforts of the folks at Telic Thoughts and for Mike Gene’s The Design Matrix. These blogs represent what ID as a whole could be if the bulk of its proponents had the interest in developing ID as a rigorous biological hypothesis, instead of talking about how the fossil record disproves Darwinian evolution.

In conclusion, ID is not dead, but a number of aspects of ID seem to be in decay these days. The days of the enthusiastic ID proponents thinking about research ideas seem to be waning, thanks to the efforts of UncommonDescent and the like (okay, I know I’ve been picking on you UD guys a lot). The mainstream ID community seems to be content with merely poking holes in Darwinian evolution, which is not good at all. We need to develop a rigorous biological design hypothesis that makes real, robust predictions about the living world. And for this I thank Mike Gene and the folks at Telic Thoughts for pondering over the front-loading hypothesis – a teleological hypothesis that makes real predictions. You guys still retain that creativity and sincerity and that feeling of how ID should be. And I strongly encourage those researchers like Doug Axe and Ann Gauger et al. to continue their lab work – it’s certainly needed these days. I, for one, am quite interested in pursuing ID, developing it as a rigorous hypothesis, and researching the predictions made by telic hypotheses like front-loading.

No, ID isn’t dead. But it’s a bit hard to tell these days if it’s the calm before the storm or not. We need more Mike Genes, more “telic thoughts” (is that a pun or not?) and more ID researchers determined to build ID as a rigorous biotic hypothesis.