Although a first draft of the human genome was announced back in 2000, deciphering what that sequence actually means is an ongoing challenge. Much of the genome is so-called ‘junk’, whose purpose (if it has any) is still unknown. Identifying real genes in the rest of it is not always a straightforward task – genes that are only rarely or transiently expressed can be particularly difficult to find. So while some 20,000 genes have already been discovered, there are likely to be others still lurking hidden in the genome.
This month, a team led by Adam Siepel at Cornell have used a novel strategy to unearth around 300 previously unknown genes. They began by looking at sequences of the human genome that are similar to sequences found in other vertebrates (mouse, rat and chicken). Then they took into account the fact that changes in functionally important sequences are constrained by evolution – unlike changes in non-functional sequences, which tend to be fairly random.
“What’s exciting is using evolution to identify these genes,” Siepel said. “Evolution has been doing this experiment for millions of years. The computer is our microscope to observe the results.”
This was no simple task. An 850-node supercomputer ran the algorithms that identified the overlapping regions that were evolutionarily conserved. Once the potential genes were identified, it was back to the laboratory to test whether they really exist and are expressed in humans. The new genes they found were mostly involved in motor activity, cell adhesion, connective tissue and central nervous system development.
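To give a feel for what "using evolution" means in practice, here is a toy Python sketch of the general idea: line up the same stretch of DNA from several species and flag the windows where the other species agree with the human sequence. This is not the Cornell team's algorithm, which is far more sophisticated. The four sequences are invented for illustration, and the function names, the window size and the 0.9 threshold are all assumptions.

```python
def column_identity(alignment):
    """For each column, the fraction of other species matching the first (human) row."""
    reference, others = alignment[0], alignment[1:]
    scores = []
    for i, base in enumerate(reference):
        matches = sum(1 for seq in others if seq[i] == base)
        scores.append(matches / len(others))
    return scores


def conserved_windows(alignment, window=5, threshold=0.9):
    """Return merged (start, end) spans whose mean identity meets the threshold."""
    scores = column_identity(alignment)
    spans = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Overlaps or touches the previous span, so extend it.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


# Made-up, pre-aligned sequences (no gaps), human first.
alignment = [
    "ATGGCCAAGTTTACGGATCA",  # human
    "ATGGCCAAGTCAGTCCGTCA",  # mouse
    "ATGGCCAAGTGCTAACTTCA",  # rat
    "ATGGCCAAGAATCGTTCACA",  # chicken
]

print(conserved_windows(alignment))  # prints [(0, 10)]
```

In this toy alignment the first ten bases barely change across the four species while the rest is scrambled, so only the first stretch is flagged as a candidate for something functional. Real methods also have to handle gaps, weigh how closely related the species are, and model how quickly neutral DNA is expected to change.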