Evo: AI Predicts Genes
Researchers at Stanford University have developed a surprisingly effective approach to predicting proteins, and even designing entirely new ones, using artificial intelligence. The core of the work is an AI model named Evo, trained on vast collections of bacterial genomes. The team recognized that the way bacteria organize their genes, clustering genes with related functions together, gives the model a powerful signal to learn from. Bacteria frequently place the genes involved in a single metabolic process, such as digesting a sugar or making an amino acid, next to one another in their DNA, which lets them control an entire pathway efficiently.
Evo learns to predict the next DNA base in a sequence, much as a large language model predicts the next word in a sentence. That ability extends beyond simple prediction: given a prompt, Evo can generate novel DNA sequences, with some variation from one output to the next. Because it connects the details of individual DNA sequences with the broader context of the genome, Evo appears to capture relationships among genes. The researchers tested this by prompting Evo with fragments of genes for known proteins. In many cases it successfully completed the sequences, and in some it produced entirely new proteins.
The team then tested the system with a novel bacterial toxin, one only loosely related to existing toxins and with no known antitoxin to counter it. Roughly half of the 10 proteins Evo produced were able to rescue bacteria from the toxin's effects, and two completely restored growth. These new antitoxins were distinct from known ones: they shared only about 25 percent sequence similarity with them and appeared to be assembled from parts of as many as 15 to 20 individual proteins.
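To make the "predict the next base" idea concrete, here is a minimal sketch in Python. It is not Evo's architecture, which the article does not describe. It is a toy model that counts which base tends to follow each short stretch of DNA, then extends a prompt by sampling one base at a time. The function names, the context length `k`, the `temperature` setting and the training strings are all assumptions made up for illustration.

```python
import random
from collections import Counter, defaultdict

BASES = "ACGT"


def train(sequences, k=3):
    """Count which base follows each k-base context in the training DNA."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - k):
            counts[seq[i:i + k]][seq[i + k]] += 1
    return counts


def next_base_probs(counts, context, temperature=1.0):
    """Return a probability for each possible next base given the context."""
    seen = counts.get(context, Counter())
    # Add-one smoothing keeps unseen bases possible; a lower temperature
    # sharpens the distribution, a higher one makes output more varied.
    weights = [(seen[b] + 1) ** (1.0 / temperature) for b in BASES]
    total = sum(weights)
    return {b: w / total for b, w in zip(BASES, weights)}


def generate(counts, prompt, n_bases, k=3, temperature=1.0, seed=None):
    """Extend the prompt by sampling one base at a time."""
    rng = random.Random(seed)
    seq = prompt
    for _ in range(n_bases):
        probs = next_base_probs(counts, seq[-k:], temperature)
        seq += rng.choices(BASES, weights=[probs[b] for b in BASES])[0]
    return seq


# Made-up training strings, not real genomes.
toy_genomes = ["ATGGCGATGGCGATGGCG", "ATGGCGTTTATGGCG"]
model = train(toy_genomes)
print(generate(model, "ATG", n_bases=12, seed=0))
```

Running `generate` several times without a fixed `seed` gives different completions of the same prompt, a small-scale analogue of the variation described above. A real genomic language model conditions on far longer contexts than three bases, which is what would let it pick up on neighboring genes.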
Beyond proteins, the system was tested with an RNA-based inhibitor, and Evo generated DNA encoding RNAs with the correct structural features, even though the specific sequences weren't closely related to anything previously observed. Further experiments with inhibitors of the CRISPR system, a bacterial defense against viruses, showed that Evo could produce 17 proteins that successfully blocked CRISPR function, two of which were so unusual that they confounded existing protein structure prediction software. This points to a notable shift: design work moves down to the level of nucleic acids, where evolutionary innovation actually takes place.
The lead researcher on this project, who holds a Ph.D. in Molecular and Cell Biology from Columbia University and previously earned a degree from the University of California, Berkeley, is often found outside the office: he is a keen cyclist and also enjoys walks in nature in his hiking boots.