When researchers first unveiled Evo, a generative AI trained on vast collections of bacterial genomes, the scientific community took note — but also recognized a fundamental limitation. Bacterial genomes are relatively compact and logically organized, clustering related genes in ways that make pattern recognition tractable. The more intricate, sprawling architecture of eukaryotic genomes — those belonging to organisms with complex cells, including humans — represented an entirely different problem. As observers noted at the time, it was far from certain that the same approach would scale.
The team behind Evo apparently took that skepticism as an invitation. Their response is Evo 2, a fully open-source AI system trained across all three domains of life: bacteria, archaea, and eukaryotes. Having processed trillions of base pairs of DNA, the system has independently developed internal representations of sophisticated genomic features — including regulatory sequences and splice sites — that even experienced human researchers find difficult to identify consistently.
To appreciate why this is significant, it helps to understand what makes eukaryotic genomes so difficult to interpret. Unlike bacterial genomes, where coding sequences are contiguous and regulatory systems are compact, eukaryotic genes are riddled with introns — non-coding interruptions embedded within genes themselves. Regulatory elements can be scattered across hundreds of thousands of base pairs. The sequence signals that define critical boundaries, such as the edges of introns or the binding locations of regulatory proteins, are statistically fuzzy rather than sharply defined — a given position might favor a specific nucleotide only 45 percent of the time. Beyond all of this lies an enormous volume of DNA historically labeled as junk: remnants of defunct viruses, damaged genes, and other genomic detritus.
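One standard way to put a number on how "fuzzy" a position is uses its information content: 2 bits minus the Shannon entropy of the nucleotide frequencies observed there. The sketch below is illustrative only. It assumes that the position favoring one nucleotide 45 percent of the time splits the remaining 55 percent evenly across the other three bases, and `information_content` is a hypothetical helper name.

```python
import math

def information_content(freqs):
    """Information content (bits) of one DNA position: 2 minus the Shannon entropy.

    `freqs` maps each nucleotide to its observed frequency; values should sum to 1.
    """
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy

# A perfectly conserved position carries the maximum of 2 bits.
conserved = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}

# Hypothetical fuzzy position: A favored 45% of the time, the rest split evenly (assumption).
fuzzy = {"A": 0.45, "C": 0.55 / 3, "G": 0.55 / 3, "T": 0.55 / 3}

print(f"conserved: {information_content(conserved):.2f} bits")
print(f"fuzzy:     {information_content(fuzzy):.2f} bits")
```

Under that assumption the fuzzy position carries only about 0.14 bits, compared with 2 bits for a fully conserved one. This is why such signals are hard to spot one position at a time.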

This structural complexity has long frustrated computational analysis. Specialized tools exist for tasks such as splice-site identification, but their error rates become consequential at the scale of a 3-billion-base genome. Evolutionary comparison — examining sequences conserved across species — offers additional insight, but carries its own limitations, particularly when the goal is to understand variation rather than conservation. Neural networks, which excel at detecting subtle statistical patterns invisible to human analysts, represent a promising alternative — provided they can be given sufficient data and computational resources to work with.
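A quick calculation shows why small error rates become consequential across a whole genome. Every input below is hypothetical and chosen only for illustration. The detector is assumed to find 95 percent of an assumed 400,000 true sites, while misfiring at just 0.1 percent of the remaining positions in a 3-billion-base genome.

```python
def genome_scale_precision(genome_size, true_sites, sensitivity, false_positive_rate):
    """Return (true positives, false positives, precision) for a genome-wide scan.

    Every position that is not a true site is treated as a chance for a false call.
    """
    true_positives = sensitivity * true_sites
    false_positives = false_positive_rate * (genome_size - true_sites)
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision

# Hypothetical inputs, for illustration only.
tp, fp, precision = genome_scale_precision(
    genome_size=3_000_000_000,
    true_sites=400_000,
    sensitivity=0.95,
    false_positive_rate=0.001,
)
print(f"true calls: {tp:,.0f}  false calls: {fp:,.0f}  precision: {precision:.1%}")
```

Under these assumptions, roughly 3 million false calls swamp 380,000 true ones. Nearly nine out of ten positive predictions would be wrong, even though the per-position error rate looks tiny.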
The architectural foundation of Evo 2 is a convolutional neural network called StripedHyena 2. Training proceeded in two phases. The first exposed the system to genomic sequences approximately 8,000 bases in length, emphasizing regions rich in functional features. The second processed sequences of one million bases at a time, enabling the model to internalize large-scale structural patterns. The training corpus, designated OpenGenome2, comprises 8.8 trillion bases drawn from all three domains of life, as well as bacteriophages. Notably, eukaryote-infecting viruses were deliberately excluded out of concern for potential misuse. Two model variants were produced: a 7-billion-parameter version trained on 2.4 trillion bases, and a full 40-billion-parameter version trained on the complete OpenGenome2 dataset.
The rationale underlying this training strategy is elegant in its simplicity. Sequences that perform critical biological functions tend to be preserved across evolutionary time, meaning that a model trained on enough genomic data will encounter them repeatedly and in diverse contexts.
"By learning the likelihood of sequences across vast evolutionary datasets, biological sequence models capture conserved sequence patterns that often reflect functional importance. These constraints allow the models to perform zero-shot prediction without any task-specific fine-tuning or supervision," the researchers write. Critically, the decision to forgo fine-tuning on known genomic features was deliberate. It preserves the system's capacity to recognize atypical or entirely unknown features that curated training examples might inadvertently suppress.
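The zero-shot principle can be illustrated with a deliberately tiny stand-in model. In the sketch below, a k-mer Markov model trained on a handful of sequences plays the role of Evo 2, which is vastly larger and built on a different architecture. A variant is scored by how much it lowers the sequence's log-likelihood relative to the reference. All function names are hypothetical and are not part of any Evo 2 API.

```python
import math
from collections import Counter, defaultdict

def train_markov(sequences, k=2):
    """Count how often each base follows each k-base context (toy stand-in for Evo 2)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    return counts, k

def log_likelihood(model, seq, pseudocount=1.0):
    """Sum of log P(base | preceding k bases), with add-one smoothing over the 4 bases."""
    counts, k = model
    total = 0.0
    for i in range(k, len(seq)):
        context = counts.get(seq[i - k:i], Counter())
        total += math.log((context[seq[i]] + pseudocount) /
                          (sum(context.values()) + 4 * pseudocount))
    return total

def variant_score(model, ref, pos, alt):
    """Zero-shot score: mutant log-likelihood minus reference log-likelihood.

    Negative values mean the model finds the variant less plausible than the reference.
    """
    mutant = ref[:pos] + alt + ref[pos + 1:]
    return log_likelihood(model, mutant) - log_likelihood(model, ref)

# Toy "evolutionary" training data: many copies of one conserved motif.
model = train_markov(["ATGGCC" * 5] * 10)
ref = "ATGGCCATGGCC"
print(variant_score(model, ref, 6, "A"))  # no change, so the score is 0.0
print(variant_score(model, ref, 6, "T"))  # breaks the conserved motif, so the score is negative
```

No labeled examples of "harmful" variants are used anywhere. The model's sense of what is plausible comes entirely from the training sequences, which is the sense in which the prediction is zero-shot.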
To probe what Evo 2 had actually learned, the research team employed a secondary neural network trained to interpret the internal firing patterns of Evo 2 itself. The results were revealing. The system had clearly developed representations of protein-coding regions, intron boundaries, and structural protein features such as alpha helices and beta sheets. It could also detect mutations that disrupt coding sequences, assign greater significance to those introducing stop signals than to synonymous substitutions, and recognize mobile genetic elements — essentially, DNA-level parasites — as a distinct category.
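The distinction the model appears to draw follows directly from the genetic code. Some substitutions leave the encoded amino acid unchanged, while others create a premature stop codon. A minimal sketch using the standard code (NCBI translation table 1) labels a single-base change within a codon. `classify_substitution` is a hypothetical helper, not anything from the Evo 2 release.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), amino acids in TCAG codon order; "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def classify_substitution(codon, pos, alt):
    """Classify replacing codon[pos] with `alt` by its effect on the encoded amino acid."""
    mutant = codon[:pos] + alt + codon[pos + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"   # introduces a premature stop codon
    if before == "*":
        return "stop-loss"  # removes an existing stop codon
    return "missense"

print(classify_substitution("CTT", 2, "C"))  # CTT -> CTC, both leucine: synonymous
print(classify_substitution("TGG", 2, "A"))  # TGG (Trp) -> TGA (stop): nonsense
print(classify_substitution("GAT", 2, "A"))  # GAT (Asp) -> GAA (Glu): missense
```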
Performance on practical genomic tasks was equally impressive. Evo 2 accurately identified mutations that disrupt transcription initiation sites and translation start sites. It also recognized the functional significance of RNA sequences that are never translated into protein at all. It could infer which organism a sequence originated from, including species that use non-standard genetic codes with alternative stop-codon signals, and then apply the appropriate interpretive framework. On splice-site identification, Evo 2 outperformed software built specifically for that task by certain metrics. When evaluated against a catalog of mutations in the BRCA2 gene, many of which carry cancer-associated risk, the model performed competitively. Its performance improved further after targeted fine-tuning with known variant data.
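The choice of genetic code matters, and translation makes the reason easy to see. In NCBI translation table 4, used by Mycoplasma and Spiroplasma, the codon TGA encodes tryptophan rather than signaling a stop. Applying the standard code to such a gene would therefore truncate its protein. The sketch below models only that single TGA reassignment and ignores the table's alternative start codons. The gene sequence is made up and the helper names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG codon order; "*" marks a stop codon.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def build_table(overrides=None):
    """Build a codon -> amino-acid dict from the standard code plus any reassignments."""
    table = {
        b1 + b2 + b3: STANDARD[16 * i + 4 * j + k]
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)
    }
    table.update(overrides or {})
    return table

TABLE_1 = build_table()              # standard code
TABLE_4 = build_table({"TGA": "W"})  # Mycoplasma/Spiroplasma: TGA read as tryptophan

def translate(seq, table):
    """Translate codons from the start of `seq` until the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = table[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

gene = "ATGGCTTGACTGGAA"  # made-up example: ATG GCT TGA CTG GAA
print(translate(gene, TABLE_1))  # MA    (TGA treated as stop)
print(translate(gene, TABLE_4))  # MAWLE (TGA treated as tryptophan)
```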
The generative capabilities of Evo 2 are more difficult to assess, but no less intriguing. When prompted with yeast genomic sequences, the model produced outputs containing functional RNA elements, regulatory information, and sequences with recognizable gene-like structure including splice sites. However, whether any of the putative proteins encoded in those outputs are functionally active remains untested. The challenge is a conceptual one: in bacteria, gene clustering provides a reasonable basis for inferring function; in eukaryotes, no such assumption holds, making it unclear what functional tests would even be appropriate.
A more structured generative test involved asking Evo 2 to design regulatory DNA sequences active in one cell type but not another, given information about sequences active in each. The generated sequences were then experimentally introduced into the relevant cell types. The results, while technically meaningful, were modest: only 17 percent of the generated sequences showed activity differing by a factor of two or more between the two cell types. This represents a genuine achievement, but falls well short of the kind of precision that would be required for therapeutic or industrial design applications.
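The 17 percent figure is a simple fraction: the share of designed sequences whose measured activity in one cell type is at least twice that in the other. A minimal sketch of that calculation follows. It assumes activities are positive numbers, and the function name and activity values are made up for illustration.

```python
def differential_fraction(activity_a, activity_b, min_fold=2.0):
    """Fraction of sequences whose activity differs by at least `min_fold` between two cell types.

    Activities must be positive; the comparison is symmetric, so either cell type may be higher.
    """
    if len(activity_a) != len(activity_b) or not activity_a:
        raise ValueError("need two equal-length, non-empty activity lists")
    hits = sum(
        1 for a, b in zip(activity_a, activity_b)
        if max(a, b) / min(a, b) >= min_fold
    )
    return hits / len(activity_a)

# Made-up measurements for four designed sequences in cell types A and B.
cell_a = [4.0, 1.0, 3.0, 1.0]
cell_b = [1.0, 1.0, 2.0, 0.5]
print(differential_fraction(cell_a, cell_b))  # 0.5: sequences 1 and 4 meet the 2-fold cutoff
```

This sketch counts a two-fold difference in either direction, matching the wording above. A stricter evaluation might also require the higher activity to occur in the intended cell type.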
The research team has made the entire system publicly available. As the paper states: "We have made Evo 2 fully open, including model parameters, training code, inference code, and the OpenGenome2 dataset." This decision reflects both scientific generosity and strategic intent — the team appears to recognize that the most valuable applications of Evo 2 may emerge from the broader research community rather than from within their own laboratory.
Several compelling directions present themselves for future investigation. One involves developing specialized derivatives of Evo 2 — models fine-tuned for tasks such as cancer genome analysis or automated annotation of newly sequenced organisms. Another, perhaps more profound, concerns what Evo 2 may have learned that humans do not yet know how to look for. Decades of research have already produced a succession of unexpected genomic discoveries — CRISPR repeats, microRNAs, and others. The possibility that Evo 2 has independently identified structural features of the genome not yet characterized by conventional biology is not merely speculative; the interpretability tools used to probe the model could, in principle, be applied to surface exactly such discoveries.
Given that Evo 2 was published fewer than four months after the original Evo paper, the current limitations in biological validation are understandable. Experimental biology is inherently time-consuming, and the most informative experiments are not always obvious in advance. The scientific community will likely require months to years to determine where Evo 2 delivers its most consequential contributions — whether in genome annotation, variant interpretation, sequence design, or some domain not yet anticipated.