Key Takeaways

Sample Definition And Size

This is a commentary article by Eric J. Topol, not an empirical study; it does not involve a defined sample or sample size.

Study Type

Commentary / Expert opinion piece.

Conflicts Of Interest

The article contains no competing-interests statement, so no conflicts of interest are declared.

Results Summary

The article reviews recent advances in large language models applied to molecular biology. Highlights include AlphaFold 2’s prediction of over 200 million protein structures; Evo, trained on 2.7 million phage and prokaryotic genomes (~300 billion nucleotides); and AlphaFold 3, for which 80% of protein–ligand complex predictions fall within 2 Å of the experimentally determined structure. Other models covered include Boltz‑1, MassiveFold, EVOLVEpro, PocketGen, PIONEER, AbMAP, RhoFold, RhoDesign, GET, DNA language models, MethylGPT, CpGPT, SyntheMol, and SCimilarity, along with multiagent systems such as Virtual Lab. Because this is a narrative overview, no statistical results (p‑values, effect sizes, confidence intervals) are provided.

Abstract

In 2021, a year before ChatGPT took the world by storm amid the excitement about generative artificial intelligence (AI), AlphaFold 2 cracked the 50-year-old protein-folding problem, predicting three-dimensional (3D) structures for more than 200 million proteins from their amino acid sequences. This accomplishment was a precursor to an unprecedented burgeoning of large language models (LLMs) in the life sciences. That was just the beginning. In recent months, we have moved into a hyperaccelerated phase of new foundation models, pretrained on massive datasets, with the ability to perform a wide range of tasks that are helping us understand the structure, biology, evolution, and design of proteins, RNA, DNA, and ligands, as well as their biomolecular interactions. Unlike multimodal LLMs such as GPT-4, Gemini, and Claude, which process text, audio, and images, these large language of life models (LLLMs) are multiomic. That is to say, they are not only multimodal but pertain to different layers of molecular biology. For example, Evo, a foundation model trained on 2.7 million diverse phage and prokaryotic genomes (equivalent to about 300 billion DNA nucleotides), predicts the impact of variants in DNA, RNA, or proteins on structure and function, as well as how essential genes are to cell function, and can generate new DNA sequences.

Referenced In

William Fan

Mar 7, 2026 2:43 AM

It's crazy hearing about the advances AI is driving in the biomedical industry, like AlphaFold and EchoJEPA. It feels like we're hitting an exponential point in breakthroughs, to the point where it's unimaginable how far we're going to progress in 10 years.