MIT scientists debut a generative AI model that could create molecules addressing hard-to-treat diseases

Nov 26, 2025 | AI

On Thursday, October 30, a BoltzGen seminar hosted by the Abdul Latif Jameel Clinic for Machine Learning in Health (MIT Jameel Clinic) drew more than 300 people from academia and industry. The event featured MIT PhD student Hannes Stärk, the lead author of BoltzGen, who presented the work just days after its public release.

Officially released on Sunday, October 26, **BoltzGen** builds on the open-source **Boltz-2**, a biomolecular structure prediction model that drew considerable attention this past summer for its ability to predict protein binding affinity. BoltzGen is the first model of its kind to go a step further: it can *generate* novel protein binders that are ready to enter the drug discovery pipeline.

BoltzGen’s capabilities rest on three innovations. First, it unifies protein design and structure prediction in a single model while maintaining state-of-the-art performance. Second, it incorporates built-in constraints, designed with input from wet lab collaborators, so that the model produces functional proteins that do not violate the laws of physics or chemistry. Third, its evaluation process tests the model on notoriously “undruggable” disease targets, pushing the limits of its binder generation capabilities.

Current protein modeling techniques, prevalent in both academic research and industrial applications, typically specialize in either predicting protein structures or designing new proteins. A notable limitation, however, is their tendency to generate proteins that successfully bind only to “easy” or well-characterized biological targets.

Much like students who do well on test questions that resemble their homework, these models often work as long as the target in binder design resembles their training data. But existing methods are almost always benchmarked on targets for which successful binders are already known, and their performance drops markedly when they are applied to more challenging or novel targets.

According to Stärk, current computational approaches to binder design are hampered by a critical limitation: they are “modality-specific,” meaning they are tailored to a single type of data or interaction.

A general model, Stärk argues, offers two advantages. It can address a broader range of tasks, and it also performs better on each individual task. The reason lies in how these models learn: emulating physics is learned by example, so a more general training scheme supplies more examples that contain generalizable physical patterns.

The BoltzGen research team tested the system on 26 distinct targets, ranging from cases with immediate therapeutic relevance to ones chosen specifically for their dissimilarity to the training data.

This validation, carried out in eight wet labs across academia and industry, demonstrates the model’s breadth and its potential for breakthrough drug development.

Parabilis Medicines, an industry collaborator that conducted wet lab testing of BoltzGen, praised the model’s potential. The company said it believes that integrating BoltzGen into its existing Helicon peptide computational platform will accelerate the development of transformative drugs for major human diseases.

The open-source releases of Boltz-1, Boltz-2, and now BoltzGen (which was previewed at the 7th Molecular Machine Learning Conference on October 22) bring new opportunities and transparency to drug development. They also signal that the biotechnology and pharmaceutical industries may need to reevaluate their product and service offerings.

Amid the social media buzz surrounding BoltzGen, Justin Grace, a principal machine learning scientist at LabGenius, raised a question about the long-term viability of specialized AI services.

Grace pointed to the rapidly diminishing performance gap between proprietary and open-source AI systems, noting that for conversational AI, this “private-to-open” lag currently stands at seven months and is consistently shrinking. He further observed that this interval appears even shorter within the protein engineering domain.

This trend, Grace argued, prompts a critical financial query for “binder-as-a-service” companies: How can they effectively recoup their substantial investments when users can predictably wait just a few months for high-performing, free alternatives?

For academic researchers, BoltzGen expands and accelerates what is scientifically possible. MIT Professor Regina Barzilay, a senior co-author and AI faculty lead for the Jameel Clinic, often discusses AI’s transformative capacity in drug discovery with her students.

“A question that my students often ask me is, ‘where can AI change the therapeutics game?’” Barzilay, who is also an affiliate of the Computer Science and Artificial Intelligence Laboratory (CSAIL), notes. Her conviction is clear: “Unless we identify undruggable targets and propose a solution, we won’t be changing the game.” She adds that this focus on “unsolved problems” is what sets Stärk’s research apart from other work in the field.

The widespread availability of fully open-source models, such as BoltzGen, is a crucial catalyst for advancing drug design capabilities, asserts senior co-author Tommi Jaakkola. Jaakkola, the Thomas Siebel Professor of Electrical Engineering and Computer Science, affiliated with the Jameel Clinic and CSAIL, emphasizes that these accessible tools empower broader community-wide collaboration, significantly accelerating the pace of pharmaceutical development.

Stärk expects AI models to fundamentally reshape biomolecular design. He wants to build tools that help manipulate biology to combat disease and that let molecular machines perform tasks no one has yet imagined. Ultimately, he aims to put these tools in the hands of biologists so they can envision and explore possibilities they had not previously considered.
