Scientists are currently witnessing a shift in biology that mirrors the arrival of ChatGPT in the tech world. Large language models are no longer just writing poetry or code; they are writing the code of life itself. New generative AI models can now predict and generate viable gene sequences for proteins that have never existed in nature. This technology promises to revolutionize how we develop drugs, create sustainable materials, and combat environmental pollution.
To understand how this works, you have to look at biology as a language. In a standard language model, the AI learns that “how are” is usually followed by “you.” In biology, the “words” are amino acids, and the “sentences” are proteins.
For billions of years, evolution has written these sentences. Now, artificial intelligence is learning the grammar.
Researchers at Salesforce Research developed a model called ProGen. They trained this AI on 280 million protein sequences drawn from public databases. Much like an AI learns English by reading Wikipedia, ProGen learned the rules of protein "grammar" by analyzing real-world amino acid sequences. The result is a system that can generate entirely new amino acid sequences that fold into functional proteins, even though those specific sequences do not appear in any known organism.
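To make the language analogy concrete, here is a minimal sketch of a next-residue model. It is not ProGen, which is a large neural network; it is a toy bigram counter, and the three training sequences are invented for illustration. It only shows the basic idea of learning which amino acid tends to follow which, then sampling a new sequence from those statistics.

```python
import random
from collections import Counter, defaultdict

# Toy training set: made-up short sequences in one-letter amino acid code.
# (Assumption for illustration only; a real model trains on millions of sequences.)
TRAINING_SEQUENCES = ["MKTAYIAKQR", "MKTLLVAGQR", "MKVAYLAKQK"]


def train_bigram(sequences):
    """Count how often each residue follows each other residue."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, start, length, rng):
    """Sample a sequence one residue at a time from the learned counts."""
    seq = [start]
    while len(seq) < length:
        followers = counts.get(seq[-1])
        if not followers:  # no observed continuation for this residue: stop early
            break
        residues = list(followers)
        weights = [followers[r] for r in residues]
        seq.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(seq)


model = train_bigram(TRAINING_SEQUENCES)
print(generate(model, "M", 10, random.Random(0)))
```

A bigram model only looks one residue back, so it cannot capture the long-range dependencies that determine how a protein folds. That limitation is part of why models in this field use much larger architectures and datasets.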
For a long time, the holy grail of computational biology was predicting structure. DeepMind’s AlphaFold solved a massive part of this problem by predicting the 3D shape of a protein from its amino acid sequence.
Generative AI takes this a step further. Instead of just predicting the shape of an existing sequence, it works backward or from scratch. It asks: “If we want a protein shaped like this to perform that function, what amino acid sequence, and therefore what DNA sequence, do we need to build it?”
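The last step of that question, going from a designed protein to DNA that encodes it, can be sketched as a simple lookup. The example below is a minimal illustration, assuming the standard genetic code and picking one codon per amino acid arbitrarily. The function name `back_translate` is hypothetical, and real workflows typically optimize codon choice for the host organism rather than using a fixed table like this.

```python
# One arbitrarily chosen codon per amino acid from the standard genetic code.
# (Assumption: real pipelines choose codons based on the host organism.)
CODON_FOR = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}
STOP_CODON = "TAA"


def back_translate(protein):
    """Return one DNA coding sequence that encodes the given protein."""
    unknown = set(protein) - set(CODON_FOR)
    if unknown:
        raise ValueError(f"unknown residues: {sorted(unknown)}")
    return "".join(CODON_FOR[aa] for aa in protein) + STOP_CODON


print(back_translate("MKT"))  # ATGAAAACTTAA
```

Because most amino acids can be encoded by several codons, many different DNA sequences produce the same protein. The table above picks just one of them.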
Two primary methods are currently driving this field:
The theoretical ability to write DNA is useful, but it only matters if the resulting proteins actually work in the real world. Recent results show that they can.
In a landmark study published in Nature Biotechnology, the Salesforce team used ProGen to design artificial enzymes. They specifically targeted lysozymes, which are proteins found in saliva and egg whites that attack bacterial cell walls.
The results were concrete and startling:
These were not just copies of existing lysozymes. Some of the AI-generated enzymes shared as little as 31.4% of their amino acid sequence with any known natural protein, yet they still folded correctly and showed catalytic efficiency comparable to natural lysozymes. This suggests the AI didn’t just memorize biological data; it learned the underlying logic of how proteins are built.
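Comparisons like this are usually expressed as percent sequence identity: the share of positions at which two sequences carry the same amino acid. The sketch below assumes the two sequences are already aligned, have equal length, and contain no gaps. Real comparisons first run an alignment tool, and the two example sequences are made up.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions with the same residue in two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Made-up ten-residue sequences that differ at three positions.
natural = "MKTAYIAKQR"
designed = "MKSAYLAKHR"
print(percent_identity(natural, designed))  # 70.0
```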
The ability to generate functional gene sequences opens the door to “de novo” protein design. This means creating proteins from scratch rather than tweaking what nature has already provided.
Nature provides a limited toolkit for fighting disease. Generative AI expands this toolkit.
Biological manufacturing is becoming a reality. We are moving away from petrochemicals and toward enzymes that can function in harsh industrial conditions.
It is important to understand that the AI does not physically print the protein. It provides the instructions. The workflow generally looks like this:
The power to generate novel DNA sequences comes with significant risks. If an AI can design a protein to cure a disease, it could theoretically design a toxin or a protein that helps a virus evade the immune system.
This dual-use dilemma is a major focus for the industry. Companies involved in DNA synthesis, such as Twist Bioscience and Ginkgo Bioworks, screen orders to ensure they are not printing genetic material that matches known pathogens or toxins.
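The idea of checking an order against known sequences can be illustrated with a toy k-mer overlap check. This is not how any named company screens orders; their methods are not described here, and real screening relies on curated databases and homology search. The function names, the threshold, and the DNA strings below are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def flag_order(order, sequences_of_concern, k=12, threshold=0.5):
    """Return names of reference sequences whose k-mers largely appear in the order."""
    order_kmers = kmers(order.upper(), k)
    hits = []
    for name, ref in sequences_of_concern.items():
        ref_kmers = kmers(ref.upper(), k)
        if not ref_kmers:  # reference shorter than k: nothing to compare
            continue
        shared = len(order_kmers & ref_kmers) / len(ref_kmers)
        if shared >= threshold:
            hits.append(name)
    return hits


# Made-up placeholder sequences; they do not correspond to any real gene.
concern = {"toy_fragment_of_concern": "ATGGCTAGCTAGGATCCGATTACGATCGTA"}
suspicious_order = "CCCC" + concern["toy_fragment_of_concern"] + "GGGG"
benign_order = "ATGAAAACTGCTTATATTGCTAAACAACGT"

print(flag_order(suspicious_order, concern))  # ['toy_fragment_of_concern']
print(flag_order(benign_order, concern))      # []
```

Exact substring matching like this would miss sequences that encode the same protein with different codons. That is one reason novel AI-generated designs make screening harder than matching against a fixed list.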
Furthermore, leading labs are advocating for “KYC” (Know Your Customer) regulations in biology, similar to banking. The aim is to ensure that only verified researchers can order the physical DNA for AI-generated sequences.
What is the difference between AlphaFold and ProGen? AlphaFold is primarily a structure prediction tool. You give it a sequence, and it tells you what shape it will take. ProGen and similar generative models are creators. You give them a desired function or constraints, and they write the new sequence for you.
Can AI generate DNA for entire organisms? Not yet. Currently, these models focus on individual proteins or small complexes. Generating the entire genome for a bacterium or an animal involves millions or billions of base pairs and complex regulatory networks that current models cannot yet handle.
Are these AI-generated proteins safe? In a lab setting, they are handled with standard biosafety protocols. The proteins themselves are not “alive” inside the computer; they only become biological agents when the DNA is synthesized and placed into cells. Stringent screening processes exist to prevent the creation of harmful biological agents.
Who owns the rights to AI-generated proteins? This is a developing legal area. Generally, the patent goes to the researchers who defined the problem and validated the protein, but patent offices are currently debating how much human input is required for a patent to be valid when AI does the heavy lifting of the design.