Blog contents:
1. The project
2. Codon Optimization, why?
3. The principles of C.O
4. Proteinea's CodonSP
5. Case study
6. What's next
The project
Proteinea constantly develops computational tools and internal pipelines to strengthen our development projects and create new capabilities and solutions. We use our AI-based models to overcome limitations that arise throughout a project. Our work is not limited to our protein engineering platform, and we are always looking to enhance and accelerate every stage of a development project with our technology. CodonSP was born out of this thinking: it is a tool we developed to increase the yield of our research proteins. Given the results we have seen, we have decided to share and commercialize it.
Codon Optimization, Why?
As established in molecular biology, it all starts with DNA, the genetic code of life. A gene encodes information that is transcribed into mRNA and then translated into the corresponding protein (DNA → mRNA → Protein). However, a given protein can be encoded by many different DNA sequences, and not all of them are equally efficient: some coding sequences express a protein better and faster than others. This becomes a particular problem when a protein of interest is expressed in a heterologous host rather than its native one, because some codons are not read efficiently by the non-native host, which leads to low production yields. This is exactly where codon optimization comes in: it aims to design the best possible DNA coding sequence for the protein of interest in the chosen host, so as to achieve the highest possible yield.
The principles of C.O
Each protein consists of a chain of amino acids, called a polypeptide chain. Each amino acid (AA) in the chain is encoded by three successive DNA nucleotides, known as a "codon". A single AA can be encoded by several codons, known as "synonymous codons". Some synonymous codons, however, are favored over others by a given host organism (e.g. bacteria, yeast, insect cells, or human cells). For example, the amino acid proline (P) is encoded by four synonymous codons: CCG, CCC, CCT, and CCA. While CCC is the codon most preferred by human cells, the bacterium E. coli favors CCG. Accordingly, codon optimization algorithms select the most frequently used codons for the protein sequence based on the expression host of choice.
In addition to the codon usage explained above, other factors play a decisive role in protein expression. These include mRNA secondary structure, GC content, and mRNA-destabilizing motifs, among others. Modern codon optimization tools therefore take all of these factors into consideration.
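To make the idea of synonymous codon substitution concrete, here is a minimal Python sketch. It is an illustration only and not how CodonSP works. The standard genetic code is built in, but the `PREFERRED` table is an assumption that contains nothing beyond the proline example from the text (CCC for human cells, CCG for E. coli); a real tool would use a full codon usage table for the host. The function names and the short test sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: only the proline preference stated in the text is included.
PREFERRED = {
    "human": {"P": "CCC"},
    "e_coli": {"P": "CCG"},
}


def split_codons(dna):
    """Split a coding sequence into codons; its length must be a multiple of 3."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acids ('*' = stop)."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(dna.upper()))


def recode(dna, host):
    """Swap each codon for the host's preferred synonymous codon, if one is listed."""
    preferences = PREFERRED[host]
    recoded = []
    for codon in split_codons(dna.upper()):
        amino_acid = CODON_TABLE[codon]
        # Keep the original codon when no preference is known for this amino acid.
        recoded.append(preferences.get(amino_acid, codon))
    return "".join(recoded)


native = "ATGCCTCCAGGATAA"  # hypothetical sequence encoding M-P-P-G-stop
for host in ("human", "e_coli"):
    optimized = recode(native, host)
    # The DNA changes, but the encoded protein must stay the same.
    assert translate(optimized) == translate(native)
    print(host, optimized)
```

The assertion inside the loop captures the defining constraint of codon optimization: the DNA sequence changes while the encoded amino acid sequence does not.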
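Of the factors listed above, GC content is the simplest to compute. The following Python sketch shows one way to measure it, both for a whole sequence and in sliding windows. The function names, window size, and step are illustrative assumptions, and no target range is implied here, since acceptable GC content depends on the host and the tool.

```python
def gc_content(seq):
    """Return the fraction of G and C nucleotides in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window=9, step=3):
    """Return GC content for each sliding window, to reveal local GC-rich or GC-poor regions."""
    return [
        gc_content(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


# Two synonymous (hypothetical) sequences encoding the same peptide, M-P-P-G-stop.
for name, dna in [("variant A", "ATGCCTCCAGGATAA"), ("variant B", "ATGCCGCCGGGATAA")]:
    print(name, round(gc_content(dna), 2), [round(x, 2) for x in windowed_gc(dna)])
```

Synonymous codon choices can shift GC content noticeably even though the protein is unchanged, which is one reason it is weighed alongside codon usage.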
“There is major room for innovation beyond rational codon optimization”
Although the attempt to produce high protein yields by changing codon assignments has led to the widespread use of DNA codon optimization, several studies have revealed that synonymous codon alterations made via the rational approach might have adverse implications [refs]. This is because the native DNA code includes several layers of information that overlay the amino acid sequence, and this “natural” complexity can be disturbed by rational codon optimization methods. For example, ribosomal translation of an mRNA should accelerate and decelerate at certain regions to ensure correct protein folding. When such natural information is violated, consequences such as altered protein conformation and stability, as well as changed sites of post-translational modification, have been observed. These changes ultimately affect protein function.
Moreover, there are concerns about using rational codon optimization to produce recombinant therapeutic proteins, such as the formation of anti-drug antibodies, which can impair therapeutic effectiveness and trigger allergic responses. There is therefore a critical need for alternative, safer codon optimization approaches that overcome the drawbacks of rational methods. Proteinea is here to fulfill this need with its cutting-edge Artificial Intelligence (AI) technology.
Proteinea's CodonSP
We are revolutionizing the codon optimization concept for protein production, with no more trade-offs between protein quantity and quality. Our state-of-the-art AI-powered sequence design algorithm, CodonSP, decodes and uses complex biological information to design novel sequences that produce exceptionally high protein yields while preserving the protein's natural conformation and function. Unlike the rational approach, which is restricted by the limited knowledge available, our AI-based CodonSP model goes beyond these limits by deciphering biological complexity and writing new optimization patterns inspired by biology. Guided by deep learning from mother nature, our algorithms identify the golden balance between the multiple optimization parameters, which leads to maximum protein expression rates without affecting the encoded folding patterns.
Case study
To showcase the potential and value of CodonSP, we ran a use-case study with a large antibody service provider (CRO). The goal was to use the codon optimization tool to increase the yield of two problematic antibodies with low expression yields. This figure shows the results of the case study, with encouraging yield increases of over 700% in some cases. These antibodies were taken from under 20 mg/L to over 100 mg/L in all cases. The study used four different versions of the CodonSP AI models, each fine-tuned for different performance metrics; this approach allows us to develop the ultimate model over time. Beyond the yield increase, we also validated that the monomer solubility of the two antibodies improved from 85% to 98% for Ab #1 and from 91% to 94% for Ab #2.
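For readers who want to relate the percentages to the concentrations quoted above, here is a small Python sketch of the arithmetic. The function names are illustrative. The values 20 mg/L and 100 mg/L are used only as the stated bounds ("under 20" and "over 100"), not as measured data points.

```python
def fold_change(before, after):
    """Return how many times larger `after` is than `before`."""
    if before <= 0:
        raise ValueError("baseline yield must be positive")
    return after / before


def percent_increase(before, after):
    """Return the percentage increase from `before` to `after`."""
    return (fold_change(before, after) - 1) * 100


def fold_from_percent(percent):
    """Convert a percentage increase into a fold change."""
    return 1 + percent / 100


# Going from the 20 mg/L bound to the 100 mg/L bound is a 5-fold change,
# so "under 20 to over 100" implies an increase of more than 400%.
print(fold_change(20, 100), percent_increase(20, 100))

# A 700% increase corresponds to an 8-fold change in yield.
print(fold_from_percent(700))
```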

What's next
Beyond using our CodonSP models internally, we are now offering the tool as a service to other companies that want to apply advanced codon optimization to their internal projects or to their customers' projects. You could be a therapeutics developer looking to enhance your yield for preclinical and clinical scale production, or a CRO looking to offer your customers a unique solution for expression-challenged proteins. Regardless of your organization type and the proteins you work with, we're happy to explore how CodonSP can be a valuable addition to your projects.
Contact us at hello-codon@proteinea.com for more information on commercial partnerships.