Revolutionizing Biotechnology: Indian Researcher and Team Develop ProGen Language Model for Protein Design

A new study by Madani, Krause, and colleagues, published in Nature Biotechnology, describes ProGen, a deep-learning-based language model that generates functional artificial proteins from diverse protein families on demand. The study showed experimentally that ProGen-generated artificial antibacterial proteins are just as effective as natural proteins at killing bacteria, even though their sequences are not seen in nature.

ProGen is trained using a large, universal protein sequence dataset of 280 million naturally evolved proteins from thousands of families, of which five diverse lysozyme families were experimentally characterized in this study.

Bullet Point Summary:

  • ProGen, a deep-learning-based language model, generates functional artificial proteins from diverse protein families on demand
  • ProGen-generated artificial proteins are as effective as natural proteins in killing bacteria
  • Potential for using AI language models in protein design and engineering for solving problems in biology, medicine, and the environment

The study also shows that ProGen-generated artificial proteins are structurally well-folded for proper expression when compared with a batch of natural proteins. Additionally, ProGen has learned a flexible protein sequence representation that can be applied to diverse families such as chorismate mutase (CM) and malate dehydrogenase (MDH).

The authors of the study state that the use of deep-learning language models for precise de novo design of proteins has the potential for solving problems in biology, medicine, and the environment. However, they also note that ethical implications must be considered when using AI language models for protein design and engineering.

Overall, this study highlights the potential of AI language models in protein design and engineering and how they could be used to solve problems in various fields.

Outcomes:

  • ProGen, a state-of-the-art transformer-based conditional language model, generates protein sequences with predictable functions across protein families.
  • ProGen-generated artificial proteins are structurally well-folded for proper expression and can be applied to diverse families.
  • The use of deep-learning language models for precise de novo design of proteins has the potential for solving problems in biology, medicine, and the environment.

Research Paper: Large language models generate functional protein sequences across diverse families (Nature Biotechnology)

Ali Madani, Ben Krause, Eric R. Greene, Subu Subramanian, Benjamin P. Mohr, James M. Holton, Jose Luis Olmos Jr., Caiming Xiong, Zachary Z. Sun, Richard Socher, James S. Fraser & Nikhil Naik

What this paper is about

  • Language modeling is presented as an important scientific method for designing proteins.
  • The goal is to circumvent the need for evolution and design proteins with desired properties from scratch.
  • The approach takes advantage of a key trend in biology: the exponential growth in publicly available raw protein sequence data, enabled by the dramatic reduction in sequencing costs.

What you can learn

  • In this work, the authors experimentally showed that ProGen-generated artificial antibacterial proteins are just as effective as natural proteins at killing bacteria, despite being unseen in nature.
  • These experiments demonstrate that ProGen can generate functional artificial proteins from diverse protein families on demand.
  • Achieving these goals, with careful consideration of ethical implications, will allow researchers to quickly develop treatments for diseases or enzymes for industrial and environmental applications.

Q: What is ProGen?

A: ProGen is a state-of-the-art transformer-based conditional language model that generates protein sequences with a predictable function across protein families.
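To make "conditional language model" concrete, here is a minimal Python sketch of conditional, residue-by-residue generation. It is not ProGen's code: the function `toy_next_residue_weights` is a hypothetical stand-in for a trained transformer, and the control tag string `"lysozyme"` is an assumed example. The only point is the shape of the loop, in which each new amino acid is sampled given a conditioning tag and the sequence generated so far.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_next_residue_weights(control_tag, prefix):
    """Hypothetical stand-in for a trained model (NOT ProGen).

    Returns one positive weight per amino acid. The weights depend on both
    the control tag and the residues generated so far, which is what makes
    the generation "conditional".
    """
    rng = random.Random(f"{control_tag}|{prefix}")
    return [rng.random() for _ in AMINO_ACIDS]


def generate(control_tag, length, seed=0):
    """Sample a sequence one residue at a time, conditioned on a tag."""
    rng = random.Random(seed)
    sequence = ""
    for _ in range(length):
        weights = toy_next_residue_weights(control_tag, sequence)
        # Draw the next residue in proportion to the model's weights.
        sequence += rng.choices(AMINO_ACIDS, weights=weights, k=1)[0]
    return sequence


if __name__ == "__main__":
    print(generate("lysozyme", 30))
```

In a real system the toy weight function would be replaced by the trained model's predicted distribution over the next amino acid; the sampling loop itself would look similar.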

Q: What dataset is ProGen trained on?

A: ProGen is trained using a large, universal protein sequence dataset of 280 million naturally evolved proteins from thousands of families.

Q: What are some of the advantages of using ProGen to generate artificial proteins?

A: ProGen can generate artificial proteins that are structurally well-folded for proper expression when compared with a batch of natural proteins, even when sequence alignment size and quality limit the success of alternative approaches. In addition, ProGen has learned a flexible protein sequence representation that can be applied to diverse families, such as chorismate mutase (CM) and malate dehydrogenase (MDH).

Q: What is the goal of the research described in the paper?

A: The goal of the research is to circumvent the need for evolution and design proteins with desired properties from scratch by taking advantage of the exponential growth in publicly-available raw protein sequence data.

Q: What have the researchers demonstrated in their experiments?

A: The researchers have experimentally shown that ProGen-generated artificial antibacterial proteins are just as effective as natural proteins at killing bacteria, despite being unseen in nature.
