Proteomics and Protein Engineering

Summary

Proteomics is the study of an organism's complete set of proteins. The term proteome is commonly used to describe all the proteins made by an organism's cells. It can also be described as all the proteins synthesized by a particular cell at a particular time. Protein engineering is the process of developing useful or valuable proteins for practical use. It relies on two strategies: rational design and directed evolution. Both techniques have been developed to synthesize and manipulate proteins. To study proteins, one needs to determine the sequence of each protein's amino acids and its corresponding three-dimensional structure.

Definition and Basic Principles

Proteomics is the study of proteins: their structure, their function, and their interactions with other proteins. Proteins are encoded in the primary sequence of deoxyribonucleic acid (DNA), which is transcribed into a messenger RNA (mRNA) molecule, which, in turn, is translated into polypeptide chains that fold into a three-dimensional functional protein product. The study of proteins is more complicated than genomics because of the many modifications that occur between the DNA sequence and the finished protein product and because the proteome changes from cell to cell and at different periods in the life of a cell or organism.
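
The flow from DNA to mRNA to polypeptide can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and a DNA coding strand as input; the function names and the example sequence are hypothetical, and the sketch ignores splicing, folding, and every other step a real cell performs.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a DNA coding strand (T is replaced by U)."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the message."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no polypeptide
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        peptide.append(aa)
    return "".join(peptide)

# Hypothetical twelve-base coding sequence, used only for illustration.
mrna = transcribe("ATGGCCAAGTAA")
print(mrna, translate(mrna))
```

The output is a string of one-letter amino acid codes, the primary sequence from which a three-dimensional structure must still be determined.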

Once DNA has been transcribed into an mRNA molecule, the mRNA is translated into a polypeptide chain. Posttranslational modifications (chemical modifications made after translation) then take place, which change the functional nature of the protein product. Specific posttranslational modifications include phosphorylation, ubiquitination, methylation, acetylation, glycosylation, oxidation, and nitrosylation. Although these processes may not be familiar to most people, they modify the protein rather than the organism's DNA, so a single gene can give rise to various protein products. Even then, each cell may or may not express a given protein because of variables such as the time in the organism's life, whether a functioning protein is needed in the type of cell in which the protein resides, or whether the conditions in the cell are conducive to a functioning protein. Also, proteins may have to communicate or interact with other proteins to become functional.
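
To give a sense of how quickly modifications multiply the number of possible protein products, the following sketch enumerates every on/off combination of a few modification sites on one polypeptide. It is an illustration under stated assumptions: the mass shifts are the usual monoisotopic values for the three modifications shown, while the base mass, the site labels, and the function name are hypothetical.

```python
from itertools import product

# Monoisotopic mass added by each modification, in daltons.
PTM_MASS_SHIFT = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
}

def proteoforms(base_mass, sites):
    """Enumerate every on/off combination of the given modification sites.

    sites: list of (site_label, modification_name) pairs (hypothetical input).
    Returns a list of (modified_site_labels, mass) tuples sorted by mass.
    """
    forms = []
    for pattern in product([False, True], repeat=len(sites)):
        active = tuple(label for (label, _), on in zip(sites, pattern) if on)
        shift = sum(PTM_MASS_SHIFT[mod] for (_, mod), on in zip(sites, pattern) if on)
        forms.append((active, round(base_mass + shift, 5)))
    return sorted(forms, key=lambda form: form[1])

# Hypothetical 10 kDa polypeptide with three modifiable residues.
sites = [("S15", "phosphorylation"), ("K27", "acetylation"), ("K36", "methylation")]
forms = proteoforms(10000.0, sites)
for active, mass in forms:
    print(f"{mass:>12.5f}  {', '.join(active) or 'unmodified'}")
```

Three independent sites already yield eight distinct forms of the same gene product, and the count doubles with every additional site.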

With all these possibilities to express certain proteins, the study of proteomics and the proteome is extensive and complex. However, understanding proteins and their functions will make it possible to design new drugs for treating diseases and provide researchers with better knowledge of how cells work, how they interact with other cells, and how these interactions relate to a cell's ability to survive.

Background and History

The term proteome was coined by Marc Wilkins in 1994 as part of his doctoral thesis work on two-dimensional gel electrophoresis of proteins. Gel electrophoresis is a procedure that allows researchers to visualize the presence or absence of a protein in a specialized medium. Wilkins used this term to describe the entire complement of proteins expressed by the genome, which can be described as all the proteins in a cell or an entire organism. The term proteomics, which combines the terms protein and genomics, was coined in 1997 to describe the study of the proteome. When the Human Genome Project was completed in 2003, the entire human genome had been mapped. With knowledge of the human DNA sequence, the next logical step was to turn to the set of proteins in a cell or organism.

As the discipline of proteomics began, proteins were first studied by analyzing messenger RNA. Messenger RNA (mRNA) is a particular form of RNA that carries the genetic instructions of DNA to the ribosome, where they are translated into protein. In early proteomic research, scientists studied the mRNA transcribed from DNA to infer its protein product. It was found, however, that mRNA does not always correlate directly with protein content. First, mRNA may not always be translated into a functional protein. Second, the amount of protein produced from any mRNA differs based on the gene from which it is transcribed and how much protein the cell may need at any one time. Third, the protein product may differ extensively from the original mRNA message because of posttranslational modifications. Fourth, proteins may need to interact with other proteins to be functional. These complexities made new methods of study necessary, and new approaches to studying proteins have become available.

How It Works

With the advent of new molecular and cell biology technologies, which began with the Human Genome Project in the 1990s, the study of proteins in the early 2000s advanced significantly. Different technologies were developed, and the data generated from these studies grew so large that bioinformatic systems were developed to store all the data for interpretation and sharing with the scientific community. Protein studies have numerous steps and various technologies for isolating and analyzing the proteins.

Isolation of Proteins. The first approach to studying proteins is to disrupt cells and separate the proteins from the other particles in the cell. Cells are first treated with a detergent or a denaturant such as urea, which breaks the cells open and inactivates enzymes that would otherwise degrade the proteins. These steps can be accomplished using kits that perform a whole-cell extraction, producing fractions from which one can isolate the proteins of interest, for example, the hydrophobic proteins found in membranes.

Separation and Purification. Once the proteins are isolated, one can begin to separate and purify them. A common method is affinity or liquid chromatography, which separates specific proteins or a family of related proteins by binding them to a specific ligand on a column. The bound proteins are then eluted from the column, leaving the rest of the mixture behind. Certain technologies have made it possible to extract very small quantities of proteins from a mixture.

Proteins can also be labeled with an isotope so that the quantity of separated protein can be measured using mass spectrometry. Tags can also be peptide-based rather than isotopic. Peptide-tagged proteins can be isolated and purified directly when the proteins to be studied are already known. One can then gather information on the abundance of proteins in cells.

Two-Dimensional Gel Electrophoresis. Two-dimensional gel electrophoresis was developed in the 1970s and has enhanced the ability to separate specific proteins. This method separates a mixture of proteins along two dimensions of a gel. The first dimension separates proteins by isoelectric point (isoelectric focusing) using a pH (acidity-alkalinity) gradient, and the second dimension is SDS-PAGE (sodium dodecyl sulfate-polyacrylamide gel electrophoresis), which separates proteins by their molecular weight (larger proteins move slowly through a gel, and smaller proteins move quickly and farther along the gel). Many companies sell kits to perform two-dimensional gel electrophoresis, making these steps easier than in the past.
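
The two separations can be pictured as assigning each protein a coordinate on the gel. The sketch below is a simplified model, not a simulation of a real gel: it assumes a linear pH gradient from 3 to 10 and assumes that migration distance is linear in the logarithm of molecular weight, a common approximation for SDS-PAGE. The function name, the ranges, and the three sample proteins are hypothetical.

```python
import math

def gel_position(pi, mw_kda, ph_range=(3.0, 10.0), mw_range=(10.0, 200.0)):
    """Return (x, y), each normally between 0 and 1.

    x: position along the pH gradient (first dimension, isoelectric focusing).
    y: migration distance in the second dimension (SDS-PAGE); larger y means
       the protein travelled farther, which happens for smaller proteins.
    """
    ph_lo, ph_hi = ph_range
    mw_lo, mw_hi = mw_range
    x = (pi - ph_lo) / (ph_hi - ph_lo)
    span = math.log10(mw_hi) - math.log10(mw_lo)
    y = (math.log10(mw_hi) - math.log10(mw_kda)) / span
    return x, y

# Hypothetical proteins: name -> (isoelectric point, molecular weight in kDa).
sample = {
    "protein_A": (4.7, 66.0),
    "protein_B": (4.7, 25.0),
    "protein_C": (9.3, 25.0),
}
for name, (pi, mw) in sample.items():
    x, y = gel_position(pi, mw)
    print(f"{name}: x={x:.2f}, y={y:.2f}")
```

Proteins A and B share an isoelectric point and would overlap after the first dimension alone, while B and C share a molecular weight and would overlap on SDS-PAGE alone. Only the combination of both dimensions gives each of the three its own spot.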

Protein Identification and Mass Spectrometry. Once the gels have been run, interpreting the data may be challenging. The data appear as spots for each of the protein products that have been separated. To make these spots visible, stains such as fluorescent dyes, Coomassie blue, or silver stain are applied to the gel.

The proteins in the spots are then generally analyzed with mass spectrometry. Proteins or their peptide fragments are first ionized, commonly by electrospray ionization or matrix-assisted laser desorption/ionization (MALDI), and the instrument measures the mass-to-charge ratio of each ion. Computer software is then used to analyze the data and compare each of the proteins expressed.
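
As a rough illustration of the quantity a mass spectrometer reports, the following sketch computes the monoisotopic mass of an unmodified peptide and the mass-to-charge ratio (m/z) it would show at a given positive charge. The residue masses are standard monoisotopic values; the function names and the example peptide are hypothetical, and real spectra also contain isotope peaks, adducts, and fragment ions that this sketch ignores.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once for the free N- and C-termini
PROTON = 1.00728   # mass of the charge carrier in positive-ion mode

def peptide_mass(sequence: str) -> float:
    """Monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def mass_to_charge(mass: float, charge: int) -> float:
    """m/z of the ion carrying `charge` extra protons."""
    return (mass + charge * PROTON) / charge

peptide = "PEPTIDE"  # hypothetical example sequence
mass = peptide_mass(peptide)
for z in (1, 2, 3):
    print(f"{peptide} z={z}: m/z = {mass_to_charge(mass, z):.4f}")
```

Identification software works in the opposite direction: it compares observed m/z values against masses calculated in this way for the peptides predicted from a sequence database.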

Microarrays. Protein microarrays take a mixture of proteins and add them to antibodies, antigens, enzymes, substrates, membrane receptors, or ligands, which can recognize the proteins with which they associate or interact. These specific proteins will light up on an array, or chip, that identifies specific proteins from the thousands of proteins in the mixture. This technique, which is being used but is still in development, is a powerful tool and has the potential to eliminate numerous laborious steps being used to identify proteins.

Applications and Products

Proteomics has immense potential for a broad number of practical applications in medicine and the pharmaceutical industry.

Basic Research. One major area of proteomic research is to understand how the amino acid primary sequence specifies the stability and dynamics of protein conformation. This research would provide information on how to design novel functionalities of proteins and on disorders that work by changing a protein's three-dimensional structure, such as amyloidosis and prion diseases. Studying the folding and unfolding of proteins will help scientists understand the three-dimensional structures of proteins.

Disease Detection. One major application of proteomics is for disease detection. The National Cancer Institute is developing ways that proteomic technologies can be used to detect proteins seen in disease. This technology uses biomarkers and target proteins to detect known diseases, including cancers, autoimmune disorders, and inflammatory diseases, and screen for allergies.

Biomarkers that can be used for early disease detection include gene mutations, gene transcription and translation modifications, and alterations in protein products. By looking at free DNA in serum, clinical testing has developed serum screens, and with the addition of biomarkers, testing has been expanded to use oncogene mutations, microsatellite instability regions in the DNA, and hypermethylation of promoter regions in DNA for the detection of cancer.

Clinical testing techniques using proteins as the basis for disease detection include western blotting, immunohistochemical staining, enzyme-linked immunosorbent assay (ELISA), and mass spectrometry.

One such approach to disease detection is using proteomic pattern diagnostics to detect cancer. The first report of this technique, coupled with mass spectrometry, concerned the identification of ovarian cancer. More than two-thirds of ovarian cancer cases are not detected until the disease is at an advanced stage. Early detection is essential if this disease is to be treated and cured successfully.

One detection method is to identify discriminatory patterns of proteins that indicate ovarian cancer. Serum (obtained through a noninvasive procedure) from both normal women and those with ovarian cancer shows distinct patterns of proteins. This approach can be used for other cancers and diseases once the protein patterns are established.
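
The idea of a discriminatory pattern can be illustrated with a very small classifier. The sketch below uses a nearest-centroid rule, a stand-in chosen for brevity and not the method used in the published ovarian cancer work. All intensities are made-up numbers for three hypothetical mass spectrometry peaks, and the function names are illustrative.

```python
import math

def centroid(rows):
    """Average intensity of each peak across a group of serum profiles."""
    return [sum(column) / len(rows) for column in zip(*rows)]

def classify(profile, centroids):
    """Return the label whose average pattern is closest to the profile."""
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))

# Made-up training data: relative intensity at three hypothetical peaks.
training = {
    "control": [[1.0, 0.2, 0.5], [0.9, 0.3, 0.4]],
    "case":    [[0.3, 0.9, 0.6], [0.2, 1.0, 0.5]],
}
centroids = {label: centroid(rows) for label, rows in training.items()}

# Two new made-up profiles to classify.
print(classify([0.8, 0.3, 0.5], centroids))
print(classify([0.3, 0.8, 0.5], centroids))
```

A usable diagnostic would need many more peaks, a large and well-characterized set of training sera, and validation on samples the model has never seen.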

Disease Treatment Through Novel Drugs. The development of novel drugs using proteomics for disease therapy holds tremendous potential. The major step forward is to identify proteins associated with disease, which can then be used as targets for drug development. First, a protein that is implicated in a disease must be identified. Then, drugs can be designed based on the protein's three-dimensional structure to interfere with the action of the protein. This can be achieved by developing molecules that fit into the protein at the active site to stop the enzymatic reaction that would normally occur. Inactivation of enzyme function may also be a means of developing personalized drugs based on different individuals' genetic, and thereby protein, makeup. With the advent of computer databases of protein structures, computer techniques can test how well different molecules fit into a protein's three-dimensional structure, a process called virtual ligand screening.

In the early 2020s, knowledge of a viral protein was similarly used to develop vaccines to combat the COVID-19 pandemic. Scientists in China mapped the virus’s genetic sequence and shared that information with researchers. Two biotech companies, Moderna and Pfizer-BioNTech, used the information to make mRNA vaccines that proved highly effective in combatting COVID-19. The vaccines used messenger RNA to instruct the body’s cells to make the spike protein from the virus. The immune system reacted to the spike protein and began creating antibodies to fight the virus. The mRNA vaccines produced an immune response without introducing any part of the virus into the body. In March 2024, scientists at New York University bioengineered a protein that targets SARS-CoV-2, the virus that causes COVID-19. The protein has potential applications in preventing and neutralizing an array of viruses.

For cancer alone, drug development is a major area of scientific endeavor, and the industry continues to grow. Protein engineering has been used both to develop more effective variants of existing treatments, such as modifying antibiotics to circumvent antibiotic resistance, and to develop completely new treatments for conditions that may be rare or difficult to treat. For example, the interleukin 1 blocker Arcalyst (rilonacept), an engineered fusion protein, was approved by the U.S. Food and Drug Administration (FDA) in 2008 for the treatment of cryopyrin-associated periodic syndromes (CAPS), a family of syndromes involving inflammation affecting the joints, skin, and eyes. Rilonacept is one of the few medications approved by the FDA to treat CAPS. The use of engineered proteins to fight Ebola was also investigated.

Protein Engineering. The field of protein engineering involves developing and creating useful or important proteins. Protein engineering encompasses two main strategies: rational design and directed evolution.

In rational design, knowledge of a protein's structure and function is used to introduce specific mutations that change that structure or function. Because mutagenesis techniques are well established, this technique has great potential. However, its main difficulty is that a protein's structure and function may not be known in detail. For poorly defined proteins, it may be impossible to determine what mutations to incorporate.

In directed evolution, random mutations are induced in a protein, and the variant proteins with the desired qualities are sorted from the protein mix. Once the variant proteins have been isolated, further rounds of mutation and selection are performed on those variants. This process of mutation and selection is called directed evolution because it simulates natural selection and may produce more fit or successful proteins. Another technique used in directed evolution is DNA shuffling, which mixes and matches pieces of variants to produce better protein products. This process is similar to the recombination that occurs naturally during sexual reproduction. One advantage of directed evolution is that it does not require prior knowledge of a protein or knowledge of which mutation to induce. Rather, mutations are randomly induced, and those mutations are monitored to see their effect on the protein expressed in a cell. The difficulty of directed evolution is that it requires high throughput, in which large amounts of DNA must be mutated and the protein products monitored for the desired qualities. This process may not be feasible for all proteins.
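
The cycle of mutation and selection can be sketched as a short loop. This is a toy model under loud assumptions: in the laboratory, fitness is measured by an assay on real protein variants, whereas here a hypothetical target sequence stands in for that assay so the example can run. The function names, parameters, and sequences are all illustrative, and DNA shuffling is not modeled.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def fitness(seq, target):
    """Stand-in for a laboratory assay: positions matching a hypothetical target."""
    return sum(a == b for a, b in zip(seq, target))

def mutate(seq, rng, rate=0.1):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )

def directed_evolution(parent, target, rounds=30, library_size=200, keep=10, seed=0):
    """Repeat: build a random mutant library, screen it, keep the best variants."""
    rng = random.Random(seed)
    pool = [parent]
    history = []
    for _ in range(rounds):
        library = [mutate(rng.choice(pool), rng) for _ in range(library_size)]
        # Previous winners stay in the screen, so the best fitness never drops.
        candidates = set(library) | set(pool)
        ranked = sorted(candidates, key=lambda s: (-fitness(s, target), s))
        pool = ranked[:keep]
        history.append(fitness(pool[0], target))
    return pool[0], history

parent = "MKTAYIAKQR"   # hypothetical starting protein
target = "MKVLWAALLV"   # hypothetical stand-in for "the desired qualities"
best, history = directed_evolution(parent, target)
print(best, history[0], history[-1])
```

The library size in the sketch hints at the throughput problem described above: every round requires screening hundreds of variants even for a ten-residue toy sequence.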

Careers and Course Work

The field of proteomics is growing rapidly and offers exciting new careers in research for individuals with bachelor's through doctoral degrees. Careers in the field can be as diverse as medical doctors, doctorate-level researchers, and laboratory technicians. The number of laboratories that conduct protein research and therapeutics is growing rapidly, and funding for this type of research is expanding.

To enter the field of proteomics research, a person must first acquire a bachelor of science degree and conduct basic laboratory work. A master's degree in biology or a field of biology, such as genetics, microbiology, or molecular biology, may help in the competitive labor market. Research and laboratory technicians do much of the bench work. A doctorate is required to reach a position with decision-making power, such as becoming the director of a laboratory. Medical doctors often participate in research and are involved in applying research to patients in a clinical setting. The job titles often seen in this field include medical doctors, principal investigators in research, laboratory directors, medical technologists, research technologists, laboratory technicians, and college professors involved in research.

Social Context and Future Prospects

With the coupling of advanced instrumentation and new technologies and informatics tools, the field of proteomics has great potential in disease detection, drug development, and basic research on proteins.

Another major question for the future of proteomics is whether a patent system can be developed that facilitates research and effectively changes the dynamics of genomics and proteomics. A definite conflict of interest exists between for-profit industries and nonprofit research institutes. This conflict makes it difficult to finish research studies and publish them so that other researchers can freely access the results. The dilemma is worldwide and leads to higher research costs and difficulties in converting research results into clinically useful applications. Intellectual property laws and patenting are difficult issues in genomics and proteomics.

The issue of how to use systems biology, including proteomic research, to improve the health of individuals is a major priority. It is increasingly apparent that proteomics will have a major role in creating a predictive, preventative, and personalized approach to medicine. This raises the question of how individualized medicine will be handled by insurance companies and whether there will be a disparity between individuals in their access to individualized medical procedures and therapies. The idea that genomic and proteomic research may outpace societal change must be recognized, and steps must be taken to address all issues.

As technology advances, scientists gain promising new approaches for engineering proteins. Using machine learning software, researchers at the University of Washington Medical Center accurately predicted the behavior of specific protein mutations instead of relying on traditional trial and error. In this approach, algorithms build statistical models from existing data, analyze new data, and identify sequences likely to yield a protein with the desired properties. This computational approach saves time and money.
