DNA Analysis
DNA analysis is a scientific technique used to study the genetic material contained in DNA, enabling the identification of its source—whether it be an individual, infectious agent, or another organism. The process involves observing the sequence or length of DNA fragments, often utilizing methods such as gel electrophoresis and polymerase chain reaction (PCR). DNA analysis has significant applications in both medical and forensic fields. Medically, it aids in the detection of genetic disorders and the identification of mutations. In forensics, it has revolutionized crime scene investigations by allowing for precise identification of individuals through DNA fingerprinting.
Historically, the advancements in DNA analysis began with the discovery of the DNA structure in 1953, leading to the development of various techniques over the following decades, including restriction fragment length polymorphism (RFLP) and PCR. These methodologies allow for the amplification and analysis of DNA samples, making it possible to perform tests with minimal material. However, DNA analysis is not without limitations, including challenges related to contamination and the sensitivity of the tests.
In recent years, the cost of DNA sequencing has significantly decreased, making it more accessible while also raising concerns about privacy and ethical issues surrounding genetic information. As technology continues to advance, DNA analysis is expected to play an even more critical role in both health diagnostics and forensic science, providing valuable insights while prompting important discussions about the ethical implications of such powerful tools.
Summary
DNA analysis involves the use of scientific tools to access the information found in DNA to identify its source, whether some infectious agent, another organism of interest, or a particular individual, such as in forensic applications. Medical applications of this technology include the search for mutations associated with genetic disorders and the design of probes that are able to diagnose these disorders in a timely fashion.
Definition and Basic Principles
DNA analysis is, in the strictest sense of the term, an actual observation of the length or sequence of a portion of DNA. The length of a fragment of DNA can be determined using gel electrophoresis. This technique involves placing DNA onto a semisolid support, or gel, and applying an electric current to the gel so that the negatively charged DNA migrates toward the positive pole. The distance a fragment migrates is inversely related to its mass, which is, in turn, proportional to its length, so shorter fragments travel farther through the gel. Determining the actual sequence of bases in a strand of DNA is much more complicated.
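The relationship between migration and length can be sketched numerically. The example below is a minimal illustration, not a laboratory protocol: it assumes that migration distance is roughly linear in the logarithm of fragment length over the useful range of a gel, and it fits a line to a size ladder of known fragments. The ladder values and the function names `fit_ladder` and `estimate_length` are hypothetical and chosen only for illustration.

```python
import math

def fit_ladder(ladder):
    """Least-squares fit of log10(length) = slope * distance + intercept.

    ladder is a list of (migration distance in mm, fragment length in bp).
    """
    xs = [distance for distance, _ in ladder]
    ys = [math.log10(length) for _, length in ladder]
    n = len(ladder)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_length(distance_mm, slope, intercept):
    """Estimate fragment length in bp from its migration distance."""
    return 10 ** (slope * distance_mm + intercept)

# Hypothetical ladder (invented values): longer fragments migrate less far.
LADDER = [(10.0, 10000), (20.0, 1000), (30.0, 100)]
slope, intercept = fit_ladder(LADDER)

# An unknown band that ran 25 mm falls between the 1000 bp and 100 bp markers.
unknown_bp = estimate_length(25.0, slope, intercept)
```

The negative slope is the point of the sketch: the farther a band has traveled, the shorter the fragment it contains.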
![CBP chemist reads a DNA profile to determine the origin of a commodity. By James Tourtellotte, photo editor of CBP Today (public domain), via Wikimedia Commons](https://imageserver.ebscohost.com/img/embimages/ers/sp/embedded/89250422-78408.jpg?ephost1=dGJyMNHX8kSepq84xNvgOLCmsE2epq5Srqa4SK6WxWXS)
Although DNA analysis has at times been equated with genetic testing, the two are not always the same. Certain types of genetic testing developed in the 1960s did not technically involve DNA analysis. Amniocentesis, which allowed for Down syndrome testing, actually involved chromosomal analysis following the creation of a karyotype (organized profile of a person's chromosomes). Similarly, genetic testing for phenylketonuria (PKU) and Tay-Sachs disease originally involved enzyme assays, not an analysis of the defective genes themselves. Thus, actual DNA analysis did not begin in earnest until the mid-1970s.
Background and History
Although the double-helical structure of DNA was first described in 1953 by American molecular biologist James D. Watson and British biophysicist Francis Crick, more than twenty years passed before scientists developed methods of comparing DNA for the purpose of identification. In 1974, British molecular biologist Joseph Sambrook described the differentiation of human tumor viruses following cleavage by a restriction endonuclease (an enzyme that cleaves DNA at a specific nucleotide sequence). He noticed that different-sized bands of DNA were visible following their separation on a gel. This discovery formed the basis for what has become known as restriction fragment length polymorphism (RFLP) analysis. Although the DNA of viruses and even bacteria could be analyzed directly by RFLP, the restriction enzyme cleavage patterns of higher organisms were so complex that only a subset of the bands produced could be analyzed. DNA analysis of higher organisms was made possible in 1975 by the development of the Southern blotting technique by British biochemist Edwin Southern.
The following decade saw two ideas that would revolutionize the field of DNA analysis. In 1983, American biochemist Kary Mullis developed the polymerase chain reaction (PCR), a laboratory process that amplifies small amounts of starting sample DNA by several orders of magnitude. Then, in 1985, British geneticist Alec Jeffreys realized that human DNA was peppered with regions of repeating sequences, or variable number tandem repeats (VNTRs), and that comparisons of these regions could create a unique DNA fingerprint for any given individual.
How It Works
Probes and Primers. Most of the methods of DNA analysis take advantage of DNA's natural tendency to form a double helix. RFLP analysis has long been paired with Southern blotting. This procedure involves binding a small synthetic fragment of DNA to a region of interest that is contained within one or more of the bands of DNA that have been separated by gel electrophoresis and then blotted onto some type of membrane. This binding is made possible by the complementary nature of the DNA bases: adenine forms hydrogen bonds with thymine, and cytosine pairs with guanine, a concept called Watson-Crick base pairing after the discoverers of DNA structure. The synthetic DNA, called a probe, is designed to contain about twenty nucleotides that are complementary to the sequence of interest. Binding of this probe to the blotted DNA is called hybridization. Originally, DNA probes were labeled with a radioactive marker to enable the detection of their position on a membrane, but later probe labeling has included nonradioactive alternatives such as fluorescent dyes.
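Complementary base pairing can be illustrated with a short sketch. The example below assumes perfect matching only (real hybridization tolerates some mismatches depending on conditions) and treats DNA as a plain string read 5' to 3'. The target sequence and the function names `reverse_complement` and `find_probe_sites` are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the strand that pairs with seq, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def find_probe_sites(target_strand, probe):
    """Return 0-based start positions where the probe can hybridize.

    A probe binds wherever the target strand reads as the probe's
    reverse complement (perfect matches only, an assumption of this sketch).
    """
    site = reverse_complement(probe)
    positions = []
    start = target_strand.find(site)
    while start != -1:
        positions.append(start)
        start = target_strand.find(site, start + 1)
    return positions

# Hypothetical blotted strand and a 20-nucleotide probe designed against
# positions 5 through 24 of that strand.
target = "TTGACCGTAGGCTAACGTTAGCAGGATCCA"
probe = reverse_complement(target[5:25])
sites = find_probe_sites(target, probe)
```

Here `sites` contains the single position the probe was designed against, which is the computational analogue of one labeled band on the membrane.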
The polymerase chain reaction (PCR) also involves the binding of small synthetic fragments of DNA to a region of interest, but in this process, two such fragments bind to opposite strands of the target DNA. These fragments, although identical in structure to the probes described earlier, are called primers because they are used to prime a DNA synthesis reaction. The binding reaction, which involves the same process of complementary bases coming together to form hydrogen bonds, is referred to as annealing. It had been previously discovered that DNA could be made in the laboratory by taking a given single strand of DNA and adding a specific primer, the four types of nucleotides, and purified DNA polymerase (the enzyme normally involved in the polymerization process), but Mullis's insight in the early 1980s was that this process could be converted into a chain reaction that produced large amounts of DNA. By adding two primers instead of one and by using double-stranded DNA as a target, twice as many molecules of DNA could be created, but this necessitated a step in which the DNA had to be heated to near-boiling temperatures to separate, or melt, the two strands of the double helix. Mullis reasoned that a DNA polymerase that had been purified from a thermophilic microbe would be able to survive this heating step. This would allow the stringing together of a number of cycles with three different temperatures (one each for annealing, polymerization, and DNA melting) without having to add more DNA polymerase enzyme. Thus, after each cycle, the amount of double-stranded DNA would be doubled, resulting in more than a billion molecules of DNA after thirty cycles, even if the reaction started with only one molecule of DNA.
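The doubling arithmetic behind the "more than a billion molecules after thirty cycles" figure can be checked directly. This is an idealized sketch that assumes every molecule is copied in every cycle; real reactions fall short of perfect doubling, which the hypothetical `efficiency` parameter is meant to suggest. The function names are illustrative only.

```python
def copies_after(cycles, starting_copies=1, efficiency=1.0):
    """Idealized copy number after a given number of PCR cycles.

    efficiency=1.0 means perfect doubling in every cycle (an assumption).
    """
    return starting_copies * (1 + efficiency) ** cycles

def cycles_to_reach(target_copies, starting_copies=1):
    """Smallest number of perfect-doubling cycles that reaches target_copies."""
    cycles = 0
    copies = starting_copies
    while copies < target_copies:
        copies *= 2
        cycles += 1
    return cycles

thirty_cycle_yield = copies_after(30)       # 2**30 copies from one molecule
cycles_for_billion = cycles_to_reach(10**9)
```

Starting from a single molecule, thirty perfect doublings give 2^30 = 1,073,741,824 copies, and thirty is also the first cycle at which the count passes one billion.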
DNA Polymorphism. DNA analysis takes advantage of the intrinsic variability that exists among organisms as well as among members of the same species. Polymorphism, a word derived from the Greek for “many forms,” is used to describe this variability, the simplest form of which is a single nucleotide polymorphism (SNP). Single nucleotide polymorphisms, which are also referred to as point mutations in genetics, are detectable by RFLP analysis only when they occur within the recognition sequence for a particular restriction enzyme because the enzyme fails to cleave the altered sequence. RFLP analysis also readily detects deletions or insertions of DNA sequences that have occurred between restriction enzyme cleavage sites. What Jeffreys realized in the 1980s was that most restriction fragment length variation in humans was not caused by large insertions or deletions of unique DNA sequences but by a variation in the number of repetitive DNA elements that were found in tandem with one another. He did not, however, use the PCR method that was being developed at the time because a practitioner of PCR must know the precise sequences that flank a site of interest to design the primers used in this procedure. Instead, Jeffreys performed Southern blotting using a probe designed to hybridize with the roughly fifteen-nucleotide-long repeat sequence that he was studying. This probe specifically labeled the regions of the membrane that contained these variable number tandem repeats (VNTRs). For this contribution, Jeffreys has been called the father of DNA fingerprinting.
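How a single base change inside a recognition site alters a restriction pattern can be shown with a small sketch. It uses the well-known enzyme EcoRI, which recognizes GAATTC and cuts after the first G, as a stand-in; the article does not name a specific enzyme. The two alleles are invented sequences, only one strand is modeled, and `digest` is a hypothetical helper.

```python
ECORI_SITE = "GAATTC"   # EcoRI recognition sequence
ECORI_CUT_OFFSET = 1    # EcoRI cuts between G and AATTC

def digest(seq, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Return fragment lengths after cutting seq at every occurrence of site."""
    cut_points = []
    start = seq.find(site)
    while start != -1:
        cut_points.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cut_points + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]

# Invented allele with two EcoRI sites.
allele_a = "ACGT" * 5 + "GAATTC" + "TTAGC" * 6 + "GAATTC" + "CCATG" * 4
# Same allele with one point mutation (A to G) inside the first site.
allele_b = allele_a.replace("GAATTC", "GAGTTC", 1)

fragments_a = digest(allele_a)   # three fragments
fragments_b = digest(allele_b)   # two fragments: the first site is no longer cut
```

The mutated allele yields one long fragment where the original yields two shorter ones, and that difference in band pattern is what RFLP analysis detects on a gel.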
Subsequent analysis of various regions in human DNA has taken advantage of PCR to produce results, focusing on even smaller tandem repeats with repeating units that are only one to six nucleotides in length. Discovered in 1989, these short tandem repeats (STRs) were eventually found to outnumber variable number tandem repeats by nearly one hundredfold, being found at more than 100,000 sites in human DNA. As more and more of these STR sites were characterized over time, primers that annealed to their flanking sequences were designed to amplify the repeat area in question.
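The idea of amplifying a repeat region with primers that anneal to its flanking sequences can be sketched as an in-silico PCR. This is a simplified model under stated assumptions: perfect primer matches, a single strand represented as a string, and an invented locus with a four-nucleotide repeat unit. The flanking sequences, the repeat unit, and all function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def amplicon(template, forward_primer, reverse_primer):
    """Return the region bounded by the two primers, or None if either fails to bind.

    template is the top strand, 5' to 3'. The forward primer has the same
    sequence as the top strand; the reverse primer anneals to the top strand,
    so its reverse complement is what appears in template.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]

def repeat_count(product, forward_primer, reverse_primer, unit_length):
    """Infer the number of repeat units from the length of the PCR product."""
    repeat_region = len(product) - len(forward_primer) - len(reverse_primer)
    return repeat_region // unit_length

# Invented flanking sequences for a hypothetical STR locus.
LEFT_FLANK = "GGTCAAGCTTACGGA"
RIGHT_FLANK = "CCTGAATCGGTTCAC"
FWD = LEFT_FLANK
REV = reverse_complement(RIGHT_FLANK)

def make_allele(n_repeats, unit="GATA"):
    """Build a template carrying n_repeats copies of the repeat unit."""
    return "AACC" + LEFT_FLANK + unit * n_repeats + RIGHT_FLANK + "TTGG"

short_product = amplicon(make_allele(7), FWD, REV)
long_product = amplicon(make_allele(10), FWD, REV)
```

Because the primers sit in the constant flanking DNA, the two alleles give products that differ only in the length of the repeat region, so product length alone distinguishes a seven-repeat allele from a ten-repeat allele.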
Applications and Products
Of Microbes and Man. Although the tools involved in DNA analysis are often used in basic research such as determining the evolutionary relationships between organisms, much of the application of this technology involves analysis for the purposes of identification. Although identification could potentially include any organism of interest, the primary focus of DNA analysis has been disease-causing viruses and microorganisms along with humans. Ever since Sambrook and colleagues first applied RFLP analysis to differentiate between two strains of viruses, viral epidemiology has remained an important application for tools such as PCR. For example, around the beginning of the twenty-first century, nucleic acid amplification testing (NAAT) was developed to detect the viral load of the human immunodeficiency virus (HIV). The procedure is a faster and more effective way to test for the presence of HIV in a person. NAAT has also been applied as a diagnostic test for certain bacterial infections. Other PCR-based methods have been adapted to test for bacterial contamination of foods as well as of hospital areas and supplies. In most cases, the identification of the precise strain of virus or microbe present is unnecessary because the physical presence of an infectious agent, not its detailed classification, is of interest. Tandem-repeat-based methods of identification are largely useless when analyzing such infectious agents because these agents tend to lack such repetitive DNA sequences. Because the DNA sequence of the entire genome (the complete set of DNA found in a particular organism) of most known infectious agents has been determined, it is possible to design primers that will specifically amplify DNA from a given target species.
In some cases, as in life-threatening illnesses, potential epidemics, and acts of bioterrorism, the speed at which an infectious agent is identified is critical to saving lives. For such applications, a type of PCR called real-time PCR has been developed. Rather than waiting to run gel electrophoresis after the full thirty or so cycles of a traditional PCR run have been completed, real-time PCR measures the production of a fluorescent-tagged product in real time, during the early phases of the reaction. This allows for an agent to be detected in minutes rather than hours.
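Why monitoring fluorescence during the reaction speeds up detection can be sketched with an idealized model. The example assumes perfect doubling each cycle and a fluorescence signal proportional to copy number; the detection threshold of 1e10 copies is an arbitrary illustrative value, and the term "threshold cycle" is standard real-time PCR vocabulary rather than something taken from this article.

```python
def threshold_cycle(starting_copies, threshold_copies=1e10, max_cycles=40):
    """First cycle at which idealized product reaches the detection threshold.

    Returns None if the threshold is never reached within max_cycles,
    which is what a sample containing no target DNA would produce.
    """
    copies = float(starting_copies)
    for cycle in range(1, max_cycles + 1):
        copies *= 2  # perfect doubling, an assumption of this sketch
        if copies >= threshold_copies:
            return cycle
    return None

heavy_load = threshold_cycle(1_000_000)  # many target molecules in the sample
light_load = threshold_cycle(1_000)      # few target molecules in the sample
no_target = threshold_cycle(0)           # agent absent
```

A sample with more starting template crosses the threshold in fewer cycles, so a positive result can be called as soon as the signal appears instead of after the full run and a separate gel.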
Crime Scenes and Beyond. The best-known use of DNA analysis is probably in the area of forensics. The first case in which Jeffreys applied DNA fingerprinting was an immigration dispute. In 1983, British authorities had denied a thirteen-year-old boy entry into the country, claiming that his passport was forged and that his stated mother, a British subject, was not his biological mother. The dispute continued until 1985, when Jeffreys was able to apply his new technique to prove that the maternal relationship stated on the passport was indeed correct. Since that time, maternity tests have been vastly outnumbered by paternity tests, but the principle used in both types of parental testing remains the same.
The first use of DNA fingerprinting in a criminal case occurred in 1986, when it was used to exonerate a suspect accused of the rape and murder of a teenage girl near Leicester, England. Later, the same technique was used to identify the real killer. Since this early case, evidence from DNA fingerprinting has helped convict thousands of criminals. The source of DNA is blood in about half of all cases; other common sources are semen and hair. DNA analysis also plays an important role in the identification of human remains following disasters, acts of terrorism, and war.
Limitations of PCR. Following the advent of PCR, the amount of forensic sample required for analysis was reduced significantly. The original DNA fingerprinting procedure developed by Jeffreys required a blood sample about the size of a quarter, but later methods needed only a few cells swabbed from a person's cheeks to perform an analysis. Although PCR requires much less starting material than RFLP analysis and is also a more rapid procedure to perform, it does have a number of limitations. The first limitation, that flanking DNA sequences must be known ahead of time, was largely overcome as more and more human short tandem repeats were characterized along with the DNA that surrounded them. A second limitation is that the method is so sensitive that it is prone to contamination by outside sources. Because even a single fragment of DNA can be amplified into an amount large enough to appear on a gel, care must be taken not to introduce foreign DNA from an investigator's hair or fingertips. A third limitation is that a conventional reaction analyzes only a single area, or locus, of DNA at one time. To overcome this limitation, a procedure called multiplex PCR has been developed. This method simultaneously employs a number of primers that have been labeled with fluorescent tags. The resulting products can be identified during the subsequent gel electrophoresis step based on their specific labels.
Medical Applications. Besides using DNA analysis to identify infectious agents, the medical community has begun to use this technique to study genetic disorders. However, common methods of DNA analysis cannot identify most genetic disorders, with the exception of a class of disorders called trinucleotide repeat expansion disorders. This rare class of disorders, which includes Huntington's disease as well as fragile X syndrome, is readily detectable using PCR amplification of the short tandem repeats that contribute to the disorders in question. A more common class of genetic disorders results from point mutations in genes and can therefore be linked to particular single nucleotide polymorphisms in the human genome. Unfortunately, single nucleotide polymorphisms do not change the length of a PCR product, so they are not detectable by standard length-based PCR analysis, and they will show up in RFLP analysis only if they occur in the restriction enzyme recognition site itself, which is a rare occurrence. The identification of genetic disorders is therefore largely dependent on determining the actual sequence of the DNA, long a technically challenging and expensive undertaking despite progress that has been made since the inception of the Human Genome Project DNA sequencing program in the 1990s.
Methods involved in DNA sequencing include many of the same principles as other forms of DNA analysis. A single primer is labeled with a fluorescent dye and mixed with a target sequence in the presence of a thermostable DNA polymerase. This procedure does not amplify the DNA as in PCR but results in primer extension for a certain length along the target sequence. Another difference from PCR is that modified nucleotides are added to this mixture so that the primer extension is halted whenever these particular nucleotides are incorporated into a growing DNA strand. Four separate tubes are used in this method, one for each of the four DNA bases. Once the products of these four reactions are separated by electrophoresis, the order of bases can be determined using computer software that compares the relative migration of the bands produced by each of the four reaction tubes.
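The way base order is recovered from four termination reactions can be sketched as follows. The example assumes idealized data: each band is described by the number of nucleotides added to the primer before extension halted, every position appears in exactly one lane, and there is no measurement noise. The function names `terminate` and `read_sequence` are hypothetical.

```python
def terminate(extension_product):
    """Simulate the four reactions for a known strand (used here to make test data).

    Each reaction halts at one kind of base, so lane X collects the lengths
    at which base X was incorporated.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(extension_product, start=1):
        lanes[base].append(position)
    return lanes

def read_sequence(lanes):
    """Reconstruct the synthesized strand from the band lengths in the four lanes.

    Shorter fragments migrate farther, so reading bands from shortest to
    longest gives the bases in the order they were added to the primer.
    """
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in base_at_length:
                raise ValueError(f"two lanes report a band at length {length}")
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))

lanes = terminate("GATTACA")
recovered = read_sequence(lanes)
```

Sorting all bands by length and noting which lane each came from is, in essence, what the analysis software does when it calls bases from the four reactions.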
Careers and Course Work
Scientists in general have traditionally chosen from three career paths: industry, academia, and government, with industry providing the most jobs and government the fewest. This general principle is true for those interested in the science of DNA analysis. Industry leads the way in the design of diagnostic tests and DNA extraction kits, academia is the domain of basic research, and government dominates the field of forensic science. Most of the forensic analysis performed in the United States occurs at the governmental level. Relevant courses include genomic data science, bioinformatics, human genetics, and genomic technologies.
Those who are interested in a career in forensic science should bear in mind a few facts. First, the hybrid job of police officer/detective/forensic scientist depicted on television dramas does not exist in reality. Forensic scientists work either in the field or in the laboratory but rarely in both. In large laboratories, the forensic analyst will tend to specialize in a particular area, and in small laboratories, the analyst will be more of a generalist but also will lack most of the resources and equipment shown on television. Second, although dozens of colleges have introduced degree programs in forensics to keep up with the increased demand, any bachelor's degree that gives its holder a solid background in science and mathematics and excellent communication skills is sufficient to work in this area. Third, students should realize that such a bachelor's degree prepares a person to begin working as a technician performing largely support functions such as preparing reagents for analysis. A master's degree or certification program is most likely required to specialize in a subfield of forensic science and to perform more of the scientific analysis of evidence, while a doctoral degree in an associated field is preferred for advancement to administrative positions such as laboratory director. Graduates can work as DNA analysts, forensic analysts, lab instructors, or genetic counselors.
Social Context and Future Prospects
Single nucleotide polymorphisms (SNPs), although not used extensively in forensic applications, potentially contain valuable information that can be of use to crime scene investigators. For example, the presence of particular SNPs may indicate a perpetrator's ancestry, while others could indicate hair color. One disadvantage of SNPs, besides the relative difficulty of identifying them, is that many more of them are needed to provide a unique identification than the number of short tandem repeats needed for a PCR-based profile. Because most SNPs are biallelic (they contain one base or another but generally not all four possible bases), it is estimated that as many as fifty would have to be analyzed to obtain the same level of confidence as provided by the thirteen STR loci originally used in the Combined DNA Index System (CODIS), the DNA database maintained by the US Federal Bureau of Investigation. This may not prove as difficult as it sounds because it is estimated that there are probably about 10 million SNP sites scattered throughout the human genome. If accurate, that would mean that SNPs outnumber short tandem repeats to the same degree that short tandem repeats outnumber variable number tandem repeats.
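The claim that roughly fifty biallelic SNPs match the discriminating power of thirteen STR loci can be made plausible with a back-of-the-envelope calculation. The sketch below rests on strong assumptions that are not from the article: genotypes follow Hardy-Weinberg proportions, loci are independent, each SNP has two equally common alleles, and each STR locus has eight equally common alleles. Real allele frequencies are uneven, so the numbers are illustrative only.

```python
from itertools import combinations_with_replacement

def genotype_match_probability(allele_freqs):
    """Chance that two unrelated people share a genotype at one locus.

    Sums the squared frequency of every possible genotype, assuming
    Hardy-Weinberg proportions.
    """
    total = 0.0
    indices = range(len(allele_freqs))
    for i, j in combinations_with_replacement(indices, 2):
        p, q = allele_freqs[i], allele_freqs[j]
        genotype_freq = p * p if i == j else 2 * p * q
        total += genotype_freq ** 2
    return total

def loci_needed(per_locus_probability, target_probability):
    """Smallest number of independent loci whose combined match probability
    is at or below target_probability."""
    count = 0
    combined = 1.0
    while combined > target_probability:
        combined *= per_locus_probability
        count += 1
    return count

# Assumed allele frequencies (invented for illustration).
snp_locus = genotype_match_probability([0.5, 0.5])     # biallelic SNP
str_locus = genotype_match_probability([1 / 8] * 8)    # STR with 8 alleles

thirteen_str_profile = str_locus ** 13
snps_for_same_power = loci_needed(snp_locus, thirteen_str_profile)
```

Under these assumptions a single SNP matches by chance far more often than a single STR locus, and the number of SNPs needed to equal a thirteen-locus STR profile comes out just under fifty, in line with the estimate quoted above.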
DNA sequencing in some form or another is likely to continue to play an increasing role in DNA analysis. The cost of DNA sequencing has dropped sharply as the technique has become more prevalent and increasingly automated. Although the first human genome sequence was produced at a cost of billions of dollars, scientists set a goal of reducing the cost of sequencing a human genome to about one thousand dollars, a goal that was achieved by 2015; in subsequent years the cost fell to just a few hundred dollars. At the same time, scientists are developing a number of methods that allow SNPs to be determined without first finding the sequence of the 99.7 percent of DNA bases that do not exist as SNPs. These methods include directed hybridizations, ligations, primer extensions, or nuclease cleavages that specifically involve SNPs while leaving the rest of the DNA alone.
With any increase in the involvement of DNA sequencing in forensics comes the likelihood that debate will intensify concerning privacy issues regarding the use of sequence information. Unlike commonly used methods of PCR analysis, SNP determination will reveal certain details about suspects that could be open to abuse. Ethical issues involving the use and dissemination of DNA data will have to be resolved as the methods of DNA analysis continue to evolve.
In 2021, researchers at Purdue University in Indiana published a genetic analysis suggesting that the COVID-19 virus, or severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), does not have the machinery to integrate its genetic material into human DNA, as was feared based on some people's experience of testing positive for the virus long after recovering from COVID-19.
Bibliography
McClintock, J. Thomas. Forensic DNA Analysis: A Laboratory Manual. CRC Press, 2008.
Nakamura, Yusuke. “DNA Variations in Human and Medical Genetics: Twenty-Five Years of My Experience.” Journal of Human Genetics, vol. 54, 2009, pp. 1–8.
Pereira, Filipe, et al. “Identification of Species with DNA-Based Technology: Current Progress and Challenges.” Recent Patents on DNA and Gene Sequence, vol. 2, 2008, pp. 187–200.
Roper, Stephan M., and Owatha L. Tatum. “Forensic Aspects of DNA-Based Human Identity Testing.” Journal of Forensic Nursing, vol. 4, 2008, pp. 150–56.
Rudin, Norah, and Keith Inman. An Introduction to Forensic DNA Analysis. 2nd ed., CRC Press, 2002.
“The COVID-19 Virus May Not Insert Genetic Material into Human DNA, Research Shows.” Purdue University, 18 May 2021, www.purdue.edu/newsroom/releases/2021/Q2/the-covid-19-virus-may-not-insert-genetic-material-into-human-dna,-research-shows.html. Accessed 19 June 2021.
Watson, James D., and Andrew Berry. DNA: The Secret of Life. Alfred A. Knopf, 2006.