DNA profiling
DNA profiling, also known as DNA testing or genetic fingerprinting, is a forensic technique used to identify individuals based on their unique genetic makeup. By analyzing specific regions of DNA, particularly short tandem repeat (STR) markers, forensic scientists can establish connections between DNA evidence and potential suspects in criminal cases, as well as identify victims in mass disasters. The process hinges on the understanding that while 99% of human DNA is identical among individuals, the remaining 1% contains variations that can be critical for identification.
To accurately interpret DNA profiles, statistical analyses are employed, allowing for the exclusion of innocent individuals from suspect lists when a mismatch occurs in DNA markers. In the United States, crime labs commonly utilize the thirteen STR markers outlined by the Combined DNA Index System (CODIS) for comparisons. The analysis incorporates the frequency of specific alleles in different populations, which are derived from extensive databases to ensure a fair representation of genetic diversity.
For cases involving biological relationships, such as paternity testing, analysts calculate the likelihood of a match by considering the baseline probability of relatedness. However, DNA profiling can be challenging; incomplete or degraded samples may complicate interpretations, necessitating careful analysis to avoid errors. Overall, DNA profiling has become a pivotal tool in forensic science, enhancing the accuracy and reliability of criminal investigations and identity verification.
DEFINITION: Process of statistically analyzing DNA typing results to determine the probability that another unrelated individual in the general population might share the same DNA fingerprint as the one obtained from the evidence sample.
SIGNIFICANCE: When DNA is analyzed and typed, a graph (called an electropherogram) containing peaks representing the different alleles (different forms of a gene) inherent to that particular sample is obtained. These data must be statistically interpreted by a forensic scientist, who calculates the frequency of that “fingerprint” combination in the general population, before they are presented and accepted in a court of law. Without statistical value, DNA processing serves no practical purpose in criminal investigations.
In legal cases involving DNA (deoxyribonucleic acid) evidence, statistical analysis provides investigators with a tool that can potentially exclude innocent individuals from suspect lists. In forensic science, DNA samples are used not only to establish similarities between evidence and suspects or victims but also to identify victims of mass murders and catastrophes and to determine the parentage of children. The statistical analyses employed vary depending on the situation.
![Gel electrophoresis, in this case used to separate different dyes. This technique is used in DNA profiling. By David Shand [CC-BY-SA-2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons](https://imageserver.ebscohost.com/img/embimages/ers/sp/embedded/89312123-73872.jpg?ephost1=dGJyMNHX8kSepq84xNvgOLCmsE2epq5Srqa4SK6WxWXS)
Because 99 percent of the bases that form the human genome are identical among all individuals, analysts need to use several different DNA markers to detect differences in the remaining 1 percent. Most crime laboratories in the United States use the thirteen short tandem repeat (STR) markers used by the national DNA database, the Combined DNA Index System (CODIS), to compare DNA samples. Of these thirteen markers, only one needs to differ to exclude a particular individual (except identical twins) from being the source of an evidence sample. When such a mismatch is found, no statistical analysis is necessary. The inclusion of individuals, however, is somewhat more complicated.
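The exclusion rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a laboratory procedure: the marker names are CODIS loci, but the allele values, the profile layout (a dictionary mapping each marker to a pair of alleles), and the function names are all assumptions made for the example.

```python
# Hypothetical profiles: marker name -> pair of alleles (repeat counts).
# All allele values below are invented for illustration.

def normalize(genotype):
    """Sort the two alleles so that (15, 17) and (17, 15) compare equal."""
    return tuple(sorted(genotype))

def mismatched_markers(evidence, suspect):
    """Return the markers typed in both profiles whose genotypes differ."""
    shared = evidence.keys() & suspect.keys()
    return sorted(m for m in shared
                  if normalize(evidence[m]) != normalize(suspect[m]))

def is_excluded(evidence, suspect):
    """A single differing marker is enough to exclude the individual."""
    return len(mismatched_markers(evidence, suspect)) > 0

evidence = {"D3S1358": (15, 17), "vWA": (16, 16), "FGA": (21, 24)}
suspect_a = {"D3S1358": (17, 15), "vWA": (16, 16), "FGA": (21, 24)}
suspect_b = {"D3S1358": (15, 17), "vWA": (16, 18), "FGA": (21, 24)}

print(is_excluded(evidence, suspect_a))         # False
print(is_excluded(evidence, suspect_b))         # True
print(mismatched_markers(evidence, suspect_b))  # ['vWA']
```

A result of `False` means only that the individual cannot be excluded; the weight of that inclusion still has to be established statistically.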
Allele frequencies for each STR marker (how often a particular allele occurs in a population) have been determined for various geographic regions and ethnic populations by scientists working for the Federal Bureau of Investigation (FBI), by a project funded by the National Science Foundation (NSF), and by the National Institute of Standards and Technology (NIST). These databases were created by collecting DNA results from many (at least two hundred) individuals and calculating, for each marker, the percentage frequency of allele X in the total population surveyed. Individual crime laboratories determine which of the databases they use for frequency calculations, but the end results are similar in that they have either a very low or very high probability.
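As a rough sketch of how such a frequency table might be derived, the following Python counts allele copies at a single marker and divides by the total number of copies observed. The genotypes are invented, the function name is hypothetical, and a sample of five individuals is far smaller than the two hundred or more that the databases described above rely on.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Fraction of all observed allele copies accounted for by each allele.

    `genotypes` is a list of (allele, allele) pairs, one pair per individual.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())  # two allele copies per individual
    return {allele: n / total for allele, n in counts.items()}

# Invented genotypes at one marker for five individuals (illustration only).
sample = [(6, 9), (9, 9), (6, 7), (9, 9.3), (7, 9)]
freqs = allele_frequencies(sample)

print(freqs[9])  # 0.5
```

Allele 9 appears in five of the ten allele copies, so its estimated frequency in this toy sample is 0.5.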
Statistical Approach
Each human being carries thousands of genes, each of which is inherited from the parents. Most individuals possess two copies (alleles) of a gene, one contributed by the mother and one contributed by the father. These copies can have either the same form (for example, two alleles for black hair) or different forms (one allele for black hair, one allele for blond hair). If inheriting one marker has no effect on inheriting another (that is, the markers are inherited independently), statisticians and analysts can multiply the individual frequencies to establish the overall frequency of the DNA profile.
In criminal investigations, the races or geographic origins of the perpetrators are often not known; thus, a forensic scientist cannot place a suspect into an ethnic category (with any degree of certainty high enough to stand up in court) when determining which allele frequencies to use. Therefore, the most conservative approach is commonly used. For example, if given an allelic frequency database that has African American, Caucasian, and Asian populations, and the frequency of allele A of marker B is to be determined, the analyst will probably select the population in which allele A is most common (has the highest percentage). In doing this for each allele in question, the analyst will obtain a final frequency that is the highest possible for that specific profile, meaning the profile is treated as no rarer than the data allow. This approach ultimately favors the suspect and removes any biases that might compromise the investigation. Once all the allelic frequencies corresponding to the sample are obtained, the Hardy-Weinberg principle is used to calculate the occurrence of that DNA fingerprint.
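The conservative selection described above amounts to taking, for each allele, the largest frequency found across the available population databases. A minimal Python sketch follows; the frequencies and the function name are invented for illustration and do not come from any real database.

```python
def most_conservative_frequency(freq_by_population):
    """Return (population, frequency) for the population in which
    the allele is most common."""
    population = max(freq_by_population, key=freq_by_population.get)
    return population, freq_by_population[population]

# Invented frequencies of allele A at marker B in three population databases.
allele_a_at_marker_b = {
    "African American": 0.12,
    "Caucasian": 0.21,
    "Asian": 0.08,
}

print(most_conservative_frequency(allele_a_at_marker_b))  # ('Caucasian', 0.21)
```

Using the largest frequency for every allele makes the final profile frequency as high as the data permit, so the evidence is never presented as rarer than it could be.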
The calculation relies on the product rule: the probability of two independent events occurring together is the product of the probabilities of each occurring separately. The equilibrium formula for the Hardy-Weinberg principle is $p^2 + 2pq + q^2 = 1$, in which $p$ and $q$ represent the frequencies of the two possible forms (alleles) of a particular gene. Thus, when an individual is homozygous (has two identical copies of the allele) for a particular allele, the frequency of that allele is squared ($p^2$ or $q^2$) to obtain the overall frequency for that marker in that particular individual. If the individual is heterozygous (has two different alleles), then the frequency of the first allele is multiplied by the frequency of the second allele and the product is multiplied by two ($2pq$). Once all marker frequencies are obtained, the individual frequencies are multiplied to obtain the rarity of the overall DNA fingerprint.
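The arithmetic in this paragraph can be sketched as follows. The allele frequencies are invented, the function names are hypothetical, and the sketch assumes the markers are independent, as the product rule requires; it applies no corrections for population substructure or other refinements a laboratory might use.

```python
from math import prod

def genotype_frequency(p, q=None):
    """Expected Hardy-Weinberg genotype frequency at one marker.

    Homozygous (one allele frequency given): p squared.
    Heterozygous (two allele frequencies given): 2pq.
    """
    if q is None:
        return p * p
    return 2 * p * q

def profile_frequency(markers):
    """Multiply the per-marker genotype frequencies (assumes independence)."""
    return prod(genotype_frequency(*alleles) for alleles in markers)

# Invented allele frequencies for a three-marker profile:
# marker 1 homozygous (p = 0.1), markers 2 and 3 heterozygous.
markers = [(0.1,), (0.2, 0.3), (0.05, 0.1)]
freq = profile_frequency(markers)

print(f"1 in {round(1 / freq):,}")  # 1 in 83,333
```

Here the per-marker frequencies are 0.01, 0.12, and 0.01, giving a profile frequency of about 0.000012. With thirteen markers instead of three, the same multiplication yields far smaller numbers.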
Interpretation
When attempting to establish biological relationships, such as paternity, analysts do not estimate the probability that a randomly chosen individual in the general population would have a matching profile. Instead, they take into account the increased probability of similarity between samples (because there is a chance that they are related) to prevent under- or overestimation of the likelihood ratio. Take, for example, a paternity dispute in which the DNA of a child’s alleged father is analyzed. It is known that the child has a father, and because there is only one alleged father, he is assigned a default fifty-fifty chance of being the father before the genetic evidence is considered. When the calculation is done, the likelihood obtained from the DNA comparison is combined with this 50 percent default probability to give a probability of paternity, so that strong genetic evidence moves the result well beyond the fifty-fifty starting point.
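The text does not spell out how the 50 percent default probability enters the calculation. One common formulation, shown here as an assumption rather than as the article's own method, treats the genetic evidence as a likelihood ratio (often called the paternity index) and combines it with the prior probability using Bayes' theorem. The function name and the example value are hypothetical.

```python
def probability_of_paternity(paternity_index, prior=0.5):
    """Posterior probability of paternity from a likelihood ratio and a prior.

    paternity_index: how many times more likely the genetic results are if
        the alleged father is the true father than if an unrelated man is.
    prior: default probability of paternity before the DNA evidence
        (0.5 in the fifty-fifty case described above).
    """
    numerator = paternity_index * prior
    return numerator / (numerator + (1 - prior))

# Invented example: genetic results 99 times more likely under paternity.
w = probability_of_paternity(99)
print(round(w, 4))  # 0.99
```

With a prior of 0.5 the expression reduces to the index divided by the index plus one, so an index of 1 (uninformative evidence) leaves the probability at 0.5.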
Forensic scientists performing DNA analysis must be cautious, however, when profiles are incomplete or the DNA is degraded, given that markers that are actually heterozygous could be interpreted as homozygous because only one allele was amplified from the damaged DNA. Additionally, scientists need to perform more thorough and complicated analyses when they are dealing with mixed profiles or when there is suspicion of contamination of the DNA evidence.