Single Nucleotide Polymorphism
Single Nucleotide Polymorphisms (SNPs) are variations in a single nucleotide at a specific position within the genome, representing a key type of genetic polymorphism. They occur frequently throughout the human genome, with an estimated frequency of about one SNP every 300 base pairs, and account for approximately 90% of genetic variation in humans. SNPs can be found in various genomic regions, including coding regions (exons), non-coding regions (introns), and regulatory regions (promoters). The identification and analysis of SNPs play a significant role in understanding human diseases, such as cancer, heart disease, and diabetes, by enabling researchers to compare genetic sequences between diseased and healthy individuals.
The Human Genome Project was instrumental in mapping SNPs, providing a foundation for subsequent studies that sought to correlate these genetic variations with specific health conditions. High-throughput sequencing technologies have advanced SNP research, allowing for large-scale investigations and the creation of public databases like dbSNP, which collects SNP information across various species. As research continues, the focus is on ensuring the accuracy of SNP data and understanding their density variation in different genomic contexts, particularly in relation to disease susceptibility. Overall, SNPs represent a crucial aspect of genetic research, offering insights into the complexities of human genetics and health.
Single nucleotide polymorphisms (SNPs) are alternative bases that occur at a single position within a genomic DNA sequence. They may also be regarded as alleles, that is, variants at a specific locus relative to a particular gene or genetic marker. Biotechnology has significantly influenced modern biology, particularly molecular biology and genetics. Over the past few decades, massive amounts of genetic data have been generated at various levels and resolutions, including DNA sequences, genotypic information, haplotypes, and expression levels of mRNA and protein. One of the most important applications of genetic data is identifying polymorphisms, or changes in nucleotide bases, that relate to the development of human diseases such as cancer, heart disease, diabetes, and neurological disorders. SNPs are among the most significant types of such data. They are generally reported to occur about once every 600 bp within the human genome, although estimates vary with how the frequency is measured.
![Four close-ups of mutated laminA LMNA (based on 1IFR) mutation R527L draft figure for PMID 2254940. By Lukasz Kozlowski (Own work) [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons](https://imageserver.ebscohost.com/img/embimages/ers/sp/embedded/87323260-106627.jpg?ephost1=dGJyMNHX8kSepq84xNvgOLCmsE2epq5Srqa4SK6WxWXS)
Background
Initial studies on SNPs were performed by the Human Genome Project, which was an internationally funded initiative to sequence the entire human genome and to identify all genes in relation to specific diseases. In addition to these goals, the Human Genome Project also aimed to detect all types of polymorphisms that possibly contribute to various diseases. These polymorphisms are identified by comparing DNA sequences between healthy and diseased individuals. SNPs are the most common type of polymorphism, representing around 90 percent of DNA variation in humans.
An SNP was first described as a diallelic marker, that is, one involving only two alleles. For example, the nucleotides adenine (A) and guanine (G) may occur at a particular location, which yields three possible genotypes: AA, AG, and GG. However, DNA is a double-stranded molecule, so the same site read on the complementary strand carries thymine (T) and cytosine (C); depending on which strand is reported, any of the four nucleotides may therefore appear at that location. The frequency of a single-base difference at a given position between two chromosomes represents nucleotide diversity, which is approximately 1 per 1,000 base pairs (bp). Across an entire population, the frequency of SNPs is approximately 1 per 300 bp.
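The genotype and strand reasoning above can be illustrated with a minimal Python sketch. The function names and the two ten-base sequences are hypothetical and chosen only for illustration; the pairwise difference rate is a simplified stand-in for nucleotide diversity computed on just two aligned sequences.

```python
from itertools import combinations_with_replacement

# Watson-Crick pairing, used to read the same site on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def genotypes(alleles):
    """Return the unordered diploid genotypes for a set of alleles."""
    return ["".join(pair) for pair in combinations_with_replacement(sorted(alleles), 2)]


def complement_alleles(alleles):
    """Return the alleles as they appear on the complementary strand."""
    return sorted(COMPLEMENT[a] for a in alleles)


def pairwise_difference_rate(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


forward = ["A", "G"]                    # diallelic SNP from the text
print(genotypes(forward))               # ['AA', 'AG', 'GG']
reverse = complement_alleles(forward)   # ['C', 'T']
print(genotypes(reverse))               # ['CC', 'CT', 'TT']

# Hypothetical aligned sequences that differ at one of ten positions.
print(pairwise_difference_rate("ACGTACGTAC", "ACGTACGTAT"))  # 0.1
```

A real analysis would work on far longer alignments, where a rate near 1 difference per 1,000 bp corresponds to the nucleotide diversity quoted above.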
SNPs have been calculated to occur at average intervals of about 1,000 bp across the human genome. However, this density is not uniform: in certain regions of the genome, SNPs may occur up to 100-fold more frequently. SNPs are present in various regions of the human genome, including coding regions (exons), non-coding regions (introns), and upstream regions of genes (promoters). In general, SNPs are most often detected within introns, which are regions that do not encode any protein product. The most common SNP involves the nucleotides C and T. In the years following the completion of the Human Genome Project, researchers have used case-control studies to try to establish patterns of SNP occurrence in specific diseases.
Impact
The ideal approach to utilizing SNPs is to compare the sequences of a large population of healthy control individuals with those of individuals who have a particular disease. The advent of high-throughput sequencing technologies has facilitated these investigations. These efforts have resulted in reports describing around 1.4 million SNPs within the human genome, distributed in particular patterns across specific regions of the genome. A common method for such comparative studies is the case-control design, which assesses the occurrence of specific SNP patterns in each group. Case-control studies work best when they include a large number of study participants and control subjects, which gives the results stronger statistical power. The data gathered from these studies are deposited in public SNP databases such as dbSNP, which houses a total of 8.3 million SNPs derived from the human genome, at a density of around 28 SNPs for every 10 kb of genomic material. dbSNP also houses SNP information for other species, such as the mouse and chicken, and for major parasites and pathogens. To further support studies, the National Center for Biotechnology Information (NCBI) of the National Institutes of Health in Bethesda, Maryland, has led efforts to cross-annotate dbSNP with other resources, including PubMed and GenBank.
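As a rough illustration of how a case-control comparison for a single SNP might be scored, the sketch below computes an allelic odds ratio and a Pearson chi-square statistic from a 2x2 table of allele counts. The counts and function names are hypothetical, and this is only one of several tests used in practice; it is not presented as the method of any study mentioned here.

```python
def odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Odds of carrying the alternate allele in cases relative to controls."""
    return (case_alt * control_ref) / (case_ref * control_alt)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for the table
    [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical allele counts: 200 alleles sampled in each group.
case_alt, case_ref = 60, 140
control_alt, control_ref = 40, 160

ratio = odds_ratio(case_alt, case_ref, control_alt, control_ref)
stat = chi_square_2x2(case_alt, case_ref, control_alt, control_ref)

print(f"odds ratio = {ratio:.2f}")   # odds ratio = 1.71
print(f"chi-square = {stat:.2f}")    # chi-square = 5.33
# 3.841 is the conventional 5 percent critical value for 1 degree of freedom.
print("significant at 0.05:", stat > 3.841)   # significant at 0.05: True
```

With smaller groups the same allele frequencies would give a smaller statistic, which is one way to see why large numbers of participants and controls yield stronger statistical power.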
The information available in SNP databases serves as a valuable resource for genomic investigations. However, the reliability of SNP information depends heavily on the quality and coverage of the underlying genome sequences. Because hundreds of institutions may submit SNP information to these databases, some entries are likely to be of low quality. The most important issue to resolve with regard to SNPs is therefore whether a detected single-base change is real. There are also open questions about the density of SNPs in genic (protein-coding) and non-genic (non-protein-coding) regions of the genome. Researchers have suggested that SNP density is higher in non-genic regions, possibly because changes there are less likely to result in amino acid changes. SNPs within genic regions, on the other hand, may tend to sit at positions that are less highly conserved, so that a single-nucleotide change is less deleterious to the final protein product.
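The comparison of SNP density between genic and non-genic regions can be sketched as follows. The coordinates, SNP positions, and the helper name `snps_per_10kb` are all hypothetical; real annotations would come from a genome browser or annotation file, and the intervals are assumed to be half-open and non-overlapping.

```python
def snps_per_10kb(snp_positions, regions):
    """SNP density per 10 kb within a list of (start, end) half-open,
    non-overlapping intervals."""
    total_bp = sum(end - start for start, end in regions)
    hits = sum(
        any(start <= pos < end for start, end in regions)
        for pos in snp_positions
    )
    return hits * 10_000 / total_bp


# Hypothetical 50 kb stretch: the first 20 kb genic, the remaining 30 kb non-genic.
genic = [(0, 20_000)]
non_genic = [(20_000, 50_000)]
snps = [1_500, 9_000, 17_500, 21_000, 23_000, 26_000,
        30_500, 33_000, 38_000, 41_000, 44_500, 49_000]

print(snps_per_10kb(snps, genic))       # 1.5
print(snps_per_10kb(snps, non_genic))   # 3.0
```

In this made-up example the non-genic stretch carries twice the SNP density of the genic one, the direction of difference that researchers have suggested.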
One approach to determining whether a specific SNP is real is to screen several unrelated individuals and check whether the SNP is also present in them. For example, a research group may sequence the site of a specific SNP in two hundred healthy controls. If the SNP of interest is detected in at least 1 percent of the study participants, it may be considered real. To date, several population studies have screened larger groups of individuals for SNPs. These include the 1000 Genomes Project, which screened a thousand individuals to detect all the SNPs in the entire human genome. Other SNP projects include the Exome Sequencing Project (ESP) and the Exome Aggregation Consortium (ExAC).
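The 1 percent rule of thumb in the example above amounts to simple arithmetic: among two hundred controls, the SNP must appear in at least two individuals. A minimal sketch, with hypothetical function names and exact fractions used to avoid rounding at the threshold:

```python
from fractions import Fraction


def carrier_frequency(carriers, screened):
    """Exact fraction of screened individuals in whom the SNP was detected."""
    return Fraction(carriers, screened)


def passes_threshold(carriers, screened, threshold=Fraction(1, 100)):
    """True if the SNP is seen in at least `threshold` of those screened."""
    return carrier_frequency(carriers, screened) >= threshold


screened = 200  # healthy controls, as in the example above
print(passes_threshold(2, screened))   # True  (2/200 is exactly 1 percent)
print(passes_threshold(1, screened))   # False (1/200 is 0.5 percent)
```

This counts individuals who carry the SNP; a threshold defined on allele frequency instead would divide by the number of chromosomes sampled, which is twice the number of individuals.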