Repetitive DNA

SIGNIFICANCE: Eukaryotic nuclei contain repetitive DNA elements of different origins that constitute between 20 and 90 percent of the genome, depending on the species. Repeated DNA accounts for about 50 percent of the human genome. Most of these repetitive sequences are mobile elements that move within the genome either through DNA-based mechanisms (transposons, via transposition) or through RNA intermediates (retrotransposons, via retrotransposition). About 42 percent of the human genome consists of retrotransposons and 2 to 3 percent of DNA transposons. The presence and type of repetitive DNA elements have provided insights into evolution, gene flow, gene mapping, forensic investigation, and various aspects of biomedicine, including disease diagnosis and detection.

Types of Repetitive DNA

The eukaryotic nuclear genome is characterized by repetitive DNA elements in the form of sequences of varying length and base composition, either localized to a particular region of the genome or dispersed throughout it (for example, on different chromosomes). Some of these DNA elements are repeated only a few times, whereas others may be repeated millions or billions of times; thus, the percentage of the total genome represented by repetitive DNA varies widely among taxa. Tandem repeats are repetitive DNA sequences that are adjacent to each other in a block or array, whereas interspersed repeats are found throughout the genome, surrounded by unique (nonrepetitive) DNA sequences.


The two major classes of tandem repetitive DNAs (TR-DNAs) are those that are localized to a particular region or regions of the genome and those that are dispersed throughout the genome. TR-DNAs feature repeating units that are oriented in head-to-tail arrays. The repetitive units of an array may be genes, promoters, or intergenic spacers, or they may be simple nucleotide sequences. For example, in the kangaroo rat, the simple sequence AAG is repeated 2.4 billion times.
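
A head-to-tail array can be made concrete with a short scan. The Python sketch below reports runs in which a short unit is repeated back to back with no interruptions. It is a simplified illustration that assumes exact (error-free) repeats; the function name `find_tandem_arrays` and its `max_period` and `min_copies` parameters are hypothetical, and practical repeat finders also tolerate mismatches and small insertions or deletions.

```python
def find_tandem_arrays(seq: str, max_period: int = 6, min_copies: int = 3) -> list[tuple[int, str, int]]:
    """Return (start, unit, copies) for perfect head-to-tail tandem arrays.

    Assumes exact repeats only. Shorter periods are tried first, so a run
    like AAGAAGAAG is reported with unit "AAG" rather than a longer multiple.
    """
    seq = seq.upper()
    arrays = []
    i = 0
    while i < len(seq):
        best = None
        for period in range(1, max_period + 1):
            unit = seq[i:i + period]
            if len(unit) < period:
                break  # not enough sequence left for this period
            copies = 1
            # Count how many times the unit recurs immediately after itself.
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            if copies >= min_copies:
                best = (i, unit, copies)
                break  # keep the shortest period that qualifies
        if best:
            arrays.append(best)
            i += best[2] * len(best[1])  # skip past the whole array
        else:
            i += 1
    return arrays


print(find_tandem_arrays("CCAAGAAGAAGTT"))  # [(2, 'AAG', 3)]
```

Trying the shortest period first is a design choice that avoids reporting the same array twice under different unit lengths.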

Localized TR-DNA is often composed of members of multigene families. For example, in humans, there are 350 copies of the ribosomal RNA (rRNA) genes on five different chromosomes that occur as tandemly repeated arrays. Transfer RNA (tRNA) and immunoglobulin genes are other examples of multigene families that are tandemly repeated. However, most localized TR-DNA consists of simple, noncoding repetitive DNA sequences that often, but not always, can be found in heterochromatic or centromeric regions.

Dispersed repetitive sequences are scattered throughout the genome and include two major groups of interspersed (nontandem) elements: short interspersed elements (SINEs) and long interspersed elements (LINEs). TR-DNA sequences are sometimes referred to collectively as satellite DNA and are classified, according to the length of the repeat unit and the overall length of the array, as satellites, minisatellites, or microsatellites.

Origin and Evolution of Dispersed Repetitive DNA Elements

Dispersed repetitive DNA is believed to be an evolutionary device that catalyzes the formation of new genes. Within a species, similar DNA sequences are thought to be kept alike by gene conversion (the nonreciprocal transfer of sequence information from one DNA segment to a homologous one, which occurs during recombinational repair, notably in meiosis), whereas repetitive sequences disrupt this process and allow new genes to evolve. SINEs are believed to disrupt gene conversion between chromosomes, whereas the longer LINEs disrupt gene conversion within a chromosome. SINEs are transposable elements, capable of “jumping” from one locus to another via an RNA intermediate.

SINEs are called nonviral retropseudogenes because they are derived from RNA transcripts of cellular DNA sequences rather than from viruses. The RNA transcript is reverse transcribed into DNA, and this DNA copy is then inserted into the genome. Although SINEs resemble the genes from which they were derived, they no longer carry out those genes’ functions.
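
The copy-and-paste logic of movement through an RNA intermediate can be traced with a toy model. In the Python sketch below, which assumes plain A/C/G/T strings and uses hypothetical function names, an element is transcribed to RNA, reverse transcribed to complementary DNA (cDNA), made double-stranded, and inserted at a new position while the source copy stays in place. It deliberately ignores real-world details such as target-site selection and priming.

```python
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(dna: str) -> str:
    """Coding-strand DNA -> RNA: same sequence with U in place of T."""
    return dna.upper().replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """RNA -> complementary DNA (cDNA), written 5' to 3'."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna.upper()))


def reverse_complement(dna: str) -> str:
    """Second-strand synthesis, modeled as the reverse complement of the cDNA."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def retrotranspose(genome: str, start: int, end: int, target: int) -> str:
    """Copy genome[start:end] via an RNA intermediate and insert it at `target`.

    `target` is a coordinate in the original genome string. The source element
    is left where it was, so the element count goes up by one.
    """
    element = genome[start:end]
    rna = transcribe(element)
    cdna = reverse_transcribe(rna)
    new_copy = reverse_complement(cdna)  # same sense as the source element
    return genome[:target] + new_copy + genome[target:]


print(retrotranspose("GGGGACGTTACCCC", 4, 10, 14))  # GGGGACGTTACCCCACGTTA
```

The element now appears twice, which is how retrotransposition can build up many dispersed copies over time.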

The most common and best-characterized SINEs in humans are the highly repetitive Alu sequences, so named because they are cut by the restriction endonuclease AluI, derived from the bacterium Arthrobacter luteus. An estimated one million Alu copies are scattered across the human genome; each is approximately 300 base pairs (bp) long, and together they constitute as much as 10 percent of the human genome. Alu sequences may have arisen by retrotransposition of the 7SL RNA gene sequence. They contain an internal promoter for RNA polymerase III and are sometimes transcribed.
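
The enzyme behind the name can be illustrated with a small digest simulation. AluI recognizes the palindromic site 5′-AGCT-3′ and cuts between the G and the C, leaving blunt ends. The Python sketch below, with hypothetical function names and a made-up test sequence, finds those cut points in a linear DNA molecule and returns fragment lengths, assuming complete digestion. Because the site is palindromic, scanning one strand is enough.

```python
import re

ALUI_SITE = "AGCT"  # AluI recognition sequence
CUT_OFFSET = 2      # AluI cuts AG^CT, producing blunt ends


def alui_cut_positions(seq: str) -> list[int]:
    """0-based positions in a linear sequence where AluI would cut."""
    seq = seq.upper()
    # A zero-width lookahead reports every site, even if sites were to overlap.
    return [m.start() + CUT_OFFSET for m in re.finditer(f"(?={ALUI_SITE})", seq)]


def alui_digest(seq: str) -> list[int]:
    """Fragment lengths after complete AluI digestion of a linear molecule."""
    cuts = alui_cut_positions(seq)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


print(alui_digest("TTAGCTGGGGAGCTAA"))  # [4, 8, 4]
```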

LINEs are thought to be derived from a viral ancestor and are also capable of transposition. The most common LINEs in humans, and the only type that still actively transposes, are designated LINE-1, or L1; they make up around 17 percent of the human genome. Full-length, functional (that is, transpositionally competent) L1 elements are approximately 6 kb long, but most L1 copies are truncated at the 5′ end and cannot move. Full-length L1 copies contain two protein-coding regions, or open reading frames (ORFs): ORF-1 and ORF-2. ORF-1 encodes an RNA-binding protein, and ORF-2 encodes a protein with endonuclease and reverse transcriptase activities. The transposition of LINEs into various parts of the genome is believed to contribute to evolution through the creation of new genes with altered ORFs.
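
An open reading frame is a stretch of codons that begins with a start codon and runs to the first in-frame stop codon. The Python sketch below scans the three forward reading frames of a sequence for ORFs running from ATG to TAA, TAG, or TGA. It is a simplified illustration under stated assumptions: the name `find_orfs` and its `min_codons` parameter are hypothetical, only the forward strand is searched, and nested ORFs in the same frame are not reported. Annotating real L1 copies would instead rely on alignment to an L1 reference sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) of forward-strand ORFs, ATG through the stop codon.

    `end` is exclusive and includes the stop codon. Only the first ATG after
    each stop is used, so nested ORFs in the same frame are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Codons before the stop, counting the start codon.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGCCCTAAGG"))  # [(2, 11)]
```

A 5′-truncated copy that has lost its start codon, for example, would yield no ORF in this scan, even if its downstream sequence resembles an intact element.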

Are Interspersed Repeated Elements “Junk” DNA?

Repeated DNA elements were once believed to be “selfish” or “junk” DNA, concerned only with their own proliferation within the host cell’s genome. Recent studies, however, reveal that repetitive elements interact with the genome with profound evolutionary consequences. For example, satellite DNA found near the centromere may play a role in assembling the kinetochore and attaching chromosomes to spindle microtubules during cell division. In addition, transposable elements such as LINEs and SINEs (including Alu sequences) may have played a significant role in the evolution of particular proteins. For example, Alu elements flanking the primordial human growth hormone gene are believed to be responsible for the evolution of the chorionic somatomammotropin hormone 1 (CSH1) gene. Transposable repeated elements may have contributed substantially to the origin of new gene functions by generating a copy of an existing gene (which, over time, may acquire a different function) or by creating a “composite” gene composed of domains from two or more previously unrelated genes.

Classification of Simple Tandem Repeats

TR-DNA sequences are classified into four major groups based on three characteristics: the number of nucleotides in the repetitive unit, the number of times the unit is repeated, and whether the element is localized or scattered across the genome. The four groups are satellites, minisatellites, microsatellites, and dispersed Alu sequences. Satellite DNA is composed of tandemly repeated basic DNA sequences, ranging from two to hundreds of nucleotides in length, that are repeated more than one thousand times at a single location in the DNA. Satellite DNA is thus an example of a localized simple repeat and is typically found in the centromeric region of a chromosome. Tandemly repeated basic DNA sequences ranging from nine to one hundred nucleotides in length, repeated ten to one hundred times, and scattered throughout the genome are known as minisatellites. Microsatellites are also dispersed repetitive sequence elements; however, they are composed of a short basic unit one to six nucleotides in length that is tandemly repeated ten to one hundred times at each locus. The most common microsatellite loci in humans are dinucleotide arrays of (CA)n. However, on average, at least one tri- or tetranucleotide microsatellite locus is found in each 10 kilobase pairs (kb) of human genomic DNA. In a separate group, the basic unit of dispersed Alu sequences is one to five nucleotides in length, and this unit is repeated ten to forty times per locus.
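
The size ranges above can be read as a simple decision procedure. The Python sketch below applies the unit-length, copy-number, and localization criteria exactly as stated in the text; the `TandemRepeat` type and `classify` function are hypothetical. The Alu group is left out because its stated ranges (a one- to five-nucleotide unit repeated ten to forty times, dispersed) fall entirely inside the microsatellite ranges, so these three features alone cannot separate the two. Unit lengths of seven or eight nucleotides also fall between the stated ranges and come back as unclassified.

```python
from dataclasses import dataclass


@dataclass
class TandemRepeat:
    unit_length: int  # nucleotides in the basic repeat unit
    copies: int       # number of times the unit is repeated
    localized: bool   # True if confined to one region, such as a centromere


def classify(repeat: TandemRepeat) -> str:
    """Apply the text's size ranges; return 'unclassified' if none match."""
    u, n = repeat.unit_length, repeat.copies
    if repeat.localized and u >= 2 and n > 1000:
        return "satellite"
    if not repeat.localized and 1 <= u <= 6 and 10 <= n <= 100:
        return "microsatellite"
    if not repeat.localized and 9 <= u <= 100 and 10 <= n <= 100:
        return "minisatellite"
    return "unclassified"


print(classify(TandemRepeat(2, 30, localized=False)))  # microsatellite
```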

Polymorphism at Loci Composed of Simple Tandem Repeats

For convenience, the four groups of simple tandem repeats discussed above (satellite DNA, minisatellites, microsatellites, and Alu sequences) are sometimes collectively referred to as variable number tandem repeats (VNTRs). The length variants found at a given locus are treated as alleles; therefore, in humans, each VNTR locus is represented by two alleles, one inherited from each parent. VNTR loci exhibit high mutation rates, and as a result they are highly polymorphic, meaning that a large number of alleles exist at any given locus. This variation can be assayed with laboratory techniques such as the polymerase chain reaction (PCR) or Southern blotting, which reveal differences in the lengths of the alleles (repetitive elements) at a particular DNA locus. Length differences at VNTR loci arise from mispairing of repeats during replication, mitosis, or meiosis, which in principle can result in the loss or gain of one to many repeat units. Empirical studies and computer-based modeling experiments have demonstrated that each mutation usually increases or decreases the number of repeat units in an allele in a “one-step” manner; in other words, most mutations result in the loss or gain of only one repeat unit.
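
The one-step pattern described above can be mimicked with a small random simulation. The Python sketch below follows a single allele's repeat count over generations: with probability `mu` per generation the allele gains or loses exactly one unit, with equal odds. The function name, the parameters, and the floor of one repeat are illustrative assumptions, not measured values, and the mutation rate in the usage line is exaggerated so that changes show up in a short run.

```python
import random


def simulate_stepwise(start_copies: int, generations: int, mu: float, seed: int = 0) -> list[int]:
    """Track one allele's repeat count under a one-step mutation model.

    Each generation, with probability `mu` the allele gains or loses exactly
    one repeat unit (equal odds). Counts are not allowed to fall below 1.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    copies = start_copies
    history = [copies]
    for _ in range(generations):
        if rng.random() < mu:
            copies = max(1, copies + rng.choice((-1, 1)))
        history.append(copies)
    return history


history = simulate_stepwise(start_copies=20, generations=50, mu=0.5, seed=1)
print(history[-1])
```

Because every change is a single step, the repeat count drifts gradually, which produces the many closely spaced allele lengths seen at VNTR loci.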

The multiallelic variation arising from differences in repeat copy number provides useful genetic markers for many different applications. For example, under conditions of random mating and because of high mutation rates at VNTR loci, most individuals within the human population are heterozygous at any selected VNTR locus. This observation led directly to the origin of DNA fingerprinting (or DNA profiling), which is now considered admissible forensic evidence in many judicial systems worldwide. Length variation of VNTRs creates a powerful tool for identity analysis (for example, paternity testing) and is routinely used by population geneticists to examine genetic variation among populations. In the fields of genomics and biomedicine, VNTR loci are useful genetic landmarks for mapping the location of other genes of interest, including genes with a particular function or those implicated in disease.
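
The link between many alleles, random mating, and heterozygosity can be quantified with the standard expected heterozygosity, H = 1 − Σpᵢ², where pᵢ are allele frequencies. The Python sketch below computes H, the Hardy-Weinberg frequency of a single-locus genotype (p² for a homozygote, 2pq for a heterozygote), and the product of those frequencies across loci. It assumes random mating and independently inherited loci. The allele frequencies are hypothetical, and real casework uses population-specific frequency data and further statistical corrections.

```python
from math import prod
from typing import Optional


def expected_heterozygosity(freqs: list[float]) -> float:
    """H = 1 - sum(p_i^2): probability that an individual carries two different
    alleles at a locus, assuming random mating (Hardy-Weinberg proportions)."""
    return 1.0 - sum(p * p for p in freqs)


def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Hardy-Weinberg genotype frequency: p^2 if homozygous, 2pq if heterozygous."""
    return p * p if q is None else 2 * p * q


def profile_frequency(genotypes: list[tuple[float, Optional[float]]]) -> float:
    """Multiply single-locus genotype frequencies, assuming independent loci."""
    return prod(genotype_frequency(p, q) for p, q in genotypes)


# Hypothetical locus with ten equally common alleles.
print(round(expected_heterozygosity([0.1] * 10), 3))  # 0.9
```

With ten equally common alleles, about nine in ten individuals are expected to be heterozygous, and multiplying across several such loci quickly makes any particular multilocus profile rare.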

Transposable Elements and Human Disease

Retrotransposition of LINEs and SINEs into coding or noncoding genomic DNA is a major source of insertional mutations. The effects of such insertions vary, but insertions that disrupt genes are often deleterious and can lead to debilitating human diseases. Among a growing list of diseases known in some cases to be caused by the insertion of LINEs or SINEs are Duchenne muscular dystrophy, Glanzmann thrombasthenia, hemophilia, hypercholesterolemia, neurofibromatosis, Sandhoff disease, and Tay-Sachs disease. Movement of repeated sequences has also been demonstrated to “turn on” oncogenes; for example, one type of colon cancer is associated with the insertion of repetitive DNA.

Other studies suggest that “unstable” VNTRs, including minisatellite, microsatellite, and Alu loci, can also cause disease. According to these studies, a given locus may be able to accommodate only up to a threshold number of repeats of the basic nucleotide unit. When overamplification of the basic repeated unit pushes the repeat number past this threshold, serious diseases may arise. Among the diseases attributed to overamplification of tandem repeats of simple DNA sequences are fragile X syndrome and Huntington’s disease.
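
The threshold idea can be illustrated by counting the longest uninterrupted run of a repeat unit and comparing it with a cutoff. In the Python sketch below, the function names, the example sequence, and the threshold values are all hypothetical. Clinically used cutoffs are specific to each locus and are not given here, so the threshold is left as a caller-supplied parameter.

```python
def longest_run(seq: str, unit: str) -> int:
    """Largest number of uninterrupted, head-to-tail copies of `unit` in `seq`."""
    if not unit:
        raise ValueError("unit must be non-empty")
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    best = 0
    for offset in range(k):  # check every phase in which a run could start
        run = 0
        for i in range(offset, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0  # an interruption ends the current run
    return best


def exceeds_threshold(seq: str, unit: str, threshold: int) -> bool:
    """True if the longest run of `unit` is above a caller-supplied threshold."""
    return longest_run(seq, unit) > threshold


example = "GG" + "CAG" * 5 + "CAA" + "CAG" * 2  # hypothetical sequence
print(longest_run(example, "CAG"))  # 5
```

In this simplified model an interrupting unit such as CAA splits the array into two shorter runs.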

Key Terms

  • allele: an alternative form of a gene located at a specific location or locus on a chromosome
  • nucleotide: a basic unit of DNA, consisting of a five-carbon sugar, a nitrogen-containing base, and a phosphate group
  • polymorphism: the existence of different alleles for a particular gene locus among individuals of the same species
  • retrotransposition: the movement of repetitive DNA sequences within a genome via RNA intermediates
  • tandem repetitive DNA (TR-DNA): nucleotide sequences that include repeating sections oriented in head-to-tail arrays, such as (GGAAT)n
  • transposon: a DNA segment that is able to replicate itself and insert the new DNA sequence either in a new location within the same chromosome or into another chromosome or plasmid
  • variable number tandem repeat (VNTR): a DNA sequence in which short nucleotide sequences are repeated over and over; chromosomes from different individuals frequently have different numbers of the basic repeat
