Nucleic acid sequence
Nucleic acid sequences are the specific arrangements of nucleobases—adenine, thymine (or uracil in RNA), cytosine, and guanine—within the DNA or RNA molecules that govern genetic information in living organisms. The structure of nucleic acids is complex, with DNA typically existing as a double helix formed by two strands of nucleotides. Each nucleotide consists of a phosphate group, a sugar, and a nitrogenous base, and the sequence of these bases determines an organism's traits, such as height and eye color. The study of nucleic acid sequences has advanced significantly since the discovery of DNA's structure in 1953, culminating in projects like the Human Genome Project, which mapped the entire human genome. Nucleic acid sequences are crucial for protein synthesis, involving processes of transcription and translation, whereby RNA copies genetic information from DNA and guides the creation of proteins. Mutations in these sequences can lead to incorrect protein formation, potentially resulting in various genetic disorders. Understanding nucleic acid sequences is fundamental to genetics, molecular biology, and the study of heredity.
Nucleic acids are the molecules within cells that store and relay genetic information. The best-known nucleic acids are deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). Nucleic acids are structurally complex and rank among the largest molecules found in biological systems.


A nucleic acid sequence is the order in which the nucleobases that make up nucleic acids (adenine, thymine or uracil, cytosine, and guanine) are arranged within the DNA or RNA molecule. Adenine (A) pairs with thymine (T), or with uracil (U) in RNA, and cytosine (C) pairs with guanine (G). The order of these bases helps determine an organism’s physical characteristics, including its height, eye color, skin tone, and sex.
Brief History
The structure of the DNA molecule was discovered in 1953. Scientists James D. Watson and Francis Crick, using data and X-ray images produced by Rosalind Franklin, determined that DNA exists as a double helix, but more than a decade passed before a nucleic acid sequence could be reliably studied in a laboratory. It was not until 1968 that the first DNA sequencing was successfully performed by molecular biologists R. Wu and A. D. Kaiser, and modern methods derive from those developed in 1977 by F. Sanger, S. Nicklen, and A. R. Coulson and by Allan M. Maxam and Walter Gilbert.
In 1990, the Human Genome Project, the scientific community’s first attempt at sequencing the entire human genome, officially began. Although initially expected to take at least fifteen years, the project was completed in 2003. The research team concluded that the human genome contains about three billion base pairs and about twenty-five thousand genes, far fewer genes than previously believed. Even small organisms such as bacteria typically have several million base pairs in their genome.
Overview
Before one can understand how a DNA sequence works, it helps to know how nucleic acids are formed. All nucleic acids are composed of nucleotides. A standard nucleotide has three basic components: a phosphate group, a pentose sugar, and a nitrogenous base, or nucleobase. The nucleobases are guanine (G), cytosine (C), thymine (T), uracil (U, taking the place of thymine in RNA), and adenine (A). Note that uracil appears only in RNA, including the RNA produced during transcription, when the genetic code is copied from DNA molecules into RNA molecules.
When enough nucleotides are strung together, they form a nucleic acid. Within each strand, the phosphate group on one nucleotide connects to a carbon atom in the sugar of the next one. The process is repeated all the way down the sequence of nucleotides, creating a polynucleotide strand. In DNA, two such strands are linked together in a double-helix shape. The order in which the nucleotides follow one another along the chain is called the sequence.
The two strands are connected to each other through very specific base-pairing rules. In standard pairing, adenine always pairs with thymine, and guanine always pairs with cytosine. In RNA, thymine is replaced with uracil, which also pairs with adenine. Hydrogen bonding holds these bases together, adding support and structure to the double helix.
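The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the names `DNA_PAIRS` and `complement` are hypothetical, and the input is assumed to contain only the four DNA letters A, T, G, and C.

```python
# Standard pairing partners for the four DNA bases (A with T, G with C).
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the pairing partner of each base in a DNA strand, in order."""
    # A KeyError is raised for any character that is not A, T, G, or C.
    return "".join(DNA_PAIRS[base] for base in strand.upper())


print(complement("ATGGCC"))  # TACCGG
```

Because each base has exactly one partner, applying `complement` twice returns the original strand.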
There are two main categories of nucleobases in nucleic acids, the pyrimidines and the purines. Uracil, thymine, and cytosine are pyrimidines, while adenine and guanine are purines. As demonstrated by the pairing behavior of bases within the nucleic acid molecules, a pyrimidine is always paired with a purine in the double helix. This rule is a function of molecular size. Purine molecules are larger than pyrimidine molecules because they contain an additional ring. Were two purines to pair together, they would be too wide to fit between the two DNA strands, and were two pyrimidines to pair together, they would sit too far apart for hydrogen bonds to form. Once linked, DNA molecules are packed and twisted very tightly, bringing the base pairs closer together as a result, which helps provide a more stable structure for the DNA molecule.
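As a rough illustration of the purine and pyrimidine rule, the following Python sketch classifies the bases and checks whether a proposed pair joins one of each kind. The names `PURINES`, `PYRIMIDINES`, and `is_purine_pyrimidine_pair` are hypothetical, and the check covers only the size rule, not which specific bases bond.

```python
# Base classes as described in the text.
PURINES = {"A", "G"}            # larger, two-ring bases
PYRIMIDINES = {"C", "T", "U"}   # smaller, single-ring bases


def is_purine_pyrimidine_pair(base_a: str, base_b: str) -> bool:
    """Return True if exactly one base is a purine and the other a pyrimidine."""
    a, b = base_a.upper(), base_b.upper()
    return (a in PURINES and b in PYRIMIDINES) or (
        a in PYRIMIDINES and b in PURINES
    )


# The standard pairs all satisfy the rule.
for pair in [("A", "T"), ("A", "U"), ("G", "C")]:
    print(pair, is_purine_pyrimidine_pair(*pair))  # True for each pair

# Two purines or two pyrimidines do not.
print(is_purine_pyrimidine_pair("A", "G"))  # False
print(is_purine_pyrimidine_pair("C", "T"))  # False
```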
Within living organisms, nucleic acid sequences code for many different traits, including color and, to some degree, the rate at which the organism ages. Genes within DNA molecules do this by coding for specific proteins. In all cells, proteins are responsible for such functions as breaking down nutrients, transporting hormone molecules, catalyzing chemical reactions, and even building more proteins. If a nucleic acid sequence develops a mutation, then certain proteins may not be created or may be created improperly, and the cell or body system may be unable to function normally. Genetic mutations known as "missense mutations" occur when a change to a single nucleotide in a sequence causes an incorrect amino acid to be introduced into the protein. Such mutations can be deadly, causing diseased cells to form or failing to produce proteins essential for life.
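A missense mutation can be sketched with a toy example in Python. The sequence below is invented for illustration, the codon table lists only the five entries the example needs (taken from the standard genetic code and written as DNA coding-strand codons), and the helper names `translate` and `point_mutate` are hypothetical.

```python
# Partial codon table: only the codons this toy example uses.
CODON_TABLE = {
    "ATG": "Met",
    "CCT": "Pro",
    "GAG": "Glu",
    "GTG": "Val",
    "AAG": "Lys",
}


def translate(dna: str) -> list[str]:
    """Read a coding sequence three bases at a time and look up each codon."""
    usable = len(dna) - len(dna) % 3  # ignore any incomplete trailing codon
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]


def point_mutate(dna: str, position: int, new_base: str) -> str:
    """Return a copy of the sequence with one base replaced (0-based index)."""
    return dna[:position] + new_base + dna[position + 1:]


original = "ATGCCTGAGAAG"
mutated = point_mutate(original, 7, "T")  # third codon GAG becomes GTG

print(translate(original))  # ['Met', 'Pro', 'Glu', 'Lys']
print(translate(mutated))   # ['Met', 'Pro', 'Val', 'Lys']
```

A single changed base swaps glutamic acid for valine while every other amino acid in the chain stays the same.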
Before a nucleic acid sequence can be used as a blueprint for building a protein, it first needs to be transcribed into RNA, the nucleic acid that carries a copy of the genetic code found in DNA. The RNA subsequently relays the genetic code to ribosomes within the cell, which then translate the code and create a protein. The entire genetic code is not copied during transcription, as the process would take too long. Instead, only the part of the code that the cell needs at that time, the gene, is transcribed. Both transcription and translation are considered central processes in cellular and molecular biology.
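The two steps described above, transcribing only one gene and then translating it, can be sketched in Python as follows. The DNA string, the gene coordinates, and the function names are all invented for illustration. The sketch assumes the input is the coding strand, so transcription reduces to replacing T with U, and it uses a four-entry excerpt of the standard genetic code.

```python
# Partial RNA codon table: only the codons this toy example uses.
CODON_TABLE = {"AUG": "Met", "GGC": "Gly", "UUU": "Phe", "UAA": "Stop"}


def transcribe(coding_strand: str, start: int, end: int) -> str:
    """Copy only the gene region [start, end) into RNA, swapping T for U."""
    return coding_strand[start:end].upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read codons in order and collect amino acids until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


dna = "CCGATGGGCTTTTAAGCC"      # hypothetical DNA with one gene inside it
mrna = transcribe(dna, 3, 15)   # transcribe the gene, not the whole strand

print(mrna)             # AUGGGCUUUUAA
print(translate(mrna))  # ['Met', 'Gly', 'Phe']
```

The bases outside positions 3 to 15 never appear in the RNA, which mirrors the point that only the needed gene is transcribed.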
Bibliography
Bhardwaj, Uma, and Ravindra Bhardwaj. Biochemistry for Nurses. New Delhi: Dorling, 2012. Print.
"A Brief Guide to Genomics." National Human Genome Research Institute. Natl. Insts. of Health, 14 Apr. 2014. Web. 18 Aug. 2014.
Dorado, Gabriel, et al. "Analyzing Modern Biomolecules: The Revolution of Nucleic-Acid Sequencing – Review." Biomolecules, 28 July 2021, doi.org/10.3390/biom11081111. Accessed 28 Dec. 2022.
Hutchison, Clyde A., III. "DNA Sequencing: Bench to Bedside and Beyond." Nucleic Acids Research 35.18 (2007): 6227–37. Print.
Muñoz, Victor, ed. Protein Folding, Misfolding and Aggregation: Classical Themes and Novel Approaches. Cambridge: Royal Soc. of Chemistry, 2008. Print.
"Nucleic Acids." Chemistry for Biologists. Royal Soc. of Chemistry, n.d. Web. 18 Aug. 2014.
Ochoa, George. Science 101: Biology. New York: Hydra, 2007. Print.
Papachristodoulou, Despo, et al. Biochemistry & Molecular Biology. 5th ed. Oxford: Oxford UP, 2014. Print.
Wanjie, Anne. The Basics of Biology. New York: Rosen, 2014. Print.