Digital Storage

Summary: Information can be stored digitally—a process that requires information to be translated into binary code.

Digital information is information in binary code. To create, manipulate, and store this code, it must be given a physical form. This is done by using media that can exist in one of two distinct states and assigning one state to each of the two digits (“0” and “1”) of binary code. Within a computer, the “1”s and “0”s are represented as “on” and “off” electrical states; on a magnetic hard disk, they are tiny magnetized regions pointing one way or the other; and on a CD, the two states are shiny and dull spots.

Engineers used metal tape on reel-to-reel machinery to record audio signals in the early twentieth century. In 1952, IBM introduced a tape drive with iron oxide–coated plastic tape, and reel-to-reel tape drives were the standard for data storage by the mid-1970s. IBM also created magnetic hard disks in the mid-1950s, but it took decades to overcome size and access-speed issues to make hard disk drives (HDDs) feasible for applications such as personal computers. Solid-state drives (SSDs), built on technologies such as flash memory, were the next step in overcoming the lagging mechanical speeds of HDDs.

Mathematicians in many fields have been essential in all stages of development and continue to address emerging issues. Ingrid Daubechies, “the mother of wavelets,” is perhaps best known for her work on wavelet-based algorithms for compressing digital images. Irving Reed and Gustave Solomon developed algebraic error-detecting and error-correcting codes; these Reed–Solomon codes are widely used in digital storage and communication, from satellites to CDs.

Bits and Bytes

The smallest unit of stored digital information, corresponding to a single “1” or “0,” is called a “bit.” The term “bit,” a contraction of “binary digit,” is commonly attributed to statistician John Tukey, working in conjunction with mathematician John von Neumann. Bits are collected into groups of eight called “bytes,” and each byte can represent various types of information. In the widely used ASCII encoding, for instance, the lowercase letter “a” is represented as 01100001 and “b” as 01100010. The music on a compact disc is encoded as 44,100 readings (or samples) per second for each of the two stereo channels, with each reading represented by 16 bits (2 bytes).
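A short Python sketch makes the byte arithmetic concrete. It assumes the ASCII character encoding (which yields the bit patterns for “a” and “b” given above) and standard stereo CD audio; the function and constant names are illustrative, not taken from any library. The data rate it computes counts audio samples only, not the additional bits a CD devotes to error correction.

```python
def byte_pattern(ch: str) -> str:
    """Return the 8-bit pattern of a single ASCII character."""
    code = ord(ch)                 # numeric code, e.g. 97 for "a"
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, "08b")     # binary, zero-padded to 8 digits

print(byte_pattern("a"))  # 01100001
print(byte_pattern("b"))  # 01100010

# Raw data rate of CD audio: 44,100 samples per second per channel,
# 16 bits (2 bytes) per sample, 2 stereo channels.
SAMPLE_RATE = 44_100
BYTES_PER_SAMPLE = 2
CHANNELS = 2

bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS
print(bytes_per_second)        # 176400
print(bytes_per_second * 60)   # 10584000 bytes, about 10.6 MB per minute
```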

Storage Size

Sizes of files and the capacities of storage devices are usually expressed as multiples of the byte. A kilobyte (KB) is approximately 1000 bytes, enough to store about 150 words, or about half a page of text from a paperback book. Larger units use other metric prefixes, with each step up representing a factor of either 1000 or 1024, depending on the convention: storage manufacturers generally use powers of 1000, while some operating systems report sizes in powers of 1024. Thus, a megabyte (MB) is approximately 1000 KB, and a gigabyte (GB) is approximately 1000 MB. Units beyond the gigabyte include the terabyte (TB), petabyte (PB), and exabyte (EB).
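The gap between the two conventions is easy to see with a small Python helper. The `human_readable` function below is hypothetical, not a library call: it divides a byte count by 1000 or by 1024 until the value fits the next unit, and it labels both results with the metric names used in this article (the IEC prefixes KiB, MiB, and GiB exist to mark 1024-based values unambiguously).

```python
DECIMAL_STEP = 1000
BINARY_STEP = 1024
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]

def human_readable(n_bytes: int, step: int) -> str:
    """Express a byte count in the largest unit that keeps the value below `step`."""
    value = float(n_bytes)
    unit = 0
    while value >= step and unit < len(UNITS) - 1:
        value /= step
        unit += 1
    return f"{value:.2f} {UNITS[unit]}"

one_tb = 10**12  # a drive labeled "1 TB" using powers of 1000
print(human_readable(one_tb, DECIMAL_STEP))  # 1.00 TB
print(human_readable(one_tb, BINARY_STEP))   # 931.32 GB
```

No space goes missing between the two lines of output; the same 10^12 bytes are simply counted in larger 1024-based steps.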

Magnetic Storage

Because grains in a magnetic medium can be magnetized with their north poles pointing in either of two directions, magnetic media are well suited to representing binary information. In addition, because information stored in this way is relatively stable, it is useful for long-term storage. Finally, because the magnetization can be reset easily with an electromagnet, magnetic media are easy to erase and rewrite.

A magnetic hard disk employs one or more spinning platters coated in a magnetic medium. An arm carries tiny electromagnetic heads that float just above the surface of each platter and magnetize regions of the disk corresponding to the “1”s and “0”s of binary code. To retrieve information, the disk spins past the heads, and the magnetized regions generate electrical signals in the heads that correspond to the stored code. While the principle is straightforward, it has been a remarkable feat of engineering to create disks that spin at 7200 revolutions per minute or more, with arms that can move across the surface of a platter 50 or more times per second as they seek and write information. Even so, read and write speeds have not increased at the same exponential rate as storage capacity, resulting in undesirable lags.
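Spin speed sets a floor on how quickly data can be reached: after the arm moves to the right track, the head must still wait for the desired region of the platter to rotate underneath it. A minimal sketch of that arithmetic, assuming the target region is on average half a revolution away (the usual textbook simplification); the function name is illustrative:

```python
def rotational_latency_ms(rpm: float) -> tuple[float, float]:
    """Return (time for one full revolution, average wait) in milliseconds."""
    ms_per_revolution = 60_000 / rpm   # 60,000 ms in a minute
    # On average the target region is half a revolution away from the head.
    return ms_per_revolution, ms_per_revolution / 2

full, average = rotational_latency_ms(7200)
print(f"{full:.2f} ms per revolution, {average:.2f} ms average wait")
# 8.33 ms per revolution, 4.17 ms average wait
```

A few milliseconds per access is slow compared with the speed of a processor, which is one source of the lags described above.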

Even in the early twenty-first century, long-term backup of computer information is often done on low-cost magnetic tape, with bits of information laid down as magnetized regions on moving tape. However, because the information is laid out along a long strip of tape, reaching a given item requires winding the tape to it, so fast random access is not possible, which limits tape's usefulness in everyday applications. Until recently, digital camcorders used magnetic tape to record video; however, the desire for random access to footage, along with advances in hard drive and other storage techniques, has brought on a new generation of tapeless camcorders.
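A toy Python model shows why sequential access suits bulk backup but not scattered lookups. The hypothetical `SequentialTape` class only counts how many records the tape must wind past to reach each request; real drives fast-wind between positions, so this illustrates the access pattern rather than actual timings.

```python
class SequentialTape:
    """Toy model: records sit one after another along the tape."""

    def __init__(self, records: list[str]) -> None:
        self.records = records
        self.position = 0          # index of the record under the head
        self.records_wound = 0     # total records passed over while winding

    def read(self, index: int) -> str:
        # Reaching a record means winding past every record in between.
        self.records_wound += abs(index - self.position)
        self.position = index
        return self.records[index]

tape = SequentialTape([f"record {i}" for i in range(10)])
for i in range(10):            # a full backup or restore reads in order
    tape.read(i)
print(tape.records_wound)      # 9

tape = SequentialTape([f"record {i}" for i in range(10)])
for i in [9, 0, 9, 0]:         # jumping between distant records
    tape.read(i)
print(tape.records_wound)      # 36
```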

CDs, DVDs, and Flash Memory

Both CD and DVD players are optical devices that use lasers to read the shiny and dull spots encoded on a plastic disc. Information is recorded as tiny, less reflective pits surrounded by flat, reflective areas; on mass-produced discs the pits are molded into the plastic, while recordable discs use a laser to darken spots in a dye layer. When the disc is played, it spins past a laser. In simplified terms, when the light encounters a pit, little of it is reflected and the player registers an “off” signal (“0”); when the light bounces back off a shiny region, the player registers an “on” signal (“1”). A small computer in the player interprets this information.
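The one-spot-per-bit description is a simplification. On actual CDs, a player reads each change between pit and land as a “1” and each clock period without a change as a “0”, and the resulting bits are further processed by eight-to-fourteen modulation and error correction, neither of which is modeled here. The sketch below contrasts the two reading rules on a hypothetical track written as a string of `L` (land, shiny) and `P` (pit, dull) positions; the function names are illustrative.

```python
def decode_levels(track: str) -> str:
    """Simple model: a shiny land ('L') reads as 1, a dull pit ('P') as 0."""
    return "".join("1" if spot == "L" else "0" for spot in track)

def decode_transitions(track: str) -> str:
    """Transition model: a change between pit and land reads as 1, no change as 0."""
    return "".join("1" if cur != prev else "0" for prev, cur in zip(track, track[1:]))

track = "LLPPPLLLP"
print(decode_levels(track))       # 110001110
print(decode_transitions(track))  # 01001001
```

Reading edges rather than levels means the player only has to detect where the reflected light changes, not judge the absolute brightness of every spot.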

Many devices, including digital cameras, camcorders, video game consoles, and cell phones, use flash memory, which can store large amounts of information on small cards. This technology employs an array of microscopic transistors, each with an electrically isolated “floating gate” that can trap electrons. The charge held in the floating gate determines whether current can pass through the transistor when it is read, giving each transistor the two states needed for binary code. Sections of flash memory can easily be reset (erased) by flushing out the electrons trapped in their floating gates. Because the cards have no moving parts, flash memory offers faster access and greater portability than mechanical storage.
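A simplified Python model of one flash erase block illustrates a consequence of the floating-gate design. It assumes, as in typical flash memory, that an erased cell reads as “1” and that programming (trapping electrons) can only turn a “1” into a “0”, so stored data cannot simply be overwritten and must first be erased a whole section at a time. The `FlashBlock` class and its one-byte-per-address layout are hypothetical; real chips program full pages and spread wear across many blocks.

```python
ERASED = 0xFF  # an erased byte reads as all 1s

class FlashBlock:
    """Toy model of one flash erase block, one byte per address."""

    def __init__(self, size: int) -> None:
        self.cells = [ERASED] * size

    def program(self, address: int, value: int) -> None:
        # Programming can only trap electrons, turning 1 bits into 0 bits,
        # so the result is the bitwise AND of old and new contents.
        self.cells[address] &= value

    def erase(self) -> None:
        # Erasing flushes the trapped electrons from every cell in the block.
        self.cells = [ERASED] * len(self.cells)

block = FlashBlock(4)
block.program(0, 0b01100001)            # store "a"
print(format(block.cells[0], "08b"))    # 01100001
block.program(0, 0b01100010)            # try to write "b" without erasing
print(format(block.cells[0], "08b"))    # 01100000 (neither "a" nor "b")
block.erase()
block.program(0, 0b01100010)            # erase first, then write "b"
print(format(block.cells[0], "08b"))    # 01100010
```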

Data Rot and Error Correction

Tape, hard disks, CDs, and flash memory store and retrieve information accurately most of the time, but they are not problem-free. Errors and noise in a recording system can turn “1”s into “0”s and vice versa, reducing the accuracy of the stored information. Mathematical methods are used to check for and correct such errors. For example, cyclic redundancy check (CRC) algorithms calculate a short, fixed-length binary sequence (a check value) for each block of data by polynomial division over a finite field. The check values are stored together with the data blocks and can be recomputed after transmission or retrieval; a mismatch signals that an error has occurred. CRC was invented by mathematician W. Wesley Peterson, who also devised many error-correcting codes.
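The CRC idea can be sketched in Python. The function below computes the common CRC-32 variant (used, for example, in zip files and Ethernet) one bit at a time: each step is one stage of long division by the generator polynomial, with XOR playing the role of subtraction because the arithmetic is done modulo 2. It is written for clarity rather than speed and is checked against Python's built-in `zlib.crc32`.

```python
import zlib

# Bit-reversed form of the standard CRC-32 generator polynomial (0x04C11DB7).
CRC32_POLY_REFLECTED = 0xEDB88320

def crc32(data: bytes) -> int:
    """Bitwise CRC-32: polynomial division over GF(2), one bit per step."""
    crc = 0xFFFFFFFF                       # standard initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # When the low bit is set, "subtract" (XOR) the generator polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF                # standard final inversion

block = b"Digital storage"
stored_check = crc32(block)                # saved alongside the data block
assert stored_check == zlib.crc32(block)   # agrees with the library version

corrupted = bytearray(block)
corrupted[3] ^= 0b00000100                 # flip a single bit, as noise might
print(crc32(bytes(corrupted)) == stored_check)  # False: the error is detected
```

Because its generator polynomial has more than one term, CRC-32 is guaranteed to catch any single flipped bit. A CRC only detects errors, however; repairing them requires error-correcting codes such as Reed–Solomon codes.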

Even if the recording is perfect, the media that hold binary code can degrade in a variety of ways over time. For instance, magnetic media can lose their magnetic orientation, especially if they are exposed to a strong magnetic field. In addition, the substrates that carry the magnetic layer, such as the platters in hard drives and the plastic backing of magnetic tape, will invariably degrade over time. Even the plastic in CDs and DVDs will begin to break down, and the floating gates in flash memory will eventually leak the electrons that maintain their transistors' states. Finally, even if the storage media and binary information survive, there is a real chance that in the future no hardware will be available to read information encoded in an outdated medium.
