Data compression
Data compression is the technique of reducing the amount of space required to store data, making it easier to store and transmit. This process is essential in digital environments, because uncompressed image, audio, and video files can be very large. There are two primary types of compression: lossless and lossy. Lossless compression allows the original file to be reconstructed exactly, making it suitable for applications where precision is crucial, such as graphic design and photography. PNG and GIF are examples of formats that use lossless compression.
Lossy compression, by contrast, sacrifices some quality for significantly smaller file sizes, which is beneficial when speed and storage capacity are priorities. Common lossy formats include JPEG for images and MP3 for audio. Both types rely on algorithms that identify redundant data and store it more compactly; lossy algorithms also discard detail that people are unlikely to notice. While compression technology continues to advance, no lossless algorithm can shrink every possible file. Overall, data compression plays a vital role in enabling efficient data management and transfer in an increasingly digital world.
Data compression is the process of storing information in a way that requires less space. In this sense, compression refers to the act of shrinking a file to make it easier to store and transmit. Most image files found on the Internet are compressed. Whether compression costs any quality depends on the method used. Compression algorithms (procedures used to compress files) are either lossless or lossy. Lossless algorithms produce a larger file, but that file can be restored exactly to the original no matter how many times it is opened and saved. Lossy algorithms produce a much smaller file by discarding some information. Simply opening or copying a lossy file does not change it, but quality can degrade further each time the file is edited and saved again in a lossy format. Common compressed formats include JPG, PNG, MP3, and MP4.
Background
Data compression usually refers to various means of formatting a computer file. However, people sought to transmit data in more convenient forms long before the advent of the computer. A commonly cited example is Morse code, which Samuel Morse developed in 1836. At the time, engineers could not devise a way to transmit the sound of a person's voice over long distances. However, they could send short electric signals through wires that traveled as far as technicians could lay an electrical line. Thus, Morse created a code that reduced the alphabet to a series of short and long electrical signals, giving the most common letters the shortest codes (E, for example, is a single dot). The use of this code allowed telegraph operators to compress human language to a format of dits and dahs (dots and dashes). The telegraph operator at the other end could then restore the data (the message) to its full form.
The invention of the Internet furthered the need for data compression. First developed by the US Defense Advanced Research Projects Agency (DARPA, then called ARPA) in the late 1960s as a network for government computers, the early Internet ran on low-speed telephone lines. At the time, it was difficult to share image, audio, or video files. The files were simply too large to transfer over the slow connection. Even as computers and Internet connections began to speed up, the transfer of large files remained difficult. To remedy this situation, programmers designed several new file formats. These formats automatically compressed files, making them smaller and easier to transfer. The computer that received the file could use the coding in the compressed file to rebuild the removed data, restoring the file either exactly or to a close approximation of its original form. Most images, videos, and music online in the twenty-first century use some form of data compression.
Overview
Two primary types of compression are used in modern computing: lossless and lossy. Lossless formats allow the original data to be reconstructed exactly, so a file does not degrade no matter how many times it is opened and saved. They include the PNG and GIF formats. Although lossless compression formats result in a much larger file than their lossy counterparts, they are useful for work in which the design must remain intact. For example, if photographers or graphic designers need to electronically send an image to a magazine to be printed, they need to send it in a way that does not reduce its quality. They need the image to remain perfect but be contained in a manageable file size, so they should use a lossless compression format.
Most lossless formats work by finding repetitive elements in a file and replacing them with a shorter code. For example, a lossless file format might record every location where the same shade of yellow appears in a picture rather than storing that color again for each pixel. The receiving computer looks at that record, finds the shade of yellow specified, and fills it in at the noted locations. Because the record is exact, this results in the exact same picture every time. While the record takes up some space of its own, for images with large areas of repeated color it is much smaller than the original image.
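The record-and-refill idea described above can be sketched with run-length encoding, one of the simplest lossless schemes. This is an illustrative stand-in rather than the method PNG or GIF actually use (PNG relies on DEFLATE and GIF on LZW), and the function names and the sample row of pixels are assumptions made up for the example.

```python
from itertools import groupby


def rle_encode(pixels):
    """Collapse each run of identical values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


def rle_decode(pairs):
    """Expand each (value, run_length) pair back into the original run."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# A hypothetical row of 12 pixels with long runs of one color.
row = ["yellow"] * 6 + ["blue"] * 2 + ["yellow"] * 4

encoded = rle_encode(row)
print(encoded)                      # [('yellow', 6), ('blue', 2), ('yellow', 4)]
print(rle_decode(encoded) == row)   # True: the round trip is exact
```

Twelve stored values shrink to three pairs, and decoding returns the identical row. A row with no repeated neighbors would gain nothing from this scheme, which is why practical formats use more elaborate pattern matching.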
Lossy file formats are used when file size and transfer speed are more important than overall quality. Unlike lossless formats, lossy formats do not record the exact data present in a file. Instead, they discard detail that is hard to see or hear and store an approximation, usually one very close to the original. With many files, such as a typical image on the Internet or an MP3 sound file, this distortion is not a serious problem. Copying, transferring, or simply opening a lossy file does not change it. However, each time the file is decoded, edited, and saved again in a lossy format, it is compressed again and more detail is lost. Over many such cycles, an effect known as generation loss, colors shift, fine detail and quiet sounds disappear, and the file becomes an increasingly inaccurate representation of the original.
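The gradual breakdown described above can be illustrated with a toy model in which "saving" rounds every sample to the nearest multiple of a step size. This is only a sketch: real codecs such as JPEG quantize frequency coefficients rather than raw samples, and the function names, the step of 8, the brightness edit of 3, and the sample values are all assumptions chosen for the example.

```python
def lossy_save(samples, step=8):
    """Round each sample to the nearest multiple of `step`, discarding detail."""
    return [((s + step // 2) // step) * step for s in samples]


def max_error(a, b):
    """Largest absolute difference between two equal-length sample lists."""
    return max(abs(x - y) for x, y in zip(a, b))


def generation_errors(original, generations, edit=3, step=8):
    """Apply the same small edit each cycle, with and without a lossy re-save."""
    exact = list(original)
    lossy = list(original)
    errors = []
    for _ in range(generations):
        exact = [s + edit for s in exact]                    # edit a pristine copy
        lossy = lossy_save([s + edit for s in lossy], step)  # edit, then re-save
        errors.append(max_error(exact, lossy))
    return errors


original = [12, 50, 77, 130, 201]      # made-up brightness samples
print(lossy_save(original))            # close to the original, but not identical
print(generation_errors(original, 5))  # worst-case error after each cycle
```

In this toy model a single save never moves a sample by more than half a step, and re-saving an unchanged file does no further damage. The error compounds only across repeated edit-and-save cycles, which is the situation the text warns about.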
While this is problematic for artistic or graphic design purposes, lossy compression is not a serious concern for most computer users. The drastically faster loading times associated with lossy compression make up for the reduced file quality. Lossy compression is what allows many handheld devices, such as smartphones and tablets, to store hundreds of hours of video or thousands of songs.
Computer programmers continue to devise ways to improve compression algorithms. Newer compressed file formats often outperform older ones. Many newer compression algorithms are not designed to follow a particular fixed pattern when compressing a file. Instead, they use a variety of tools to recognize patterns in the file itself, record those patterns, and store them more compactly. One limit cannot be engineered away, however: it is mathematically impossible for a lossless algorithm to make every possible file smaller, because there are fewer short files than long ones, so some inputs must stay the same size or grow. Despite this, computer programmers continue to develop advanced compression algorithms that work well on the kinds of files people actually use.
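The limit on lossless compression mentioned above follows from a counting argument, which the sketch below checks for one size and then illustrates with Python's standard zlib module. The helper name shorter_strings and the sample inputs are assumptions made up for the example; zlib is used only as a convenient general-purpose lossless compressor.

```python
import os
import zlib


def shorter_strings(n):
    """Count all bit strings with fewer than n bits (lengths 0 to n - 1)."""
    return sum(2 ** k for k in range(n))


n = 16
# There are 2**n possible inputs of n bits but only 2**n - 1 shorter outputs,
# so no reversible scheme can map every input to a shorter output.
print(2 ** n, shorter_strings(n))

repetitive = b"yellow " * 1000      # highly patterned data
random_bytes = os.urandom(7000)     # data with no pattern to exploit

packed = zlib.compress(repetitive)
print(len(repetitive), len(packed))                      # shrinks dramatically
print(zlib.decompress(packed) == repetitive)             # exact round trip
print(len(random_bytes), len(zlib.compress(random_bytes)))  # does not shrink
```

Patterned input compresses to a small fraction of its size and decompresses exactly, while random input comes out slightly larger than it went in because the compressor finds nothing to record and still has to add its own header.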