File sharing and mathematics
File sharing refers to the practice of providing access to digital files so that multiple users can retrieve information from a remote system. It gained prominence with the emergence of peer-to-peer networks in the late 1990s, when applications like Napster facilitated music sharing. In contrast to downloading, streaming allows users to access content in real time without storing it. The efficiency of file sharing depends on file sizes and network conditions, and compression algorithms play a critical role in speeding up transfers.
Mathematics underpins many aspects of file sharing, from the data compression theory formulated by Claude Shannon to the study of network behavior through graph theory. Researchers examine the reliability and security of these systems, focusing on issues like file-sharing worms, and study the economic impacts on artists and retailers. The BitTorrent protocol represents a significant advancement: it enables users to download pieces of a file from multiple sources simultaneously, which increases download speeds and reduces reliance on centralized servers. Overall, mathematical modeling and algorithms are essential for optimizing file sharing networks and improving user experiences.
Summary: Mathematicians work on developing compression algorithms and resolving security issues to make file downloading and sharing faster and more secure.
The words “downloading” and “uploading” began to enter mainstream usage in the 1970s. Bulletin board systems, precursors to the Internet, were among the first systems that allowed computer users to access an external system. At the start of the twenty-first century, e-mail was commonly uploaded to and downloaded from remote servers. The term “file sharing” came into popular usage later, especially in reference to peer-to-peer file sharing systems like Napster. File sharing refers to providing multiple users with access to digitally stored information, usually from a remote system. Streaming differs from downloading in that streamed data is not stored but used as soon as it is accessed.
The time required to upload or download a file is, in part, a function of its size. Compression algorithms reduce the size of data, making it faster and easier to transfer. Mathematician Claude Shannon formulated a theory of data compression in the late 1940s using concepts from entropy and probability. The theory, also known as “source coding theory,” includes theoretical limits on lossless and lossy compression; the lossy limit depends, in part, on a function expressing the allowable distortion error.
Mathematicians also work on reliability and security issues, such as detecting and preventing file-sharing worms. Mathematical models of file sharing systems, built with techniques from areas such as graph theory and statistics, help researchers study connections, patterns, and probabilities. The Gfarm Grid File System was developed in the early twenty-first century as a federated and scalable virtual file system designed to support high-performance, petascale-level computing and data mining problems, such as those that arise in theoretical particle physics. Mathematicians Duncan Watts and Steve Strogatz made mathematical connections between the behavior of network nodes that use lookup protocols and the behavior of human participants in Stanley Milgram’s experiments on the small-world phenomenon.
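As a rough illustration of the lossless limit in Shannon's theory, the following Python sketch estimates the entropy of a byte stream and compares the resulting size estimate with the output of a general-purpose compressor. It assumes the bytes are independent draws from a fixed distribution, an idealization that real files rarely satisfy. The function names and the three-symbol sample are hypothetical and chosen only for this example.

```python
import math
import random
import zlib
from collections import Counter


def entropy_bits_per_symbol(data: bytes) -> float:
    """Empirical Shannon entropy of the byte frequencies, in bits per byte."""
    total = len(data)
    probabilities = [count / total for count in Counter(data).values()]
    return sum(-p * math.log2(p) for p in probabilities)


def entropy_bound_bytes(data: bytes) -> float:
    """Estimated lossless size limit when bytes are modeled as independent draws."""
    return entropy_bits_per_symbol(data) * len(data) / 8


rng = random.Random(0)  # fixed seed so repeated runs use the same sample
# Hypothetical source: symbols a, b, c with probabilities 0.7, 0.2, 0.1.
sample = bytes(rng.choices(b"abc", weights=[7, 2, 1], k=10_000))

print(f"original size:    {len(sample)} bytes")
print(f"entropy estimate: {entropy_bound_bytes(sample):.0f} bytes")
print(f"zlib, level 9:    {len(zlib.compress(sample, 9))} bytes")
```

For a sample like this one, the compressed size should land somewhat above the entropy estimate and well below the original size. Data with long repeated runs can compress below this per-byte estimate, because the independence assumption no longer holds.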
In the early twenty-first century, the term “file sharing” is sometimes used specifically to refer to the illegal distribution of copyrighted material, a usage that may be attributed, in part, to widespread publicity about the issue. Several variables affect the prevalence of illegal file sharing: the availability of Internet access; the growth of typical Internet connection speeds; the development of new file formats that produce smaller high-quality music files; and the availability of peer-to-peer file sharing systems. Napster, released in 1999, was the first widely used peer-to-peer file sharing system. Developed by Shawn Fanning, it enabled mostly anonymous sharing of music files among users through a centralized server, which included a search function for locating songs. Though it was shut down by court order only two years later, half a dozen similar programs had been released in that time, and the BitTorrent client was released shortly thereafter. Napster was purchased by Best Buy in 2008 and subsequently operated as a pay service. Mathematicians research topology and traffic in distributed networks, like Napster and Gnutella, with methods from graph theory and scheduling algorithms, among other tools. Distributed networks are often seen as advantageous because they reduce or eliminate reliance on centralized servers, which are highly connected network nodes and therefore often critical failure points. Mathematicians also use statistical methods and other types of mathematical modeling to study the economic impacts of peer-to-peer file sharing on retailers and artists, as well as users’ willingness to pay for digital music or movies.
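The point about centralized servers as critical failure points can be sketched with a small graph experiment. The Python example below is a toy model, not a description of any real network: it builds a star topology, in which every client depends on one server, and a hypothetical decentralized ring with extra links, then measures the largest group of nodes that can still reach one another after a single node goes offline. All function names and topologies are assumptions made for illustration.

```python
from collections import deque


def star_network(n_clients: int) -> dict[int, set[int]]:
    """Node 0 is the central server; every client links only to it."""
    graph = {0: set(range(1, n_clients + 1))}
    for client in range(1, n_clients + 1):
        graph[client] = {0}
    return graph


def ring_with_chords(n_nodes: int, step: int = 3) -> dict[int, set[int]]:
    """Toy decentralized topology: each node links to the nodes 1 and `step` ahead."""
    graph: dict[int, set[int]] = {node: set() for node in range(n_nodes)}
    for node in range(n_nodes):
        for offset in (1, step):
            other = (node + offset) % n_nodes
            graph[node].add(other)
            graph[other].add(node)
    return graph


def largest_component_after_removal(graph: dict[int, set[int]], removed: int) -> int:
    """Size of the largest connected component once `removed` goes offline."""
    alive = set(graph) - {removed}
    seen: set[int] = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over surviving nodes
            node = queue.popleft()
            size += 1
            for neighbor in graph[node]:
                if neighbor in alive and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


print(largest_component_after_removal(star_network(10), removed=0))
print(largest_component_after_removal(ring_with_chords(11), removed=0))
```

In this toy setting, removing the server from the star leaves every client isolated, while removing any one node from the ring leaves the remaining nodes connected to one another.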
The BitTorrent client was nearly as large a step forward in file sharing as Napster had been, because BitTorrent is not a service but a protocol, or a method of sharing files, and it is not limited to music files. The essential innovation of BitTorrent, developed by Bram Cohen, is that file seekers are connected to many peers at once, instead of just a single peer. Pieces of the file are downloaded simultaneously and then reassembled on the user’s computer. Furthermore, all peers downloading the file can share the pieces they have, even before they have the complete file. A peer that holds a complete copy of a file is called a “seed.” There must be at least one seed involved for downloads to complete successfully. Once a popular file has propagated to many locations, the network of peers broadens, increasing the piecewise download speed. Unlike Napster, the BitTorrent protocol does not route downloads through a central server, which makes downloading difficult to detect, though the servers called “BitTorrent trackers,” which help peers find one another, are targets of law enforcement. Using random ports also helps users avoid detection. Mathematical methods, such as stochastic differential equations, have been used to model network environments and peer behavior, and mathematically based peer-to-peer simulators can be used to evaluate and test new algorithms and solutions before they are implemented.
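The piecewise download and reassembly described above can be sketched in a few lines of Python. This is a simplified simulation under stated assumptions, not the BitTorrent wire protocol: peers are plain dictionaries, the piece size is unrealistically small, and the function names are hypothetical. The one detail taken from the original BitTorrent specification is that each piece is checked against a SHA-1 hash before it is accepted.

```python
import hashlib
import random

PIECE_SIZE = 4  # bytes; tiny on purpose, real piece sizes are far larger


def split_into_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Cut a file into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def download(
    piece_hashes: list[bytes],
    peers: dict[str, dict[int, bytes]],
    rng: random.Random,
) -> bytes:
    """Fetch each piece from some peer that has it, verify it, then reassemble."""
    received: dict[int, bytes] = {}
    for index, expected in enumerate(piece_hashes):
        holders = [name for name, pieces in peers.items() if index in pieces]
        rng.shuffle(holders)  # spread requests across peers
        for name in holders:
            candidate = peers[name][index]
            if hashlib.sha1(candidate).digest() == expected:
                received[index] = candidate
                break
        else:
            # No peer had a valid copy, so the download cannot finish.
            raise LookupError(f"no peer could supply a valid piece {index}")
    return b"".join(received[i] for i in range(len(piece_hashes)))


original = b"peer-to-peer file sharing"
pieces = split_into_pieces(original)
hashes = [hashlib.sha1(piece).digest() for piece in pieces]

# Hypothetical swarm: one seed with every piece and two partial peers.
swarm = {
    "seed": dict(enumerate(pieces)),
    "peer_a": {0: pieces[0], 1: pieces[1], 2: b"????"},  # piece 2 is corrupt
    "peer_b": {3: pieces[3], 4: pieces[4]},
}
print(download(hashes, swarm, random.Random(0)) == original)
```

In this sketch the partial peers can serve the pieces they already hold, the corrupt piece is rejected by the hash check, and the download fails only when no peer in the swarm has a valid copy of some piece, which is why at least one complete copy must be reachable.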
Bibliography
Caviglione, Luca. File-Sharing Applications Engineering. Hauppauge, New York: Nova Science Publishers, 2009.
Shen, Xuemin, et al. Handbook of Peer-to-Peer Networking. New York: Springer, 2009.