Mathematical history of the Internet
The "Mathematical History of the Internet" examines the foundational role mathematics has played in the development and functionality of the Internet. The Internet connects various networks, facilitating communications through protocols like TCP/IP, established in the 1970s by Vinton Cerf and Robert Kahn. This evolution was bolstered by pioneering work from mathematicians and computer scientists, including the development of packet switching to efficiently transmit data. The emergence of the World Wide Web in the early 1990s marked a significant milestone in Internet accessibility, driven by the adoption of common protocols. Mathematical concepts also underpin modern challenges, such as optimizing search algorithms and traffic routing, which are essential for handling the Internet's rapid growth and complexity. Additionally, network science, rooted in graph theory, plays a crucial role in analyzing connectivity and vulnerabilities within the Internet. Overall, the intertwining of mathematics and Internet technology continues to shape how we interact with digital spaces today.
Mathematical history of the Internet
Summary: Many properties and problems of the Internet are studied and modeled using mathematics.
The Internet is a worldwide network connecting computer networks in government, business, academia, and other public and private organizations. Communications are facilitated by the Internet Protocol Suite (TCP/IP), originally proposed by Vinton Cerf and Robert Kahn in 1974. The Internet is used to implement various applications, including electronic mail, pioneered in the late 1960s, and the World Wide Web (WWW) of linkable documents. The idea of networks connecting information nodes appeared in futuristic scientific writing and science fiction beginning in the early twentieth century.
The work of mathematicians, computer scientists, cyberneticists, and many other scientists contributed to the emergence of the Internet and the World Wide Web by the end of the twentieth century. Researchers and teachers in nearly every discipline use the Internet to further their work, and many study the properties of the Internet itself using mathematics. One problem explored by mathematicians and computer scientists is mapping the Internet, often undertaken to understand the nature of connections and to reduce stress on routers. The field of hyperbolic geometry has proven to be highly useful in creating such maps, especially with regard to assessing global stability and developing efficient routing methods. Mathematicians also consider the theoretical and computational challenges posed by the massive graphs that result from Internet mapping, which test the limits of even the largest and fastest computers. Others examine society’s increasing dependence on the Internet for a range of critical everyday tasks (like banking and medical recordkeeping) along with the risks and vulnerabilities (like identity theft) that this reliance may create.
Codevelopment of Mathematical Sciences and the Internet
Mathematicians including John von Neumann, Alan Turing, and Norbert Wiener contributed to the development of both the hardware and the software necessary to implement computer networks and the Internet. The precursors of the Internet were networks such as the telegraph, telephone, radio, and television. Even early electronic computers had systems for data input, computation, and output. In the late 1960s, individual computer “nodes” were connected to one another, building on the technology for connecting subsystems within the same computer. These early stages of building computer networks promoted the development of the mathematics-rich fields of cybernetics, informatics, and artificial intelligence.
Mainframe computers enabled countless historical achievements and facilitated research and problem-solving in mathematical fields such as cryptography, simulation, and genetics. In the late 1970s and early 1980s, the introduction of the first personal computers changed the face of computing by creating new applications and giving access to new groups of users. In the 1980s, the National Science Foundation (NSF) funded five supercomputer centers connected by NSFNET, which built on Computer Science Net (CSNET) and the Department of Defense’s Advanced Research Projects Agency Network (ARPANET). Demand during the first year was so great that the system had to be upgraded almost immediately, and uses for the new network continued to expand, as did the mathematics research needed to meet user demands for functionality. At the same time, national computer networks such as ARPANET and NSFNET, the Japanese JUNET, and the network of the European physics laboratory CERN remained isolated from one another. The big challenge at the time was to make these separate networks compatible and interoperable.
Adoption of several dozen international protocols, such as TCP/IP for the Internet, facilitated interlinking. In the early 1990s, the idea of common protocols enabled a system of linked documents hosted on servers, accessible to anyone at any time, called the World Wide Web. The explosive evolution of the Internet and the Web over the next decade is well documented. In the United States, efforts were aided by several pieces of legislation. For example, the High Performance Computing Act (HPCA) of 1991 reset priorities for computing research and education. President Bill Clinton stated that he believed such legislation enabled collaborations “critical for assuring American prosperity, national and economic security, and international competitiveness in the twenty-first century.” Computer scientists Eric Bina and Marc Andreessen developed the first widely used graphical browser, Mosaic, released in 1993 and funded by a program associated with the HPCA. Tim Berners-Lee, the creator of several WWW protocols, was knighted in 2004 by Queen Elizabeth II “for the invention of the World Wide Web.”
Mathematical Problems
One mathematical problem that had to be solved in order to build computer networks was packet switching: grouping data of all types into blocks known as “packets,” of a size appropriate for network transmission. Network nodes, or routers, have algorithms that decide how to queue, buffer, and deliver individual packets as a function of network traffic patterns. This is a different and mathematically more complex model than circuit switching, which was used in older telephone networks to transmit information bits at a constant rate. Computer scientists Paul Baran, Donald Davies, and Leonard Kleinrock pioneered packet switching networks. Baran’s work was shaped in part by Cold War concerns about maintaining communications in the face of nuclear attack. Davies worked with Alan Turing at the National Physical Laboratory and is reputed to have found mistakes in Turing’s groundbreaking paper “On Computable Numbers.” Kleinrock, a recipient of the U.S. National Medal of Science, said of his work, “Basically, what I did for my Ph.D. research… was to establish a mathematical theory of packet networks.”
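The idea can be illustrated with a short Python sketch (a toy, not a model of any historical system): a message is split into fixed-size, sequence-numbered packets, which a router queues and forwards independently. The packet size and the first-in, first-out queue discipline are arbitrary choices for the example.

```python
# A minimal sketch of packet switching: split a message into packets,
# buffer them in a FIFO router queue, forward, and reassemble.
from collections import deque

PACKET_SIZE = 64  # bytes per packet; chosen arbitrarily for illustration

def packetize(message: bytes, size: int = PACKET_SIZE):
    """Split a message into sequence-numbered packets."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

class Router:
    """A FIFO queue; real routers use far richer scheduling policies."""
    def __init__(self):
        self.queue = deque()

    def receive(self, packet):
        self.queue.append(packet)  # buffer the incoming packet

    def forward(self):
        return self.queue.popleft() if self.queue else None

router = Router()
for pkt in packetize(b"Hello, ARPANET!" * 10):
    router.receive(pkt)

received = []
while (pkt := router.forward()) is not None:
    received.append(pkt)

# Reassemble by sequence number (packets may arrive out of order in practice).
message = b"".join(data for _, data in sorted(received))
```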
In the late 1960s, mainframe computers had message systems among their different users, who all had to be online at the same time to communicate. In the early 1970s, the message system software was modified to include new computer networks. The ability to deliver messages to offline users, make different systems compatible, and uniquely identify users were significant research problems. The compatibility issue, still important in the twenty-first century, was resolved in part by creating software and hardware gateways that connect different systems. BITNET was cofounded by Ira Fuchs and Greydon Freeman primarily for research and academic communities, while FidoNet was implemented for personal computers and bulletin board systems by Thomas Jennings. Unique identification of users is a mathematically interesting problem, since for any fixed string length there are only finitely many possible combinations of letters and symbols, and identifiers must be assigned so that no two users share one.
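The size of such a namespace is simple to compute: an alphabet of k symbols yields k^n distinct strings of length n. A brief Python sketch, assuming a 36-symbol alphabet of lowercase letters and digits (one possible character set, chosen here for illustration):

```python
# Counting the finite namespace of identifiers of a given length.
import string

alphabet = string.ascii_lowercase + string.digits  # 36 symbols (an assumption)
for n in (1, 4, 8):
    # k**n possible identifiers of length n over a k-symbol alphabet
    print(n, len(alphabet) ** n)
```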
Similar concepts apply to the study and selection of secure electronic passwords. A system developed in the early 1970s assigned registration codes to domains and then to users within domains in the form “user@domain.” This method and the use of “@” are credited to Raymond Tomlinson. At the start of the twenty-first century, domain names have the mathematical structure of a tree, with multiple hierarchical levels; minimally, there are two. Each domain name ends with a top-level domain, either generic, such as “.com” and “.edu,” or a country code, such as “.us” or “.uk,” preceded by a period. To the left of that period comes the second-level domain name; for example, “wikipedia.org” or “google.com.”
If there are more domain levels, they appear to the left of the second-level domain, also separated by periods; for example, “simple.wikipedia.org” or “groups.google.com.” In principle, there is no fixed limit to the number of domain levels. This syntax and structure were first published in the 1980s in connection with ARPANET. IP addresses are the numerical representations of individual computers, mapped to domain names. In the original IPv4 scheme, they consist of four bytes of information displayed as numbers; each byte has eight bits and can be any integer from 0 to 255. With the exponential growth in Internet users, assigning unique identities to users, domains, and computers continues to be a challenging problem, especially since many users have multiple e-mail and IP addresses. For computer users, offline message delivery is achieved by storing messages on servers until the recipient accesses them.
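Both structures can be made concrete in a few lines of Python using only the standard library; this is an illustration of the data layout, not a DNS implementation:

```python
# Domain names read right-to-left as a path in a tree; IPv4 addresses
# are four bytes, each in the range 0-255.
import socket
import struct

def domain_path(name: str):
    """Tree levels from the root: 'simple.wikipedia.org' ->
    ['org', 'wikipedia', 'simple']."""
    return list(reversed(name.split(".")))

def ipv4_bytes(address: str):
    """Unpack a dotted-quad IPv4 address into its four byte values."""
    return list(struct.unpack("4B", socket.inet_aton(address)))

print(domain_path("groups.google.com"))  # ['com', 'google', 'groups']
print(ipv4_bytes("192.0.2.1"))           # [192, 0, 2, 1]
```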
E-mail programs typically employ the Internet Message Access Protocol (IMAP), developed by Mark Crispin, or the older Post Office Protocol (POP) to retrieve mail, while the Simple Mail Transfer Protocol (SMTP) is used to send and relay messages between servers. Mathematical algorithms enable the queuing, encryption, authentication, and filtering of e-mail, and mathematicians continue to contribute new developments and improvements. Many agencies are responsible for making assignments and tracking Internet protocols. The Internet Assigned Numbers Authority was headed for nearly 30 years by computer scientist Jonathan Postel, who codeveloped and documented many of the key Internet standards, including SMTP and the Domain Name System (DNS).
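As a sketch of how a mail client speaks SMTP, the following Python example sends a message with the standard-library smtplib module; the host name, port, addresses, and credentials are placeholders invented for the example.

```python
# A minimal SMTP send using Python's standard library.
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.org"     # hypothetical addresses
msg["To"] = "bob@example.org"
msg["Subject"] = "Hello"
msg.set_content("Sent via SMTP.")

with smtplib.SMTP("mail.example.org", 587) as smtp:  # placeholder host
    smtp.starttls()                        # encrypt the session
    smtp.login("alice", "app-password")    # placeholder credentials
    smtp.send_message(msg)
```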
The Growth of Networks
Other mathematical problems of Internet development sprang from the incredibly fast growth of networks. To compare the growth rates of different networks, researchers use metrics such as the time needed to reach a given number of users. They have determined, for example, that it took only five years for the Internet to reach 50 million users, versus 13 years for television and 38 years for radio. As the number of users and domains grew, search algorithms became a prominent field in computer science and mathematics, with major developments such as clustering and relevance ranking. There are many search engines, many of which initially used only the content of Web pages to rank results. Google’s PageRank method was among the first search protocols to use sophisticated mathematical modeling, including directed graphs and stochastic matrices, to explore links between pages hierarchically. The PageRank algorithm is named for Google cofounder and computer scientist Lawrence Page.
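The core of PageRank can be sketched in a few lines: treat the Web as a directed graph, distribute each page’s rank along its outgoing links, and iterate until the rank vector stabilizes. The Python example below uses a tiny invented link graph and omits refinements such as handling dangling pages; Google’s production system is far more elaborate.

```python
# A minimal PageRank sketch via power iteration on a tiny link graph.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pages = list(links)
d = 0.85                      # damping factor from the original paper
rank = {p: 1 / len(pages) for p in pages}

for _ in range(50):           # iterate until ranks stabilize
    new = {p: (1 - d) / len(pages) for p in pages}
    for p, outs in links.items():
        share = rank[p] / len(outs)   # rank flows along outgoing links
        for q in outs:
            new[q] += d * share
    rank = new

print(sorted(rank.items(), key=lambda kv: -kv[1]))  # most important first
```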
In 2009, Google research scientist Kevin McCurley noted that successful search engines continually improve by employing mathematical methods that quickly find relevant material and eliminate irrelevant factors that can skew results. Along with better ranking schemes, Internet speed is critical for effective searching and content delivery. The original packet switching and data routing problems have become even more complex as the Internet has grown. Mathematicians and computer scientists model Internet traffic flow using many mathematical and statistical techniques, taking into consideration many variables, including the type of content being exchanged; photos, videos, music, text, e-mail, and online gaming all require different resources. Based on these models, algorithms to route traffic optimally can be designed and implemented, reducing congestion and slowdowns. For example, the traffic load on a given Website’s computers can be reduced by storing some content at other servers that provide better access patterns, a process known as “network caching.”
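Many routing problems reduce to shortest-path computations on a weighted graph of links. A minimal Python sketch using Dijkstra’s algorithm (one classic choice among many routing techniques), with link costs invented for illustration:

```python
# Least-cost routing with Dijkstra's algorithm on a tiny network.
import heapq

def dijkstra(graph, source):
    """Return the least-cost distance from source to every node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was found already
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

net = {"A": [("B", 2), ("C", 5)], "B": [("C", 1), ("D", 4)],
       "C": [("D", 1)], "D": []}
print(dijkstra(net, "A"))  # {'A': 0, 'B': 2, 'C': 3, 'D': 4}
```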
Some twenty-first century models are starting to use concepts from disciplines like economics, such as equilibrium theory. One example is called “congestion-dependent pricing,” which would route packets depending on users’ willingness to pay more for privileged Internet access during periods of congestion. Given the number of packets in even a small text file, this is a mathematically complex problem that still requires a great deal of research.
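A toy version of congestion-dependent pricing can be sketched as follows; the pricing rule here (price rising linearly with the overload ratio) is invented purely for illustration and is far simpler than the equilibrium models studied in the literature.

```python
# A toy congestion-pricing scheduler: when the link is overloaded, packets
# whose senders' willingness to pay meets the current price go first.
def schedule(packets, capacity, base_price=1.0):
    """packets: list of (packet_id, willingness_to_pay)."""
    congested = len(packets) > capacity
    price = base_price * (len(packets) / capacity) if congested else 0.0
    paying = sorted((p for p in packets if p[1] >= price),
                    key=lambda p: -p[1])          # highest bidders first
    rest = [p for p in packets if p[1] < price]   # best-effort traffic
    return (paying + rest)[:capacity], price

sent, price = schedule([("p1", 0.5), ("p2", 2.0), ("p3", 0.0)], capacity=2)
print(price, sent)  # price 1.5; p2 is forwarded first
```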
A separate set of problems concerns hardware and the various means of connecting to the Internet. As of 2010, it is possible to connect to the Internet through land lines and cell phones, radio, satellites, dedicated fiber-optic lines, and television cables. While similar in many ways, each has a unique set of issues related to speed, security, data transmission, compatibility, and bandwidth, especially considering that people connect to the Internet with many devices other than personal computers. Mathematicians, computer scientists, and others work on both hardware and software solutions.
Network Science
Network science predates the Internet, having its roots in graph theory. It is interdisciplinary, drawing on mathematics, engineering, computer science, the biological sciences, sociology, and other disciplines interested in studying various types of networks. It flourished with the easy availability of empirical data from computer and social networks, made possible by the Internet, and with the high demand for Internet-related applications. Concepts and methods from graph theory, such as centrality, betweenness, and closeness, are used to quantify and describe networks. Centrality is a measure of the importance of a node within a network. Betweenness measures how often a node lies on paths between other nodes, typically as the fraction of shortest paths between pairs of other nodes that pass through it. Closeness is a topological analogue of distance, usually defined from the average length of the shortest paths between a given node and all other nodes in the network that connect to it.
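These three measures can be computed directly on a small example graph, here with the widely used networkx library (one common choice among several); the graph itself is invented for illustration:

```python
# Degree, betweenness, and closeness centrality on a small undirected graph.
import networkx as nx

G = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E")])
print(nx.degree_centrality(G))       # importance by share of direct links
print(nx.betweenness_centrality(G))  # share of shortest paths through a node
print(nx.closeness_centrality(G))    # reciprocal of average distance to others
```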
Maps of networks help mathematicians and others analyze vulnerabilities, such as critical nodes that lie between many other nodes and whose loss would sever connectivity, and deprecated connections, where use of outmoded software or features affects speed or leaves the users open to attack. In addition to graph theory, hyperbolic geometry adds to Internet mapping by considering geometric coordinates of nodes in space, not simply the map of connections. The added information can then be used to quantify the issue of closeness from a geometric point of view. In graph theory, each node of a network has a degree, which is the number of other nodes connected to it. Degree distribution is a statistical measure showing the probability distribution of various node degrees over the network. Statistical sampling strategies are often used in network research, since the problems and networks examined are typically far too vast for complete data collection.
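A degree distribution can be computed from an adjacency list in a few lines of Python; the small network here is invented for illustration:

```python
# Degree distribution: the probability that a randomly chosen node
# has degree k, computed from an adjacency list.
from collections import Counter

adjacency = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B", "D"],
             "D": ["B", "C", "E"], "E": ["D"]}
degrees = [len(neighbors) for neighbors in adjacency.values()]
counts = Counter(degrees)
n = len(adjacency)
distribution = {k: counts[k] / n for k in sorted(counts)}
print(distribution)  # {1: 0.4, 2: 0.2, 3: 0.4}
```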
Economics and the Internet
In the 1990s, many people believed that the Internet would bring about fundamental changes in the landscape of the business world. Starting in the mid-1990s, venture capitalists were investing heavily in new Internet businesses, sometimes called “dot-coms.” During this time, many Internet companies operated at annual losses, expanding in anticipation of future revenues. This worked for relatively few companies, such as Amazon and Google. In 2001, this “dot-com bubble” burst, with many Internet-related businesses declaring bankruptcy.
The promises of the Internet that survived the dot-com bubble became clearer toward the end of the first decade of the twenty-first century. For example, researchers found that in many cases product popularity obeys a frequency distribution law similar to the degree distribution of network nodes: a few highly popular products account for most customer demand, while the majority of products are each liked by only a small minority of customers.
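A Zipf-like law, in which the r-th most popular product has popularity proportional to 1/r, illustrates the effect; the catalog size and the exponent in this Python sketch are arbitrary assumptions, not measured values.

```python
# Illustrating the long tail with a Zipf-like popularity law.
n = 10_000                               # products in a hypothetical catalog
weights = [1 / r for r in range(1, n + 1)]
total = sum(weights)
head = sum(weights[:100]) / total        # share of the top 100 products
tail = sum(weights[100:]) / total        # share of everything else
print(f"top 100: {head:.0%}, long tail: {tail:.0%}")
```

Under these assumptions the thousands of niche products collectively capture nearly as much demand as the top sellers, which is what made serving the long tail profitable.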
In the early 2000s, several companies realized large profits by reaching these so-called long tails (named after the characteristic shape of the distribution curve) of niche customers and redefined their industries. Apple changed the music industry by selling individual tracks online; Netflix had a similar effect on movie rentals. Mathematical algorithms for determining customer preferences and making recommendations were driven in large part by Internet commerce. Recommender systems use complex relevance metrics, evaluating content such as text or video based on statistics of the past behavior of all users within the system.
These systems use explicit data, such as rankings given by users, as well as implicit data, such as the past actions of similar users. Over time, these systems accumulate large amounts of data and increase the accuracy of their recommendations. The mathematics involved in creating these algorithms includes statistical analysis and linear algebra for working with matrices that define the closeness of users. Illustrating how lucrative good algorithms are from a business perspective, in 2009 the Netflix Prize awarded $1 million to the developers of an improved collaborative filtering algorithm for recommending movies.
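A minimal user-based collaborative filtering sketch follows: cosine similarity is computed between rows of a user-item rating matrix, and an unseen item is recommended from the nearest neighbor. The users, items, and ratings are invented, and production systems are far more sophisticated.

```python
# A toy recommender: cosine similarity between users' rating vectors.
import math

ratings = {                      # user -> {item: rating}
    "ann": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 3, "m4": 5},
    "cara": {"m2": 1, "m3": 2, "m4": 4},
}

def cosine(u, v):
    """Cosine similarity of two sparse rating vectors."""
    shared = set(u) & set(v)
    num = sum(u[i] * v[i] for i in shared)
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0

def recommend(user):
    """Suggest the nearest neighbor's best-rated item the user hasn't seen."""
    others = [(cosine(ratings[user], ratings[o]), o)
              for o in ratings if o != user]
    _, nearest = max(others)
    unseen = set(ratings[nearest]) - set(ratings[user])
    return max(unseen, key=lambda i: ratings[nearest][i], default=None)

print(recommend("ann"))  # 'm4': liked by ann's nearest neighbor, bob
```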
Bibliography
Abbate, Janet. Inventing the Internet. Cambridge, MA: MIT Press, 2000.
Boguñá, Marián, et al. “Sustaining the Internet with Hyperbolic Mapping.” Nature Communications 1, no. 1 (September 2010).
Churchhouse, R. F. Codes and Ciphers: Julius Caesar, the Enigma, and the Internet. Cambridge, England: Cambridge University Press, 2001.
Dietrich, Brenda, Rakesh Vohra, and Patricia Brick. Mathematics of the Internet: E-Auction and Markets. New York: Springer, 2010.
Langville, Amy, and Carl Meyer. Google’s PageRank and Beyond: The Science of Search Engine Rankings. Princeton, NJ: Princeton University Press, 2006.
Srikant, Rayadurgam. The Mathematics of Internet Congestion Control. Basel, Switzerland: Birkhauser, 2003.