Parallel Computing

Summary

Parallel computing involves the execution of two or more instruction streams at the same time. Parallel computing takes place at the processor level when multiple threads are used in a multi-core processor or when a pipelined processor computes a stream of numbers, at the node level when multiple processors are used in a single node, and at the computer level when multiple nodes are used in a single computer. Many memory models are used in parallel computing, including shared cache memory for threads, shared main memory for symmetric multiprocessing (SMP), and distributed memory for grid computing. Parallel computing is often done on supercomputers capable of achieving processing speeds of more than 10^6 Gflop/s.

Definition and Basic Principles

Several parallel computing architectures have been used, including pipelined processors, specialized SIMD machines, and general MIMD computers. Most parallel computing is done on hybrid supercomputers that combine features from several basic architectures. The best way to define parallel processing in detail is to explain how a program executes on a typical hybrid parallel computer.

To perform parallel computing, one must write a program that executes in parallel, that is, a program in which more than one instruction stream executes at the same time. At the highest level, this is accomplished by decomposing the program into parts that execute on separate computers and exchange data over a network. For grid computing, the control program distributes separate programs to individual computers, each operating on its own data, and collects the results of the computation. For in-house computing, the control program distributes separate programs to each node of the same parallel computer (a node is itself a full computer) and, again, collects the results, as sketched below. Some control programs manage parallelization themselves, while others let the operating system manage parallelization automatically.
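A minimal sketch of this distribute-and-collect pattern, assuming the widely used MPI message-passing library (MPI is not named in this article, so treat it as one representative choice), might look like the following in C++. Each process computes a partial result on its own slice of the problem, and the control process (rank 0) gathers and combines the results:

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);                // start the parallel run on every node

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // this process's identity (0 = control)
        MPI_Comm_size(MPI_COMM_WORLD, &size);  // total number of cooperating processes

        // Each process works on its own share of the problem.
        double partial = 0.0;
        for (int i = rank; i < 1000000; i += size)
            partial += 1.0 / (i + 1);

        // The control process collects and combines the partial results.
        double total = 0.0;
        MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            std::printf("combined result: %f\n", total);

        MPI_Finalize();
        return 0;
    }

Run under an MPI launcher (for example, mpirun with one process per node), the same executable plays both the control and worker roles, the usual single-program, multiple-data arrangement on such machines.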

The program on each computer of a grid, or node of a parallel computer, is itself a parallel program, using the processors of the grid computer, or node, for parallel processing over a high-speed network or bus. As with the main control program, parallelization at the node can be under programmer or operating system control. The main difference between control program parallelization at the top and node levels is that data can be moved more quickly around one node than between nodes.

The finest level of parallelization in parallel programming comes at the processor level (for both grid and in-house parallel programming). If the processor uses pipelining, then streams of numbers are processed in parallel using the components of the arithmetic units. If the processor is multi-core, the program is decomposed into pieces that run as separate threads, typically one or more per core, as in the sketch below.
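As a hedged illustration of this thread-level decomposition, the following sketch uses the standard C++ std::thread facility (not named in this article, but representative of the thread libraries mentioned later) to split one loop across the cores of a multi-core processor:

    #include <algorithm>
    #include <cstdio>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main() {
        // One worker thread per available core (fall back to 1 if unknown).
        const unsigned cores = std::max(1u, std::thread::hardware_concurrency());
        const int n = 1000000;
        std::vector<double> partial(cores, 0.0);
        std::vector<std::thread> workers;

        // Decompose the loop into one piece per thread.
        for (unsigned t = 0; t < cores; ++t)
            workers.emplace_back([&partial, t, cores, n] {
                for (int i = static_cast<int>(t); i < n; i += static_cast<int>(cores))
                    partial[t] += 1.0 / (i + 1);
            });

        for (auto &w : workers) w.join();   // wait for every piece to finish

        const double total = std::accumulate(partial.begin(), partial.end(), 0.0);
        std::printf("combined result: %f\n", total);
        return 0;
    }

Each thread writes only its own slot of the partial array, so no locking is needed, and the final combination is done serially once all threads have joined.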

Background and History

In 1958, computer scientists John Cocke and Daniel Slotnick of IBM described one of the first uses of parallel computing in a memo about numerical analysis. Many early computer systems supported parallel computing—the IBM MVS series (1964–1995), which used threadlike tasks; the GE Multics system (1969), a symmetric multiprocessor; the ILLIAC IV (1964–1985), the most famous array processor; and the Control Data Corporation (CDC) 7600 (1971–1983), a supercomputer that used several types of parallelism.

American computer engineer Seymour Cray left CDC and founded Cray Research in 1972. The first Cray 1 was delivered in 1977, and Cray's dominance of the supercomputer industry began in the 1980s. Cray used pipelined processing to increase the flops of a single processor, a technique also used in many reduced instruction set computing (RISC) processors, such as the Intel i860 used in the Intel Hypercube. Cray developed several MIMD computers connected by a high-speed bus, but these were not as successful as the MIMD Intel Hypercube series (1984–2005) and the Thinking Machines Corporation SIMD Connection Machine (1986–1994). In 2004, multi-core processors were introduced as the latest way to do parallel computing, running a different thread on each core, and by 2011, many supercomputers were based on multi-core processors.

Since 1958, many companies and parallel processing architectures have come and gone. The most popular parallel computer in the twenty-first century consists of multiple nodes connected by a high-speed bus or network. Each node contains many processors connected by shared memory or a high-speed bus or network, and each processor is either pipelined or multi-core.

How It Works

Since emerging in 1958, parallel processing technology has taken some very sharp turns, resulting in several distinct technologies—early supercomputers that built super central processing units (CPUs), SIMD supercomputers that used many processors, and modern hybrid parallel computers that distribute processing at several levels. These technologies remain active, so a full explanation of parallel computing must include several rather different technologies.

Pipelined and Super Processors. Early computers had a single CPU, so it was natural to improve the CPU to increase speed. The earliest computers had CPUs that provided control and did integer arithmetic. One of the first improvements was to add floating-point arithmetic to the CPU (or an attached coprocessor). In the 1970s, several people, most notably Seymour Cray, developed pipelined processors that could process arrays of floating-point numbers in the CPU by having individual CPU components, such as part of the floating-point multiplier, operate on numbers in parallel with the other CPU components. The Cray 1, X-MP, and Y-MP were the leaders in supercomputers for the next few years. Cray and others considered using gallium arsenide rather than silicon for the next CPU speed improvement, but this technology never proved practical. Several companies attempted to increase the clock speed (the number of cycles per second at which the CPU runs) using other techniques, such as deeper pipelines, which worked reasonably well until the 2000s.

Several companies were able to build a CPU chip that supported pipelining, with Intel's i860 being one of the first. While chip density increased as predicted by Moore's law (density doubles roughly every two years), signal speed on the chip limited the size of the pipelined CPU that could be put on a chip. In 2005, Intel introduced its first multi-core processor, which had multiple CPUs on a single chip. Applications software was then developed to decompose a program into components that could execute as a different thread on each core, thus achieving a new type of single-chip parallelism.

Another approach to parallel processing at the processor level uses graphics processing units (GPUs). Some supercomputers are being built with multiple GPUs on a circuit board, and some have been built with a mix of CPUs and GPUs on a board. A variety of techniques are being used to combine these processors into nodes and computers, but the resulting machines are similar to existing hybrid parallel computers.

Data Transfer. Increasing processor speed is an important part of supercomputing, but increasing the speed of data transfers between the various components of a supercomputer is just as important. A processor is housed on a board that also contains local memory and a network or bus connection module. Sometimes, a board houses several processors communicating via a bus, shared memory, or a board-level network. Multiple boards are combined to create a node. In the node, processors exchange data via a (backplane) bus or network. Nodes generally exchange data using a network, and if multiple computers are involved in a parallel computing program, the computers exchange data via a transmission control protocol/Internet protocol (TCP/IP) network.
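To make this explicit data movement concrete, here is a minimal hedged sketch, again assuming the MPI message-passing library, in which one process sends a small buffer to another over whatever interconnect (bus, backplane, or network) joins their locations; it must be run with at least two processes:

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double buffer[4] = {1.0, 2.0, 3.0, 4.0};
        if (rank == 0) {
            // Process 0 pushes the data onto the interconnect.
            MPI_Send(buffer, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            // Process 1 blocks until the data arrives.
            MPI_Recv(buffer, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            std::printf("received %f ... %f\n", buffer[0], buffer[3]);
        }

        MPI_Finalize();
        return 0;
    }

Whether the two processes sit on the same board, in the same node, or on different computers, the program is the same; only the speed of the underlying transfer changes, which is why data movement dominates supercomputer design.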

Flynn's Taxonomy. In 1972, American computer scientist Michael J. Flynn described a classification of parallel computers that has proved useful. He divided the instruction streams into single instruction (SI) and multiple instruction (MI), and he divided the data streams into single data (SD) and multiple data (MD). This led to four computer types: SISD, SIMD, MISD, and MIMD. SISD is an ordinary computer, and MISD can be viewed as a pipelined processor. The other classifications describe architectures that were extremely popular in the 1980s and 1990s and are still in use. SIMD computers are generally applied to special problems, such as numerical solutions of partial differential equations, and can perform very well on these problems. Examples of SIMD computers include the ILLIAC IV of the University of Illinois (1974), the Connection Machine (1980s), and several supercomputers from China. MIMD computers are the most general type of supercomputer, and the new hybrid parallel processors can be seen as a generalization of these machines. While there have been many successful MIMD computers, the Intel Hypercube (1985) popularized the architecture, and some are still in use.

Software. Most supercomputers use some form of Linux as their operating system, and the most popular languages for developing applications are FORTRAN and C++. Support for parallel processing at the operating-system level is provided by operating-system directives, special commands that tell the operating system to, for example, use the maximum number of threads within a code unit. At the program level, one can use blocking or nonblocking message passing, call a thread library, or share a memory section among threads with the OpenMP API (application programming interface), as in the sketch below.
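As a minimal sketch of the OpenMP approach (the pragma shown is standard OpenMP; the surrounding program is an illustrative assumption), a single directive asks the runtime to divide a loop's iterations among the available threads and combine their partial sums:

    #include <omp.h>
    #include <cstdio>

    int main() {
        const int n = 1000000;
        double total = 0.0;

        // The directive tells the compiler and runtime to split the loop
        // iterations across threads and reduce the partial sums into total.
        #pragma omp parallel for reduction(+:total)
        for (int i = 0; i < n; ++i)
            total += 1.0 / (i + 1);

        std::printf("threads available: %d, result: %f\n",
                    omp_get_max_threads(), total);
        return 0;
    }

Compiled with an OpenMP-aware compiler (for example, with a flag such as -fopenmp), the runtime decides how many threads to use; the programmer only marks which loops are safe to run in parallel.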

Applications and Products

Weather Prediction and Climate Change. One of the first successful uses of parallel computing was in predicting weather. Information such as temperature, humidity, and rainfall has been collected and used to predict the weather for over 500 years. Many early computers were used to process weather data. As the first supercomputers were deployed in the 1970s, some of them were used to provide faster and more accurate weather forecasts. In 1904, Norwegian physicist and meteorologist Vilhelm Bjerknes proposed a differential-equation model for weather forecasting that included seven variables, including temperature, rainfall, and humidity. Many have added to this initial model since its introduction, producing complex weather models with many variables that are ideal for supercomputers, and many government agencies involved in weather prediction now use supercomputers. The European Centre for Medium-Range Weather Forecasts (ECMWF) was using a CDC 6600 by 1976. The National Center for Atmospheric Research (NCAR) used an early Cray model. In 2009, the National Oceanic and Atmospheric Administration (NOAA) announced the purchase of two IBM 575s to run complex weather models, which improved forecasting of hurricanes and tornadoes. In the 2020s, NOAA upgraded its technology, implementing Cactus and Dogwood, two HPE Cray EX supercomputers powered by AMD EPYC CPUs. Supercomputer modeling of the weather has also been used to reconstruct previous conditions, such as the worldwide temperature of the Earth during the age of the dinosaurs, as well as to predict future phenomena, such as global warming. In everyday weather prediction, the app Tomorrow.io (formerly ClimaCell) relies on parallel computing to document and update millions of data points every two minutes to give its users the most reliable weather prediction information possible. Graphics processing units, each with hundreds of cores, condense, clean, and display the data.

Efficient Wind Turbines. Mathematical models describing airflow over a surface, such as an airplane wing, using partial differential equations (developed by Claude-Louis Navier, George Gabriel Stokes, Leonhard Euler, and others) have existed for over a hundred years. Computers have been used since their invention to solve these equations for quantities such as the lift applied to an airplane wing in a given airflow. When supercomputers appeared in the 1970s, they were used to solve these problems. There has also been great interest in using wind turbines to generate electricity. For turbines, researchers are interested in designing the best blade to generate thrust without loss due to vortices, rather than developing a wing with maximum lift. The EOLOS Wind Energy Research Consortium at the University of Minnesota used the Minnesota Supercomputing Institute's Itasca supercomputer to perform simulations of airflow over a turbine (three blades and their containing device) and, as a result of these simulations, was able to develop more efficient turbines. Additionally, multi-thread parallel computing has been used to model large-scale offshore wind farms, overcoming the constraints presented by the large number of electrical nodes.

Multispectral Image Analysis. Satellite images consist of large amounts of data. For example, a Landsat 7 image consists of seven tables of data, where each table corresponds to a different electromagnetic wavelength band (blue, green, red, or thermal-infrared, among others) and each entry gives the value for a 30-meter-square pixel of the Earth's surface. A popular use of Landsat data is determining what particular pixels represent, such as whether they cover a submerged submarine. One approach to classifying Landsat pixels is to build a backpropagation neural network and train it to recognize pixel classes (for example, distinguishing water over a submarine from open water). Several neural networks have been implemented on supercomputers over the years to classify multispectral images.

Biological Modeling. Many applications of supercomputers involve the modeling of biological processes. Both continuous modeling, involving the solution of differential equations, and discrete modeling, finding selected values from a large set of values, have used supercomputers. Dan Siegal-Gaskins of the Ohio State University built a continuous model of cress cell growth, consisting of seven differential equations and twelve unknown factors, to study why some cress cells develop into trichomes, a specialized cell type, rather than into ordinary cress cells. The model was run on the Ohio Supercomputer Center's IBM 1350 cluster, which has 9,500 CPU cores and a peak computational capability of 75 teraflops. After running many models and comparing the results to the literature, Siegal-Gaskins concluded that three proteins were actively involved in determining whether a cell developed into a trichome or an ordinary cell. Many examples of discrete modeling in biology using supercomputers are also available. For example, because of their computational requirements, many DNA and protein recognition programs can only be run on supercomputers.

Astronomy. Supercomputers have many applications in astronomy, including simulating past or future events to test astronomical theories. The University of Minnesota Supercomputer Center simulated what a supernova explosion originating on the edge of a giant interstellar molecular gas cloud would look like 650 years after the explosion. NASA Ames Research Center's supercomputer, Pleiades, is among the most powerful in the world. Pleiades uses several generations of Intel Xeon processors (E5-2680v4, E5-2680v3, E5-2680v2, and E5-2670) to analyze, search, and model various complex celestial objects.

Digital Multimedia. The information age has seen a meteoric rise in multimedia consumption and streaming services, leading to an ever-growing demand for faster and more efficient multimedia content processing and delivery. Digital multimedia can have various attributes that need individual processing—text, video, or audio. There is a demand for high-fidelity audio and high-resolution video, and the traditional single-thread computing algorithms cannot keep up. Content providers like Netflix, Amazon, and Google have large content delivery networks (CDNs) comprising stacks of specialized digital signal processors (DSPs). Operating in parallel, these DSPs can encode and decode a Blu-ray video with 5.1 surround audio within seconds.

Careers and Course Work

A major in computer science, engineering, mathematics, or physics is the path most often selected to prepare for a career in parallel computing. It is advisable for those going into parallel processing to have a strong minor in an application field such as biology, medicine, or meteorology. One needs substantial coursework in mathematics and computer science, especially scientific programming, for a career in parallel computing. Some positions are available for those interested in extending operating systems and building packages to be used by application programmers; for these positions, one generally needs a master's or doctoral degree. Most universities teach the relevant courses in their computer science and mathematics departments. The parts of parallel computing related to the construction of devices are generally taught in computer science, electrical engineering, or computer engineering, while the development of systems and application software is usually taught in computer science and mathematics. Taking several courses in programming and mathematics is advisable for anyone seeking a career in parallel computing. Those seeking a career on the hardware side of parallel computing must take classes on hardware description languages such as Verilog and SystemC, which are generally taught in specialized engineering courses such as electrical and VLSI engineering.

Those seeking careers in parallel computing take a wide variety of positions. A few go to work for companies that develop hardware, such as Intel, AMD, and Cray. Others become parallel computing software engineers and work for companies that build system software, such as Cray or IBM, or develop parallel computing applications for a wide range of organizations in engineering, aviation, or government agencies.

Social Context and Future Prospects

Parallel processing can solve some of society's most complex problems, such as determining when and where a hurricane will hit, thus improving people's standard of living. An interesting phenomenon is the rapid development of supercomputers for parallel computing in Europe and Asia. While this will result in more competition for the US supercomputer companies, it should also result in a wider use of supercomputers and improve the worldwide standard of living.

Supercomputers have always provided technology for tomorrow's computers, and technology developed for today's supercomputers will offer faster processors, system buses, and memory for home and office computers. Parallel computing supports graphics in video games, cryptocurrency, online banking, cell phones, and many other everyday products. Given the growth of the supercomputer industry at the beginning of the twenty-first century and the state of computer technology, innovations in parallel computing and research advancing its applications are only increasing. As quantum computers become a reality, the power and variety of supercomputers will only increase.

Further Reading

Culler, David, Jaswinder Pal Singh, and Anoop Gupta. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann, 1999.

Hwu, Wen-mei W., et al. Programming Massively Parallel Processors: A Hands-on Approach. 4th ed., Morgan Kaufmann, 2023.

Kesavan, Suraj P., et al. "Scalable Comparative Visualization of Ensembles of Call Graphs." IEEE Transactions on Visualization and Computer Graphics, vol. 29, no. 3, 2023, doi.org/10.1109/TVCG.2021.3129414.

Kim, Seungchul, et al. "Rapid Parallel Transcoding Scheme for Providing Multiple-Format of a Single Multimedia." Advanced Multimedia and Ubiquitous Engineering, edited by James J. Park et al., vol. 518, Springer Singapore, 2019, pp. 855–61. Accessed 9 June 2021.

Kirk, David, and Wen-mei Hwu. Programming Massively Parallel Processors: A Hands-On Approach. 3rd ed., Morgan Kaufmann, 2016.

Krikelis, A. "Parallel Multimedia Computing." Advances in Parallel Computing, vol. 12, 1998, pp. 45–59, doi.org/10.1016/S0927-5452(98)80006-3. Accessed 8 June 2021.

"Parallel Computing Definition." Heavy AI, www.heavy.ai/technical-glossary/parallel-computing. Accessed 20 May 2024.

Rauber, Thomas, and Gudula Rünger. Parallel Programming: For Multicore and Cluster Systems. 3rd ed., Springer, 2023.

Zou, Ming, et al. "Modeling for Large-Scale Offshore Wind Farm Using Multi-Thread Parallel Computing." International Journal of Electrical Power & Energy Systems, vol. 148, 2023. doi.org/10.1016/j.ijepes.2022.108928.