Parallel Computer Architecture
Parallel computer architecture refers to systems composed of multiple interconnected processors, ranging from a few to hundreds of thousands, which enable simultaneous computations. This architecture is particularly effective for tackling complex problems that require significant computational power, such as those found in scientific simulations, financial modeling, and fluid dynamics. Unlike traditional von Neumann architecture, which processes instructions sequentially using a single CPU, parallel architectures can execute multiple instructions and access multiple data streams at the same time, significantly enhancing processing speed.
Parallel architectures are generally classified into two main categories: multiple-instruction, multiple-data (MIMD) and single-instruction, multiple-data (SIMD). MIMD systems allow different processors to execute different instruction streams on different data at the same time, while SIMD systems apply the same instruction in lockstep to many data elements at once. Vector processors, often grouped with SIMD, handle data arrays efficiently but push elements through a pipeline rather than processing them all simultaneously. The evolution of multi-core processors has made parallel computing a standard feature in modern computers, reflecting the growing demand for more powerful computing capabilities as single-core speeds reach their limits.
The development and application of parallel computer architectures have become increasingly vital, especially with the rise of supercomputers, which are now used for sophisticated modeling and simulation tasks across various fields. As computational demands continue to expand, parallel architectures remain essential to advancing technology and solving complex real-world problems.
Parallel Computer Architecture
Type of physical science: Computation
Field of study: Computers
A computer system that consists of a number (from two to hundreds of thousands) of interconnected processors is considered to have a parallel computer architecture. These computers provide a cost-effective and efficient way of solving problems that involve a massive number of computations. They are commonly used for various scientific and industrial applications, including such areas as neural-network simulation, seismic analysis, financial modeling, and fluid dynamics.


Overview
Parallel computer architecture consists of anywhere from two to hundreds of thousands of interconnected computer processors. A computer system based on parallel computer architecture can do many more computations than one based on a single processor. Many problems in the physical sciences, such as weather forecasting, require enormous numbers of computations to be solved.
A standard computer is based on the von Neumann architecture. A von Neumann computer has one central processing unit (CPU) that consists of a combined control unit and arithmetic-logic unit (ALU), an input unit, an output unit, and memory. Instructions and data are stored in the memory; the control unit selects instructions to be executed in the ALU, and data are provided to the ALU as needed. The input unit allows the user to provide data and instructions in a readable format, and the output unit produces all results in a readable format. The key difference between a von Neumann computer and a parallel computer is that in the former, each instruction is executed in sequential order and the data are accessed in a sequential fashion.
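The strictly sequential fetch-decode-execute cycle of a von Neumann machine can be illustrated with a toy interpreter. The Python sketch below is purely illustrative; the three-instruction mini-language and every name in it are invented for this example and do not describe any real machine.

# Toy model of the von Neumann fetch-decode-execute cycle.
# The three-instruction mini-language (LOAD, ADD, PRINT) is hypothetical.
memory = [              # instructions and data share one memory
    ("LOAD", 5),        # place the constant 5 in the accumulator
    ("ADD", 7),         # add the constant 7 to the accumulator
    ("PRINT", None),    # output the accumulator
]

accumulator = 0
program_counter = 0

while program_counter < len(memory):
    opcode, operand = memory[program_counter]   # fetch
    if opcode == "LOAD":                        # decode and execute
        accumulator = operand
    elif opcode == "ADD":
        accumulator += operand
    elif opcode == "PRINT":
        print(accumulator)                      # prints 12
    program_counter += 1                        # one instruction at a time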
A useful classification of computer architectures is known as Flynn's taxonomy. This system, devised by Michael J. Flynn in 1966, is based on how a computer system executes instructions and accesses data. If parallel instruction execution is allowed, the architecture is said to be multiple instruction; otherwise, it is single instruction. If parallel access to the data is allowed, the architecture is called multiple data; otherwise, it is called single data. The von Neumann computer described above is a single-instruction, single-data (SISD) machine. To understand how an SISD machine operates, consider calculating the average of four hundred scores. The data would be the list of scores, and the ALU would be a simple hand calculator. To compute the average in SISD mode, one calculator would be used, and the scores would be keyed in one at a time. If each addition took eight seconds and the division by 400 took ten seconds, then it would take (400 x 8) + 10 = 3,210 seconds to compute the average. One can speed up an SISD computer by speeding up the individual addition and division operations, but this is limited by electronic constraints. Early personal home computers were SISD, but more recent models use dual-core or multi-core processors that introduce a degree of parallelism.
Parallel computer architecture is a popular method of speeding up the process of computation. The multiple-instruction, multiple-data (MIMD) computer's method of speeding up the computation process can be explained in terms of the averaging example. Suppose that the four hundred numbers to be averaged were given to four people with calculators; in MIMD architecture, each calculator could be different. Each person would enter one hundred numbers (multiple data) and use whatever keystrokes their particular calculator requires to compute the sum (multiple instruction). At the end, one person would add the four sums and divide by 400. If, again, it is assumed that each number can be entered in about eight seconds and that the division takes about ten seconds, then the total time required to do this computation in MIMD mode is (100 x 8) + (4 x 8) + 10 = 842 seconds, almost four times as fast as the SISD mode.
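The two back-of-the-envelope figures above can be written out directly. The short Python sketch below simply encodes the assumptions of the example (eight seconds per addition, ten seconds for the final division, four workers); it is a model of the example, not of any real machine.

# Timing model for the averaging example: 400 scores, 8 s per addition,
# 10 s for the final division by 400, and 4 workers in the MIMD case.
ADD_TIME = 8
DIVIDE_TIME = 10
SCORES = 400
WORKERS = 4

# SISD: one person keys in every score sequentially.
sisd_time = SCORES * ADD_TIME + DIVIDE_TIME                  # 3,210 seconds

# MIMD: each worker sums 100 scores in parallel, then one worker
# adds the four partial sums and divides by 400.
mimd_time = (SCORES // WORKERS) * ADD_TIME + WORKERS * ADD_TIME + DIVIDE_TIME   # 842 seconds

print(sisd_time, mimd_time, round(sisd_time / mimd_time, 1))  # speedup of about 3.8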
There are two main types of MIMD architecture: shared memory and distributed memory. Shared-memory MIMDs store the instructions and data in a memory that is shared by all processors. Each processor then goes to this shared memory for instructions and data. Alternatively, distributed-memory MIMDs provide some local memory to each processor and then require each processor to pass its results to other processors over a network as necessary.
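In a modern high-level language the contrast can be sketched roughly as follows: threads within one process see the same memory, while separate processes must exchange messages explicitly. The Python fragment below is only an analogy, with the standard threading and multiprocessing modules standing in for shared-memory and distributed-memory hardware; it does not describe any particular machine.

# Shared-memory analogy: threads write partial sums into one shared list.
# Distributed-memory analogy: processes send partial sums back as messages.
import threading
import multiprocessing as mp

scores = list(range(400))          # hypothetical score data
partial = [0, 0, 0, 0]             # memory visible to every thread

def shared_sum(worker):
    chunk = scores[worker * 100:(worker + 1) * 100]
    partial[worker] = sum(chunk)   # result placed directly in shared memory

def distributed_sum(worker, queue):
    chunk = scores[worker * 100:(worker + 1) * 100]
    queue.put(sum(chunk))          # result sent over an explicit channel

if __name__ == "__main__":
    threads = [threading.Thread(target=shared_sum, args=(w,)) for w in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sum(partial) / 400)      # shared-memory average

    queue = mp.Queue()
    processes = [mp.Process(target=distributed_sum, args=(w, queue)) for w in range(4)]
    for p in processes:
        p.start()
    total = sum(queue.get() for _ in range(4))
    for p in processes:
        p.join()
    print(total / 400)             # distributed-memory average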
Within these two categories are a large number of different setups. The main difference between them is the way in which the individual processors pass their data to the other processors. The most important interconnection networks for MIMD computers are the bus, the hypercube, and the central switch. If the four processors in the averaging example are arranged in a bus, then the sums will be passed sequentially to one person to compute the average. If they are arranged in a hypercube, then the sums will be passed twice to a nearest neighbor, with the last neighbor computing the average. If they are arranged in a central switch, then the sums will be passed directly to the designated averager.
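The number of message-passing rounds differs among these layouts. The Python sketch below, invented solely to illustrate the idea, counts the communication rounds needed to combine four hypothetical partial sums under each arrangement; it is not code for any real interconnect.

# Combining 4 partial sums under three interconnection patterns.
partial_sums = [790, 810, 800, 795]            # hypothetical partial sums

# Bus: the sums travel one after another to a single designated processor.
bus_rounds = len(partial_sums) - 1             # 3 sequential transfers
bus_total = sum(partial_sums)

# Hypercube (recursive doubling): in round k, each node exchanges values with
# the neighbor whose index differs in bit k, so 4 nodes need log2(4) = 2 rounds.
values = partial_sums[:]
dimension = 2                                  # 2**2 = 4 nodes
for bit in range(dimension):
    values = [values[i] + values[i ^ (1 << bit)] for i in range(len(values))]
hypercube_rounds = dimension
hypercube_total = values[0]                    # every node now holds the full sum

# Central switch: every node sends its sum directly to the designated averager.
switch_rounds = 1
switch_total = sum(partial_sums)

print(bus_total, hypercube_total, switch_total)      # all three give 3195
print(bus_rounds, hypercube_rounds, switch_rounds)   # 3, 2, 1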
The single-instruction, multiple-data (SIMD) computer speeds up the process of computation in a fashion that is similar to that of the MIMD computer. Going back to the averaging example, the main difference would be that each of the four people computing the sum of one hundred scores would use exactly the same type of calculator and would enter their numbers in exactly the same way. While this limits the types of problems that can be solved efficiently on SIMD computers, it allows the hardware to be implemented in a more efficient fashion. The most popular interconnection network for these computers is the mesh. Everything is done in lockstep in an SIMD computer, including the passing of information. One pass to the right followed by one pass down could be used to pass the four sums to the averager.
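This lockstep, data-parallel style is what array libraries such as NumPy express directly: a single operation is applied to every element of an array at once. The fragment below, which assumes NumPy is installed, is a loose software illustration rather than SIMD hardware itself, although NumPy does use vectorized processor instructions where they are available.

# SIMD-style data parallelism: one operation acts on whole arrays.
import numpy as np

scores = np.arange(400, dtype=np.float64)   # hypothetical score data

doubled = scores * 2.0                      # one "instruction," 400 elements
average = scores.mean()                     # sum all elements and divide by 400

print(doubled[:4], average)                 # [0. 2. 4. 6.] 199.5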
The third major type of parallel computer architecture is the vector processor, sometimes classified as an SIMD computer, although vector processors push data arrays through a pipeline element by element rather than operating on all the data at once. In terms of the averaging example, multiple people (four, for instance) would speed up the process of adding two binary numbers by having each person do one-fourth of the addition and then pass the work on to the next person. Assume that, as before, it takes eight seconds to compute the sum of two numbers; each of the partial summations would then take about two seconds. The first sum, completed after all four people had done their part, would take eight seconds, but the second sum would come out of the pipeline only two seconds after the first. The last of the 400 sums would likewise take eight seconds from start to finish (as did the first). In fact, the time spent adding would be (2 x 8) + (2 x 6) + (2 x 4) + (394 x 2) = 824 seconds. Assuming that it takes ten seconds for one person to do the division by 400, it would take a vector computer about 834 seconds to compute the average of four hundred numbers.
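An idealized pipeline model makes the overlap explicit: the first result takes the full four-stage latency, and after that one result emerges every two seconds. The Python sketch below uses that standard textbook model with the stage count and times from the example above; it comes out slightly lower than the figure in the text, which charges a few of the early additions at less than full overlap.

# Idealized 4-stage pipeline for 400 additions, 2 seconds per stage.
STAGES = 4
STAGE_TIME = 2                  # seconds per stage (8 s latency per addition)
ADDITIONS = 400
DIVIDE_TIME = 10

fill_time = STAGES * STAGE_TIME                 # 8 s until the first sum appears
drain_time = (ADDITIONS - 1) * STAGE_TIME       # one new sum every 2 s after that
total_time = fill_time + drain_time + DIVIDE_TIME

print(fill_time + drain_time, total_time)       # 806 and 816 seconds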
The early 2000s saw the development of the multi-core processor, which is a single integrated circuit that contains two or more CPUs, or cores. Multi-core processors allow computers to run certain programs faster, although the degree of improvement depends on the design of the program and how it is being used. Most if not all modern personal computers use multi-core processors, as the speed of single-core processors may have reached a physical limit.
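On a multi-core machine this kind of parallelism is available from ordinary high-level languages. The sketch below spreads the averaging example across the available cores with Python's standard multiprocessing.Pool; it is a minimal illustration only, and for a job this small the overhead of starting worker processes would outweigh any real speedup.

# Averaging 400 scores across all available cores with a process pool.
import multiprocessing as mp

def partial_sum(chunk):
    return sum(chunk)                               # each worker sums its own slice

if __name__ == "__main__":
    scores = list(range(400))                       # hypothetical score data
    cores = mp.cpu_count()
    chunk_size = max(1, len(scores) // cores)
    chunks = [scores[i:i + chunk_size] for i in range(0, len(scores), chunk_size)]

    with mp.Pool(processes=cores) as pool:
        partials = pool.map(partial_sum, chunks)    # sums computed in parallel

    print(sum(partials) / len(scores))              # 199.5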
Applications
Many of the developments in computer architecture have been brought about by the need to solve problems that are computationally intensive. In fact, this is the principal reason that John von Neumann and his contemporaries invented the stored-program SISD computer. The development of parallel computer architectures is closely related to computationally intensive applications.
The numerical solution of systems of linear equations, and the associated field of linear algebra, has always been crucial to the solution of computationally intensive problems. Some of the better-known techniques in this field include solving m equations in n unknowns, finding an eigenvector of a square matrix, computing the inverse of a square matrix, and doing matrix algebra for a large number of large dense matrices. Linear algebra becomes necessary when a physical problem is modeled and the resulting model contains a set of linear equations in some unknowns. Other areas of mathematics important to the solution of computationally intensive problems are the numerical solution of ordinary and partial differential equations, in which the model of the physical problem involves differential equations that need to be solved before one can fully understand the model; and graphics and image processing, in which a figure in three-dimensional space or an image in the plane is modified.
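Several of the linear-algebra building blocks named above are available directly in NumPy, which on most systems dispatches to LAPACK and BLAS routines that are themselves often multithreaded. The example below, assuming NumPy is installed and using a small made-up system, shows a linear solve, an eigendecomposition, and a matrix inverse.

# Core linear-algebra kernels on a small made-up 3 x 3 system.
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

x = np.linalg.solve(A, b)                       # solve A x = b
eigenvalues, eigenvectors = np.linalg.eigh(A)   # eigendecomposition (A is symmetric)
A_inv = np.linalg.inv(A)                        # matrix inverse

print(x)
print(eigenvalues)
print(A @ A_inv)                                # approximately the identity matrix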
The popularity of parallel computer architecture increased vastly throughout the 2000s and 2010s, due largely to the fact that SISD computers and single-core processors have most likely reached the maximum speed and capability allowed by physics. The most advanced supercomputers are massively parallel, meaning they have a very large number of processors; the Tianhe-2, developed by China's National University of Defense Technology and named the world's fastest supercomputer in 2013, features 3,120,000 cores and a measured top speed of 33.86 petaflops, or 33.86 quadrillion floating point operations per second.
The applications of supercomputers have become similarly ambitious since the 1970s and 1980s, when they were used for such tasks as analyzing weather patterns and modeling physical systems such as fluid flow. The Blue Brain Project at Switzerland's École Polytechnique Fédérale de Lausanne, started in 2005, used an IBM Blue Gene supercomputer with 16,000 processors to model a rat's neocortical column, which consists of ten thousand neurons making one hundred million connections. In 2011, Tianhe-1A, the predecessor of Tianhe-2, performed a molecular-dynamics simulation that modeled 110.1 billion atoms of crystalline silicon in only three hours.
Context
Computer science describes, analyzes, and organizes the various fields of computation. Like all sciences, it attempts to discover the general principles that underlie the large number of examples present in the real world. Computer architecture, the field of computer science that studies the ways in which computer systems are designed, considers both the hardware and the system software that combine to make a computer system. Parallel computer architecture is the subfield of computer architecture that studies computer systems with at least two processors.
The study of parallel computer architectures is as old as the study of computers. John von Neumann, who provided many of the ideas of a general computer in 1945, was also interested in a parallel computer. He saw the brain as a parallel computer and thought that parallel computer architectures were the most desirable, despite the difficulty of exercising parallel control of multiple processors. This control problem, and the nature of the hardware available in the 1940s, accounts for the fact that almost all early computers used the von Neumann architecture.
The combination of improved control of multiple processors and improved hardware with which to build them led to the design of several parallel computer architectures in the 1960s and 1970s, including the SIMD Illiac IV and the CRAY-1 vector processor. In the 1980s, continued improvements in the ability to control a number of processors, the ability to pass information between processors by means of an interconnection network, and the individual processors themselves made possible the development of a new generation of parallel processors. Of the vector processors, the most prominent were the CRAY X-MP and Y-MP series. A famous MIMD computer of the 1980s was the Heterogeneous Element Processor (HEP); while it was not a commercial success, it did provide a model of what an MIMD computer could do. The best-known SIMD computer of this period was the Distributed Array Processor (DAP).
In the late 1980s, the chip technology that had been developed to support the microcomputer industry had a tremendous effect on parallel computer architectures. Intel developed the i860 chip, which was a vector processor on a chip. Many personal computers and workstations that used this chip as a CPU were as powerful as the CRAY-1 of the 1970s. Intel also developed its low-cost Hypercube series of computers. In 1992, the Touchstone Program, a research program supported by Intel and the Defense Advanced Research Projects Agency (DARPA), produced a computer with one hundred gigabytes of memory that computed at a rate of one hundred gigaflops (floating point operations per second).
Parallel computer architectures have produced computers that can solve problems requiring vast numbers of computations. As these computers became faster, scientists asked for even more computing power, and the industry responded by continuing to develop more powerful parallel computer architectures, while even personal computers advanced to the point that any further improvements would have to incorporate at least some aspects of parallel computing.
Principal terms
ARITHMETIC-LOGIC UNIT (ALU): the part of a computer system that performs arithmetical and logical operations on data
COMPUTATIONALLY INTENSIVE: describes a problem that requires a large number of arithmetical operations (more than 10^12) to solve
COMPUTER SYSTEM: a collection of software and hardware components that function as a system on which computer programs (instructions, data, and control statements) can be executed
CONTROL UNIT: the part of a computer system that controls the memory, arithmetic-logic, and input and output units by means of control statements
DATA: the lines of text that are used during the execution of a program to solve a problem
FLOATING POINT OPERATIONS PER SECOND (FLOPS): a standard measure of the speed of a computer system as the maximum number of real-number operations (calculations) that can be performed in one second
INSTRUCTION: a line of text that, when submitted to a computer, will cause it to take a fundamental unit of action
MEMORY: the part of a computer system that stores instructions, data, and control statements
PARALLEL COMPUTING: a form of computing in which actions are performed simultaneously by several processors
SEQUENTIAL: refers to actions that are performed one after another by a single processor
Bibliography
Akl, Selim. The Design and Analysis of Parallel Algorithms. Englewood Cliffs: Prentice, 1989. Print. This is a well-written text that covers the fundamental techniques used in developing programs for parallel computer architectures. Chapter 1 is an exceptionally good overview of the field.
Babb, Robert. Programming Parallel Processors. Reading: Addison, 1988. Print. This is a very good introduction to the SIMD and MIMD computers in use at the time. The description of the individual computers closely follows the manufacturers' manuals and is quite readable.
Baer, Jean-Loup. Computer Systems Architecture. Rockville: Computer Sci., 1980. Print. This volume gives a real understanding of how parallel computers are designed.
Dubitzky, Werner, Krzysztof Kurowski, and Bernhard Schott. Large-Scale Computing Techniques for Complex System Simulations. Hoboken: Wiley, 2012. Print.
Fox, Geoffrey C., et al. Solving Problems on Concurrent Processors. Englewood Cliffs: Prentice, 1988. Print. This text contains a large number of examples of applications of hypercube architectures by those who first designed and built hypercubes.
Hwang, Kai. Tutorial, Supercomputers: Design and Application. Silver Spring: IEEE, 1984. Print. An excellent tutorial on all the parallel computer architectures. The text contains many of the original articles that introduced the architectures.
Hwang, Kai, and Doug DeGroot, eds. Parallel Processing for Supercomputers and Artificial Intelligence. New York: McGraw, 1989. Print. This is an excellent collection of essays that describe many of today's parallel computers. Many of the essays also describe problems that have been solved on parallel computer architectures.
Padua, David, ed. Encyclopedia of Parallel Computing. 4 vols. New York: Springer, 2011. Print.
Sarbazi-Azad, Hamid, and Albert Y. Zomaya, eds. Large Scale Network-Centric Distributed Systems. Hoboken: Wiley, 2014. Print.
Zomaya, Albert Y., and Young Choon Lee, eds. Energy-Efficient Distributed Computing Systems. Hoboken: Wiley, 2012. Print.
See also: Interconnection Networks; Fluid Mechanics and Aerodynamics; Calculations of Molecular Structure; Numerical Solutions of Differential Equations.