Neural Networks
Neural networks form a computing architecture that is very different from that of conventional sequential computers. They can be used to better understand the functioning of biological brains and to build machines that can reproduce many of their amazing abilities.
Overview
There are two major types of neural networks: biological and artificial. Both types of neural networks form a computing architecture that is very different from that of the conventional digital computer: They are massively parallel, use very simple processing units, and have a vast array of interconnections among these units.
Artificial neural networks represent an attempt to mimic the neural networks that exist in biological brains. The attempt takes a double approach: It is an effort aimed at a better understanding of the functioning of biological brains, and it is also an effort aimed at building machines that can duplicate many of the amazing abilities of biological brains.
Although the individual processing units of a biological brain operate much more slowly than the processing units in a computer, the biological brain is able to break down many tasks into a number of small steps that are then processed simultaneously, or very nearly so, by a large number of identical processing units distributed throughout the brain. This approach to problem solving is often referred to as "parallel distributed processing." It can result in very dramatic increases in problem-solving speed compared to the speed obtained using traditional computer algorithms (problem-solving procedures), in which one processing unit goes through a long series of problem-solving steps in a sequential manner.
Artificial neural networks can modify their behavior as a result of environmental conditions (inputs). This learning capacity is one of their most interesting capabilities. They accomplish it by adjusting their synaptic weights until their responses are consistent with the desired outputs. Many training procedures exist, and the selection of any one of them is often determined by the tasks envisioned for the network. The ability of artificial neural networks to recognize patterns in the presence of distortions or noise is similar to the capabilities of biological networks and represents a major improvement over the conventional computer. The structure of the network, rather than a computer program written by a human, is responsible for this recognition ability.
Much of the terminology of neuroscience has been carried over to artificial neural networks because of the obvious parallels in these two fields. A neural network, strictly speaking, is composed of a complex interconnected web of individual, biological "neurons" (cells of nervous tissue). Each neuron, although separated from the others, may form extremely close associations with other neurons for the purpose of signal transmission from one neuron to another. These communicating structures are called "synapses."
A typical neuron has a somewhat enlarged "cell body" from which extends a long, slender, generally branching extension called the "axon." The axon forms the output portion of the neuron and carries the neuron's signals (called "action potentials," "impulses," or "spikes") away to other neurons (in biological networks, the axon of one neuron may terminate on several thousand other neurons). Also extending from the cell body are (usually) shorter branched structures known as "dendrites." The treelike dendritic part of the neuron functions primarily as the receiving end that accepts arriving signals from the axons of other neurons. The typical neuron in the human brain has about 5,000 axon terminals impinging on it, and some have as many as 200,000 axon terminals.
It is neither the very simple processing units (neurons) nor their large numbers that allow neural networks to attain their performance. Instead, it is the tremendous number of synapses (interconnections) that gives them their computing capabilities. The adult human brain is estimated to contain about 100 billion neurons and as many as 1,000 trillion synapses.
When an action potential arrives at the synaptic end of an axon, it triggers the release of a number of molecules of a substance (specific to that particular axon) called a "neurotransmitter." The molecules of a neurotransmitter then spread across the narrow gap of the synapse and quickly contact the target cell. One of only two possible types of effects is caused in the target cell by the neurotransmitter: There is either an increase or a decrease in the probability that the target cell itself will generate an action potential. When the probability of impulse generation increases, the synapse and its effect are referred to as excitatory, but when the probability decreases, the synapse and its effect are said to be inhibitory.
Not all synapses onto a neuron are equally effective in changing the probability that the neuron will generate an impulse. The reasons for this are quite complicated and depend on, among other factors, the location of a synapse on the neuron. This differential effectiveness among synapses has given rise to the concept of synaptic "weight" or "strength," in which a synapse that evokes a greater change in the probability that a neuron will generate an impulse is assigned a greater weight or strength relative to the other synapses. Therefore, it is not simply the absolute number of excitatory versus inhibitory synapses that happen to be active at any time that determines the behavior of a target neuron; the various synaptic strengths also play a role. When the sum of all the arriving signals results in a net excitation of the receiving cell that exceeds that cell's "threshold" of excitation, the receiving cell generates an action potential of its own and sends it down the axon to other neurons.
Artificial neural networks are naturally less complex in structure than their biological counterparts. Nevertheless, the artificial networks resemble the brain closely enough to allow their use as practical models in studies of information processing by the brain. The actual circuitry, however, or pattern of interconnections among the neurons, need not be based on any known biological brain.
The artificial neuron receives a set of inputs, each of which represents the synaptic input either from another artificial neuron in the network or from an input device of some type (for example, a sensor or transducer). Each signal that arrives from an input line is multiplied by an assigned synaptic weight (representing the strength of that particular synaptic connection).
The resulting product (arriving signal times synaptic weight) is called the input signal. All the input signals are added, which produces a net change in the activation level of the receiving neuron. If this activation level exceeds the neuron's assigned threshold level, the neuron generates an output signal of its own and sends it to all neurons to which its axon is connected.
The structure on which a set of axons terminates, and which sums the input signals for comparison with the threshold value, is the artificial neuron's approximation of a biological neuron's cell body and dendrites.
The form of the output signal from an artificial neuron depends on the specific neural model being used. Many models generate an output signal whose value is either one or zero, while in other models the value can be either plus one or minus one.
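This summation-and-threshold behavior can be captured in a few lines of code. The following Python sketch is purely illustrative (the input values, weights, and threshold are invented, and real networks connect many such units together); it shows a single artificial neuron of the one-or-zero variety described above.

```python
# A single artificial neuron: each arriving signal is multiplied by its
# synaptic weight, the products are summed, and the neuron fires (outputs 1)
# only if the net activation exceeds its threshold.

def artificial_neuron(inputs, weights, threshold):
    activation = sum(signal * weight for signal, weight in zip(inputs, weights))
    return 1 if activation > threshold else 0

# Two excitatory connections (positive weights) and one inhibitory
# connection (negative weight), with a threshold of 0.5:
print(artificial_neuron([1, 1, 1], [0.6, 0.3, -0.2], 0.5))  # 1, since 0.7 > 0.5
```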
All artificial neural networks have a number of simplifying characteristics that make them easier to design or to use. For example, many do not permit their component neurons to send signals except at fixed, specified time intervals. This is quite unlike biological networks, in which some neurons may become active at any moment. Therefore, such artificial networks are unable to utilize the range of temporal sequencing of signals observed in the behavior of biological networks.
Another typical difference is the way in which input signals immediately generate outputs in artificial networks. Biological systems display a range of latent periods following an input before a response is produced.
Applications
Neural networks can be used in a wide variety of applications. They are already being used in many commercial applications, and new uses are being researched in laboratories throughout the world. For tasks in which neural networks offer clear advantages over traditional sequential computers, various network architectures are available from vendors.
Neural networks are most appropriately applied to problems demanding pattern recognition, pattern completion, or pattern mapping. The complexity of developing traditional sequential computer programs to accomplish these tasks is avoided by the learning process that neural networks employ: They can be thought of as programming themselves, although this is not strictly correct, since the networks do not use programs.
Other areas that are appropriate for the use of neural networks include applications that must deal with noisy data: Traditional computers are very sensitive to noisy data, while neural networks are, by comparison, quite insensitive. Such situations are common in speech recognition systems in which background noise often cannot be eliminated, in image processing and analysis, and in sonar signal classification.
One of the first commercial applications of artificial neural networks was in the identification of handwritten numerals and letters, such as those entered on bills to indicate the amount being paid or printed on envelopes to indicate the address. The challenge is to be able to identify correctly the characters in different people's handwriting. The variations in handwritten characters among writers are numerous enough to present great difficulties to traditional computer program developers. Neural networks, however, are first trained on a sample set of characters for which they are given the correct responses. After the training period has been completed, the ability of neural networks to generalize and to categorize enables the correct identification of handwritten characters from any writer.
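As a toy illustration of this training process (not a description of any actual commercial system), the following Python sketch uses the classic perceptron learning rule to teach a single neuron to distinguish two invented three-by-three pixel "characters." The patterns, learning rate, and class labels are all made up for the example.

```python
# Train one perceptron-style neuron to tell an "X" glyph from an "O" glyph.
# Whenever the neuron answers incorrectly, the learning rule nudges its
# synaptic weights toward the correct response.

def predict(weights, bias, pixels):
    s = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1 if s > 0 else 0

samples = [
    ([1, 0, 1, 0, 1, 0, 1, 0, 1], 1),  # 3x3 "X" shape -> class 1
    ([1, 1, 1, 1, 0, 1, 1, 1, 1], 0),  # 3x3 "O" shape -> class 0
]

weights, bias, rate = [0.0] * 9, 0.0, 0.1
for _ in range(20):                      # the training period
    for pixels, target in samples:
        error = target - predict(weights, bias, pixels)
        bias += rate * error             # perceptron learning rule
        weights = [w + rate * error * p for w, p in zip(weights, pixels)]

for pixels, target in samples:
    print(predict(weights, bias, pixels), "expected:", target)
```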
In the biochemical field, pattern classification can become a monumentally complex task. There are tens of thousands of organic compounds, each of which has a set of biochemical characteristics (the pattern), some of which are unique to each substance but many of which are shared with other substances. Neural networks are used to automate the pattern classification process, thereby eliminating the human drudgery associated with this tedious procedure.
Artificial neural networks are also used to obtain complicated financial analyses. In particular, the area of financial forecasting requires the analysis of numerous parameters and the recognition of particular patterns of these parameters. The capacity to form associative memory systems with artificial neural networks has been of great value in this application. Here also, artificial neural networks are able to use their learning and pattern recognition skills to achieve accurate and useful results.
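One classic scheme for such an associative memory is the Hopfield-style network (named for John Hopfield, whose work is discussed later in this article). The Python sketch below stores a single pattern of plus-one and minus-one values in a matrix of synaptic weights and then restores a corrupted copy of it; the pattern itself is invented, and a real financial system would store many far larger patterns.

```python
# Hopfield-style associative memory: store a pattern with a Hebbian rule,
# then recover it from a noisy probe by repeated summation and thresholding.

def store(patterns, n):
    # Each weight is the product of the two neuron states, summed over all
    # stored patterns; neurons do not connect to themselves.
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, state, steps=5):
    n = len(state)
    for _ in range(steps):  # every neuron thresholds its summed input
        state = [1 if sum(w[i][j] * state[j] for j in range(n)) >= 0 else -1
                 for i in range(n)]
    return state

memory = [1, -1, 1, 1, -1, -1, 1, -1]   # the stored pattern
w = store([memory], len(memory))
noisy = [1, -1, -1, 1, -1, -1, 1, -1]   # one element flipped
print(recall(w, noisy) == memory)       # True: the pattern is completed
```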
Automatic speech recognition is extremely difficult to program for computers. As early as 1930, the Hungarian Tihamer Nemes filed a patent application for a system that would automatically transcribe speech signals. Such a system would have formed the basis of a typewriter that could transcribe dictations. The patent application was denied on the basis of being "unrealistic." The basic problem of automatic speech recognition systems, however, was not forgotten by researchers. The uses for such a system range from criminology to airline ticket information, from reservations made by telephone to automatic on-line language translation--at international political or scientific meetings, for example.
The basic problem has to do with pattern recognition. Human speech recognition is an amazingly complex routine. It encompasses the detection of individual phonemes (phonetic units such as "k" and "b" sounds) from the speech waves and continues up the scale of complexity to the very advanced level of understanding ambiguous words based on the message context.
Probably the greatest difficulty has been the development of a speaker-independent system. This is easy to understand, since even the same word spoken by a single speaker on different occasions can have quite different sound frequency and amplitude variations (or sound spectra) depending on the emotional state of the speaker, the quality of the transmission line carrying the speech signal, and other factors. It is also true that different phonemes are not spectrally unique: The spectral characteristics of different phonemes overlap to various degrees, so the same phoneme spoken by two persons can be confused as a result of these confounding factors.
To achieve the resolution of word ambiguities (such as for the spoken words "poor" and "pour") by means of word context requires that the system be able to interpret the semantic content of the signal. This demands that higher thinking processes be incorporated into the system. Following decades of intensive research, the best commercial system that had been developed by the middle of the 1980s for the task of speaker-independent automatic speech recognition was restricted to isolated words (noncontinuous speech) from a vocabulary of about forty words.
Since human brains are able to perform speaker-independent continuous speech recognition, much research is directed toward the development of artificial neural networks to perform this task.
Weather forecasting provides an interesting example of how a network designer would proceed in the design and implementation of artificial neural networks. This type of assignment is referred to as "pattern mapping." The network will receive a variety of inputs (such as the amount of precipitation during the previous time period, the average wind velocity and direction at various altitudes, the barometric pressure, and temperature values from numerous altitudes and participating weather centers) and will produce an assortment of outputs (such as forecasts of precipitation probability and amount, wind velocity and direction, temperature extremes, and so forth). The designer decides on the exact representation that each of these inputs will have.
The heart of the network, its architecture, is also at the designer's discretion. The design choices include such factors as the number of layers of neurons in the network, the number of neurons in each layer, the patterns of interconnections between layers, and the types of learning and recall rules to be used during the training and use of the network.
Once the network has been constructed, the training period begins. Available weather history data are presented to the network as inputs, and the learning rules adjust the synaptic weights of the interconnections. During training, a given day's weather data form the input, and the output of the network (the next day's forecast) can be compared to the actual following day's weather data. Discrepancies between the network's forecast and the actual resulting weather data are used by the learning rules to adjust the synaptic weights. This comparison of forecasting with actual observed weather data also allows for human evaluation of the network following the training to determine whether the performance is sufficient, whether additional training is required, or whether design changes may be necessary. Should the network's performance be judged adequate, it is switched into recall mode. Now it uses today's weather data as input and produces an output that is the forecast of tomorrow's weather.
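The training loop just described can be sketched schematically. The Python fragment below trains a single linear neuron with the delta (Widrow-Hoff) learning rule; the "weather" numbers are invented, and a real forecasting network would have many more inputs, outputs, and layers of neurons.

```python
# Pattern mapping by error correction: compare each forecast with the
# weather actually observed, and adjust the synaptic weights to shrink
# the discrepancy.

history = [
    # ([pressure trend, wind speed, humidity], next-day rainfall)
    ([-1.0, 0.8, 0.9], 0.7),
    ([ 0.5, 0.2, 0.3], 0.1),
    ([-0.3, 0.5, 0.8], 0.5),
]

weights, rate = [0.0, 0.0, 0.0], 0.05
for _ in range(200):                         # the training period
    for inputs, actual in history:
        forecast = sum(w * x for w, x in zip(weights, inputs))
        error = actual - forecast            # forecast versus observation
        weights = [w + rate * error * x for w, x in zip(weights, inputs)]

# Recall mode: today's (invented) readings yield tomorrow's forecast.
today = [-0.6, 0.7, 0.85]
print(sum(w * x for w, x in zip(weights, today)))
```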
Context
The earliest surviving medical manuscripts that discuss the nervous system (written by the Greek physician Galen in the second century AD) show that humans have long sought to understand how the brain functions. Since Luigi Galvani's demonstration in 1791 that electricity generated in the body caused nerves to induce muscle contractions, attention has been focused on the relationships between electrical forces and life.
During the second half of the nineteenth century, the German physiologist Emil du Bois-Reymond discovered that neuron signals are discrete electrical pulses. Previously, it had been thought that electricity flowed in a continuous current through the nervous system. As instruments for measuring electrical phenomena were improved in precision and sensitivity, it became possible to learn more about the electrical activities of the brain.
Equally important was progress in understanding the brain's architecture. Before the end of the nineteenth century, it was not known whether the nervous system was composed of individual cells or formed one continuous web of tissue. Probably the greatest discovery in this area was made by the Spanish histologist Santiago Ramón y Cajal when, near the beginning of the twentieth century, he found that the nervous system was a network of individual neurons separated from one another by minute gaps known as synapses.
Using the accumulating knowledge, researchers built simplified versions of the brain's circuitry and processing units (neurons), hoping that the functional capabilities of the brain could be re-created in these machines. The 1940s witnessed the initial realizations of these attempts. In 1957, the first standard network architecture was developed by Frank Rosenblatt: the perceptron. Rosenblatt was interested in understanding how the brain could show learning, memory, and cognition despite the fact that its individual elements (neurons) had never been observed to possess any psychological functions.
The rapid growth in conventional digital computer technology, however, combined with a major shift in research funding away from artificial neural networks in the late 1960s, resulted in a slowing of neural network development. The 1980s witnessed a revival of interest in artificial neural networks, largely because of the research of Stephen Grossberg and John Hopfield, who greatly improved the mathematical and practical aspects of neural network theory.
After another lull in research in the early 2000s, interest in neural networks was reinvigorated in the 2010s. Great improvements in computer processing power enabled major leaps in development, particularly in the field of artificial intelligence (AI), even as neural network theory increasingly shifted away from biology-based models. Particularly notable was the rise of "deep neural networks," which use multiple layers of processing for machine learning. So-called deep learning underpinned the AI revolution of the early 2020s, epitomized by powerful, headline-grabbing AI chatbots such as ChatGPT.
Principal terms
ACTION POTENTIAL: the wave of electrochemical activity that propagates down an axon, away from the cell body, to the axon terminals
AXON: an extension of a neuron's cell membrane that conducts nerve impulses
DENDRITES: branching extensions of a neuron's cell body that function as a receiving area for incoming signals from other neurons
NEURON: a living nerve cell
NEUROTRANSMITTER: a chemical substance released at a synapse by an axon terminal when an action potential arrives
SYNAPSE: a functional connection between neurons that is formed by the near contact of their membranes and is used for signal transmission from one neuron to another