Historical perspectives on testing

  • TYPE OF PSYCHOLOGY: Intelligence and intelligence testing

SIGNIFICANCE: Researchers worldwide have influenced modern psychological testing standards. French researchers emphasized clinical observation. German researchers emphasized experimentation. British researchers were interested in individual differences. American researchers generally take a pragmatic approach.

Introduction

Tests have been used since ancient times. In China, around 2000 BCE, public officials were examined regularly and promoted or dismissed based on these examinations. The direct historical antecedents of contemporary testing date back to the mid-1800s and reflect contributions made by researchers across the globe representing four historical traditions: the French clinical tradition, the German scientific tradition, the British emphasis on individual differences, and the American practical orientation.

Tests are an intrinsic part of life. Children are tested to determine when they will enter school, which classes they may be placed in, and how much they will learn. Young adults take tests to determine whether they should receive a high school diploma, enter college, or participate in some specialized training. People are tested if they seek admission to law school or medical school, if they want to practice a profession, and if they want to work for a specific company or show proficiency in a particular talent.

The French clinical tradition emphasized clinical observation. That is, the French were very interested in individuals with psychiatric and intellectual disabilities, and a number of French physicians wrote perceptive and detailed descriptions, or case studies, of the patients they studied. These case studies contributed to the notion that the creation of a test must be preceded by careful observations of the real world. To develop a test to measure depression, for example, one must first carefully observe many patients with depression. The French also produced the first practical test of intelligence. Alfred Binet (1857-1911), a well-known French psychologist, devised the Binet-Simon test with Théodore Simon (1873-1961) in 1905 for use with French schoolchildren to identify those with intellectual disabilities who might require specialized instruction.

A second historical trend that affected testing was the scientific approach promulgated by German scientists in the late 1800s. Among the best-known German scientists of the era was Wilhelm Wundt (1832-1920), often called the founder of experimental psychology. He was particularly interested in reaction time, the rapidity with which a person responds to a stimulus. To study reaction time, Wundt and his students carried out systematic experimentation in a laboratory, mainly focused on sensory functions such as vision, and developed several instruments for the purpose. Although Wundt was not interested in tests, his scientific approach and focus on sensory functions influenced later test developers, who saw testing as an experiment in which standardized instructions were followed and strict control over the testing procedure needed to be exercised. They even took measures of sensory processes such as vision to be an index of how well the brain functioned and, therefore, how intelligent the person was.

Researchers in Germany were interested in discovering general laws of behavior and investigating reaction time’s role in intellectual processes that presumably occur in the brain. The British were more interested in looking at individual differences. The British viewed these differences not as errors, as Wundt did, but as a fundamental reflection of evolution and natural selection, ideas that had been given a strong impetus by the work of Charles Darwin. Darwin’s cousin, Sir Francis Galton (1822-1911), is said to have launched the testing movement on its course. Galton studied eminent British men and became convinced that intellectual genius was fixed by inheritance: One was born a genius rather than trained to be one. Galton developed several tests to measure various aspects of intellectual capacity, tested many individuals who visited his laboratory, and developed various statistical procedures to analyze the test results.

American Perspectives

Psychological testing became an active endeavor in the United States in the late 1800s. In 1890, psychologist James McKeen Cattell (1860-1944) wrote a scientific paper that, for the first time, used the phrase “mental test.” The paper presented a series of ten tests designed to measure a person’s intellectual level. These tests involved procedures such as having the subject estimate a ten-second interval and measuring the amount of pressure exerted by the subject’s grip. Cattell had been a pupil of Wundt, and these tests reflected Wundt’s heavy emphasis on sensory abilities. The tests were administered to students at Columbia University, where Cattell was a professor, to see if the results predicted grade point average. They did not, but the practice of testing students to predict their college performance was born.

Lewis Terman, a professor at Stanford University, took the French test that Binet had developed and created a new version in English called the Stanford-Binet Test. Thus, intelligence testing became popular in America. When the United States entered World War I in 1917, the military needed a way to screen out recruits whose intellectual capabilities were too limited for military service. The military also needed to identify recruits who might excel in specialized training or be suited for officer training programs. Several tests were developed to meet these needs, and when the war was over, they became widely used in industry and schools. By World War II, testing had become quite sophisticated and widespread and was again given impetus by the need to make major decisions about military personnel in a rapid and efficient manner. Thus, not only intellectual functioning but also problems of adjustment, morale, and psychopathology stimulated interest in testing.

As with any other field of endeavor, advances in testing were also accompanied by setbacks, disputes, and criticisms. In the late 1930s and early 1940s, for example, there was a rather acrimonious controversy between researchers at the University of Iowa and those at Stanford University over whether the intelligence quotients (IQs) of children could be increased through enriched school experiences. In the 1960s, tests were severely criticized, especially the multiple-choice items used in tests to make admission decisions in higher education. Many books were published that attacked testing, often in a distorted and emotional manner. In the 1970s, intelligence tests again came to the forefront in a bitter controversy about whether White people are more intelligent than Black people. Many school districts eliminated the administration of intelligence tests, both because the tests were seen as tools of potential discrimination and because of legal ramifications.

Tests are still criticized and misused, but they have become much more sophisticated and represent a useful set of tools that, when used appropriately, can help people make more informed decisions.

Testing Skills

In the everyday world, many decisions must be made daily. For example, “Susan” owns a large manufacturing company and has openings for ten lathe operators. When she advertises these positions, 118 prospective employees apply. How will Susan decide which ten to hire? Clearly, she wants to hire the best applicants: those who will do good work, who will be responsible and come to work on time, who will follow the expected rules but also be flexible when the nature of the job changes, and so on. She would probably want to interview all the applicants, but doing so might require too much time, and she might realize that she does not have the skills to make such a decision. An alternative, then, would be to test all the applicants and to use the test information with other data, such as letters from prior employers, to make the needed decision. A test, then, can be looked at as an interview. It is typically more objective, since the biases of the interviewer are held in check; more time-effective, since a large number of individuals can be tested at one sitting, whereas interviews typically involve one candidate at a time; more economical, since a printed form typically costs less than the salary of an interviewer; and, usually, more informative, since a person’s results can be compared with the results of others, whereas performance in an interview is more difficult to evaluate.

Historically, most tests have been developed because of pressing practical needs: the need to identify students who might benefit from specialized instruction, the need to identify Army recruits with special talents or problems, or the need to identify high school students with a particular interest in a specific field such as physics. As testing grew, the applications of testing also expanded.

Modern tests are used to provide information about achievement, intellectual capacity, potential talents, career interests, motivation, and hundreds of other human psychological concerns. Tests are also developed to serve as tools for the assessment of social or psychological theories. For example, measures of depression are of interest to social scientists investigating suicide, while measures of social support are useful in studies of adolescents and older adults.

Testing Potential

Another way of thinking about tests is that a test represents an experiment. The experimenter, in this case usually a psychologist or someone trained in testing, administers a set of carefully specified procedures and just as carefully records the subject’s responses or performance on these procedures. Thus, a psychologist who administers an intelligence test to a schoolchild is interested not simply in computing the child’s IQ but also in observing how the child goes about solving new problems, how extensive the child’s vocabulary is, how the child reacts to frustration, the facility with which the child can solve word problems versus numerical problems, and so on. While such information could be derived by carefully observing the child in the classroom over a long period, using a specific test procedure is not only less time-consuming but also allows for a more precise comparison between a particular child’s performance and that of other children.
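
The comparison between one child's performance and that of other children can be illustrated with a percentile rank. The following is a minimal sketch, not a published scoring procedure: the function name `percentile_rank` and the norm-group scores are hypothetical, and ties are counted as half, which is one common convention among several.

```python
def percentile_rank(score, norm_scores):
    """Return the percentage of the norm group scoring below `score`,
    counting scores equal to `score` as half (an assumed convention)."""
    if not norm_scores:
        raise ValueError("norm_scores must not be empty")
    below = sum(s < score for s in norm_scores)
    equal = sum(s == score for s in norm_scores)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


# Hypothetical raw scores from a comparison group of ten children.
norm_group = [12, 15, 15, 18, 20, 22, 25, 27, 30, 33]

# A child with a raw score of 22 is compared against that group.
child_rank = percentile_rank(22, norm_group)
print(child_rank)  # 55.0
```

A raw score of 22 means little by itself; placed against the comparison group, it indicates that the child performed slightly above the middle of that group.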

There are, then, at least two ways, not mutually exclusive, of thinking about a test. Both of these ways of thinking are the result of various historical emphases. The French emphasized the clinical symptoms exhibited by the individual, the Germans emphasized scientific procedure, the British were interested in individual differences, and the Americans emphasized practicality: “Does it work, and how fast can I get the results?”

Tests are only one source of information, and their use should be carefully guided by a variety of considerations. Psychologists who use tests with clients are governed by very detailed rules. One set of rules concerns the technical aspects of constructing a test, ensuring the test has been developed according to scientific guidelines. A second set of rules concerns ethical standards, ensuring that the information derived from a test is used carefully for the client's benefit without creating an adverse impact.

Because the use of tests does not occur in a vacuum but rather in a society that has specific values and expectations, that emphasizes or denies specific freedoms, and in which certain political points of view may be more or less popular, this use is often accompanied by strong feelings. For example, in the 1970s, Americans became very concerned about the deteriorating performance of high school seniors who were taking the Scholastic Aptitude Test (SAT, later the SAT Reasoning Test) for entrance into college. From 1963 to 1977, the average score on the SAT verbal portion declined by about 50 points, and the average score on the SAT mathematics section declined by about 30 points. Rather than being treated as simply a nationwide “interview” that might yield some possibly useful information about a student’s performance at a particular point in time, the SAT had become a goal in itself, a standard by which to judge all sorts of things, including whether high school teachers were doing their jobs.

Testing Psychology

Tests play a major role in most areas of psychology, and the history of psychological testing is intertwined with the history of psychology as a field. Psychology is defined as the science of behavior, and tests are crucial to the experimentation that is at the basis of that science. Especially with human subjects, studies are typically carried out by identifying some important dimension, such as intelligence, depression, concern about one’s health, or suicide ideation, and then trying to alter that dimension by some specific procedure, such as psychotherapy to decrease depression, education to increase health awareness, a medication designed to lessen hallucinations, and so on. Whether the specific procedure is effective is then assessed by the degree of change, typically measured by a test or a questionnaire.

Psychologists may practice in a variety of fields, including working with individuals with a mental health condition or cognitive impairment, individuals with substance use disorder, college students experiencing personal difficulties, spouses experiencing marital problems, business executives wanting to improve their leadership abilities, or high school students uncertain of what career to pursue. All these situations often involve the use of tests to identify the status of a person, to make predictions about future behavior, to identify achievement, or to identify strengths—in other words, to get a more objective and detailed portrait of the particular client.

The rapid development of technology in the twenty-first century impacted testing. Testing became possible on a computer that scored the assessment and provided instant client feedback, often in great detail. Computers also allowed tests to be better tailored to the individual with the advent of computerized adaptive testing (CAT). Suppose, for example, a test with one hundred items is designed to measure basic arithmetic knowledge in fifth-grade children. Traditionally, all one hundred items would be administered, and each child’s performance scored accordingly. Using CAT, however, a test can present only selected items, with subsequent items being presented or skipped depending on the child’s performance on the prior item. If, for example, a child can do division problems quite well, as shown by correct answers to more difficult problems, the computer can be programmed to skip the easier division problems. Using CAT in testing has been shown to produce more accurate results by avoiding fatigue and promoting the test taker's confidence.
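
The item-skipping idea behind CAT can be sketched in a few lines of code. The example below is an illustration only, under stated assumptions: it uses a simple rule that moves up one difficulty level after a correct answer and down one level after an incorrect answer, whereas operational adaptive tests generally rely on more elaborate statistical models. The `Item` class, the `run_adaptive_test` function, and the small bank of division problems are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A hypothetical test item; difficulty 1 is easiest."""
    prompt: str
    answer: int
    difficulty: int


def run_adaptive_test(items, respond, max_items=4):
    """Administer up to `max_items` items, one at a time.

    After a correct answer the target difficulty rises one level;
    after an incorrect answer it falls one level. `respond` is a
    function that takes an Item and returns the test taker's answer.
    Returns a list of (item, was_correct) pairs in the order given.
    """
    levels = sorted({item.difficulty for item in items})
    remaining = list(items)
    idx = len(levels) // 2  # start at a middle difficulty level
    administered = []
    while remaining and len(administered) < max_items:
        target = levels[idx]
        # Choose the unused item whose difficulty is closest to the target.
        item = min(remaining, key=lambda it: abs(it.difficulty - target))
        remaining.remove(item)
        correct = respond(item) == item.answer
        administered.append((item, correct))
        if correct:
            idx = min(idx + 1, len(levels) - 1)
        else:
            idx = max(idx - 1, 0)
    return administered


# Hypothetical bank of division problems, two per difficulty level.
bank = [
    Item("6 / 2", 3, 1), Item("8 / 4", 2, 1),
    Item("24 / 3", 8, 2), Item("35 / 5", 7, 2),
    Item("96 / 8", 12, 3), Item("84 / 7", 12, 3),
    Item("221 / 13", 17, 4), Item("306 / 17", 18, 4),
    Item("1104 / 23", 48, 5), Item("1134 / 27", 42, 5),
]

# A simulated child who answers every division problem correctly.
strong = run_adaptive_test(bank, lambda item: item.answer)
strong_levels = [item.difficulty for item, _ in strong]
print(strong_levels)  # [3, 4, 5, 5]

# A simulated child who answers every problem incorrectly.
weak = run_adaptive_test(bank, lambda item: -1)
weak_levels = [item.difficulty for item, _ in weak]
print(weak_levels)  # [3, 2, 1, 1]
```

In this sketch, the child who answers correctly is never shown the level 1 and level 2 problems, while the child who struggles is never shown the hardest ones, so each sees only four of the ten items in the bank.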

It is essential to use testing wisely and recognize tests' limitations. Many factors impact testing results, and the score of a single test should never be used as the sole basis for any decision.

Bibliography

Anastasi, Anne, and Susan Urbina. Psychological Testing. 7th ed. Prentice, 1997.

Ballard, Philip Boswood. Mental Tests. Hodder, 1920.

Cooper, Colin. An Introduction to Psychometrics and Psychological Assessment: Using, Interpreting and Developing Tests. 2nd ed. Routledge, 2023.

Flanagan, Dawn P., and Erin M. McDonough. Contemporary Intellectual Assessment. Guilford Press, 2022.

Garrett, Henry Edward, and Matthew R. Schneck. Psychological Tests, Methods, and Results. Harper, 1933.

Garrison, Mark J. A Measure of Failure: The Political Origins of Standardized Testing. State U of New York P, 2009.

Gregory, Robert. Psychological Testing: History, Principles, and Applications. 7th ed. Pearson, 2013.

"History of Standardized Testing in the United States." National Education Association, 25 June 2020, www.nea.org/professional-excellence/student-engagement/tools-tips/history-standardized-testing-united-states. Accessed 1 Oct. 2024.

McIntire, Sandra A., and Leslie A. Miller. Foundations of Psychological Testing: A Practical Approach. 6th ed. Sage, 2020.

Office of Strategic Services. Assessment of Men: Selection of Personnel for the Office of Strategic Services. Rinehart, 1948.

Sacks, Peter. Standardized Minds: The High Price of America’s Testing Culture and What We Can Do to Change It. Perseus, 2001.

Sokal, Michael M., ed. Psychological Testing and American Society, 1890-1930. Rutgers UP, 1987.

Stanovich, Keith E. What Intelligence Tests Miss: The Psychology of Rational Thought. Yale UP, 2009.

Wise, Paula Sachs. The Use of Assessment Techniques by Applied Psychologists. Wadsworth, 1989.