Ability tests
Ability tests are psychological assessments designed to measure various cognitive skills and intelligence levels. The development of these tests began in the late 19th century with the work of French psychologist Alfred Binet, who aimed to understand normal cognitive functioning rather than mental illness. Binet's pioneering tests, created alongside Théodore Simon, focused on children's problem-solving abilities and laid the foundation for modern intelligence testing. Notably, Binet's concept of mental age underlies the intelligence quotient (IQ), which compares a person’s mental age to their chronological age.
Subsequent revisions, such as the Stanford-Binet test developed by Lewis Terman, expanded these assessments to include a broader age range and different types of cognitive abilities. Ability tests have since evolved, giving rise to various methods, including the group testing commonly used in educational and military contexts. Despite their widespread application, ability tests have faced criticism regarding cultural bias, as many are perceived to favor certain demographic groups, particularly those from middle-class backgrounds. Critics also argue that these tests often fail to capture the full spectrum of intelligence, which encompasses diverse abilities beyond those measured by traditional IQ tests. Debates continue about the role of genetics versus environment in shaping intelligence, highlighting the complexity of defining and measuring cognitive ability across different populations.
- DATE: Beginning in the 1890s
- TYPE OF PSYCHOLOGY: Learning
Introduction
Whatever intelligence may be, the first scientific attempts to measure it were conducted by French psychologist and physician Alfred Binet. From 1894 until his death, Binet was director of the psychology laboratory at the Sorbonne. Between 1905 and 1911, Binet and his colleague Théodore Simon devised a series of tests that became the basis for tests in many areas. The Stanford, Herring, and Kuhlmann tests are among the revisions to Binet and Simon’s tests. Binet, unlike many of his contemporaries in psychology, was interested in how normal minds work rather than in mental illness. It was his goal to discover inherent intelligence, apart from any educational influence.

![Korean College Scholastic Ability Test schedule.jpg. Weekly schedule of KCSAT, written on a high school whiteboard. By Salamander724 (Own work) [Public domain], via Wikimedia Commons 93871740-60150.jpg](https://imageserver.ebscohost.com/img/embimages/ers/sp/embedded/93871740-60150.jpg?ephost1=dGJyMNHX8kSepq84xNvgOLCmsE2epq5Srqa4SK6WxWXS)
Binet came to develop his tests through observation of his daughters. He was interested in how they solved problems that he set for them. Binet noted the existence of individual differences and the fact that not all thought processes use the same operational path. Binet argued that lack of ability in specific fields was not a mental illness. He also noted that there were different types of memory. This discovery led to his work with Simon on achievement levels for “normal” children.
Binet’s first test, administered to students in Paris in 1905, asked children to follow commands, copy patterns, name objects, and put things in order or arrange them properly. His standard was based on his data. Thus, if 70 percent of a certain age group succeeded on a given task, those who passed at that level were at that mental age level. Binet's concept of mental age gave rise to the intelligence quotient, or IQ, a term coined by German psychologist William Stern in 1912. IQ is the ratio of mental age to chronological age multiplied by 100, with 100 being average. For example, an eight-year-old who succeeds on the ten-year-olds’ test would have an IQ of 10/8 × 100, or 125. Soon, there was widespread enthusiasm for testing and finding IQ scores. Several measures were introduced. The US Army used tests to sort out recruits in World War I. The tests assessed general knowledge rather than ability on specific tasks.
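The ratio definition of IQ can be sketched in a few lines of Python. The function name `ratio_iq` is chosen here purely for illustration; it is a minimal sketch of the arithmetic, not a scoring procedure taken from any published test.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return mental_age / chronological_age * 100


# The eight-year-old from the example who passes the ten-year-olds' test.
print(ratio_iq(10, 8))  # 125.0

# A child whose mental age equals their chronological age scores the average.
print(ratio_iq(8, 8))  # 100.0
```

Because the score is a simple ratio, it grows or shrinks with chronological age even when performance stays the same, which is the weakness later revisions tried to address.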
Binet’s tests required modifications. The first, and perhaps most famous, was the Stanford-Binet test, developed in 1916 by Lewis Terman. Various educational, government, and other agencies immediately put it to use. This test is based mainly on verbal ability and reports an IQ score. Terman worked to overcome the limitations of the age-scale principle of testing. He wanted to measure the full range of intelligence. Binet's scales had two major shortcomings in measuring adult intelligence. First, an older person’s score became meaningless when divided by their chronological age, because mental age levels off while chronological age keeps rising. To address this, Terman assigned the chronological age of fifteen to everyone over sixteen. Second, Binet’s scales lacked items capable of measuring high intelligence. Terman added such items, assigning them mental age levels up to twenty-two. This enabled him to measure the IQs of older children and young adults.
There were additional revisions to the Stanford-Binet test. In 1937, for example, Terman and Maude Merrill published a revision of the test based on the same principles as the 1916 examination. However, they improved the selection of items and the method of standardization. Merrill published another revision in 1959. These revisions have found wide acceptance, also serving as models for other individual IQ tests and as a means for checking their scales.
The Wechsler scale, introduced in 1939, includes both verbal and performance measures. These scores compare an individual’s intelligence with those of others of the same age to yield an IQ score. The Wechsler-Bellevue adult scale uses a derived IQ to measure the intelligence of people between the ages of seven and seventy, comparing each person’s scores with standards for their age group. Wechsler produced two other scales: the Wechsler Intelligence Scale for Children, published in 1949, designed for children aged five to fifteen, and the Wechsler Adult Intelligence Scale, published in 1955, for people from sixteen to sixty-four, including a special standardization for people age sixty to seventy-five.
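A derived (or deviation) IQ of this kind can be sketched as a standard score: a person's raw score is located relative to the mean and spread of their own age group, then rescaled. The sketch below assumes the conventional scaling of mean 100 and standard deviation 15, which the text above does not state, and the norm-group scores are invented for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence


def deviation_iq(
    raw_score: float,
    age_group_scores: Sequence[float],
    iq_mean: float = 100.0,  # assumed scale mean
    iq_sd: float = 15.0,     # assumed scale standard deviation
) -> float:
    """Rescale a raw score against the norms of the test taker's age group."""
    group_mean = mean(age_group_scores)
    group_sd = pstdev(age_group_scores)
    if group_sd == 0:
        raise ValueError("age group scores must vary")
    z = (raw_score - group_mean) / group_sd  # distance from the group mean in SD units
    return iq_mean + iq_sd * z


# Hypothetical raw scores for one age group (not real norms).
norms = [40, 45, 50, 55, 60]
print(deviation_iq(50, norms))            # exactly at the group mean
print(round(deviation_iq(60, norms), 1))  # above the group mean
```

Unlike the ratio IQ, this score does not depend on dividing by chronological age, so it remains meaningful for adults.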
Originally, all IQ tests were individual tests, given in a one-to-one setting. Group tests came into use as the military and other large organizations began testing on a large scale. Individual tests tend to be more accurate because an examiner is more likely to note the mood of a test taker in a one-to-one setting than in a group setting. Individual tests are more likely to be administered to those thought to be gifted or suspected of having an intellectual impairment or learning disability. Group tests are more common in educational and military settings.
There is significant dispute regarding the nature of intelligence and whether it can be measured quantitatively. Additionally, since the 1930s, there have been several virulent disputes regarding the role of genetics and environment in determining IQ, often termed the nature-nurture debate. Most psychologists concede that because environments are never uniform and the expression of genes is elastic, the argument for one or the other element as the sole determinant of intelligence is flawed. Thus, intelligence, whatever it may be, is a function of both nature and nurture, environment and genetic makeup.
Twin studies estimating environmental effects put genetic factors pertaining to “intelligence” at somewhere below 50 percent. However, wide variation exists according to the particular characteristic of intelligence under study. Indeed, later views of intelligence hold that many different abilities make up intelligence. The question for psychometrics, the field concerned with measuring intelligence, is how to measure both specific and general intelligence. Researchers note that there are many skills involved in both academic and professional success. For example, spatial intelligence is related to success in mathematics, science, engineering, architecture, and related fields, while it is not as important in literature or music.
Psychometrics
Several approaches to the study of intelligence exist: psychological measurement, often called psychometrics; cognitive psychology; the merger of cognitive psychology with contextualism; and biological science, which considers the neural bases of intelligence. Psychometric theories have been most concerned with the quantification of intelligence and its parts. Psychometricians generally seek to understand the structure of intelligence, that is, the forms it may take and the relationship between any parts it may have. These theories are tested through paper-and-pencil tests that include analogies, classifications, and series completions.
The psychological model on which these tests are based states that intelligence is made up of abilities that mental tests measure. Each test score is based on a weighted composite of scores taken from the underlying abilities. The mathematical model is additive and assumes that less of one type of ability can be compensated for by more of another ability.
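The additive, compensatory character of this model can be illustrated with a short sketch. The ability names, weights, and scores below are hypothetical and chosen only to show how a weighted sum lets a strength in one ability offset a weakness in another.

```python
from typing import Mapping


def composite_score(abilities: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of underlying ability scores (an additive model)."""
    return sum(weights[name] * abilities[name] for name in weights)


# Hypothetical weights for a test that draws equally on two abilities.
weights = {"verbal": 0.5, "spatial": 0.5}

# Two hypothetical test takers with opposite strengths.
person_a = {"verbal": 60.0, "spatial": 40.0}
person_b = {"verbal": 40.0, "spatial": 60.0}

print(composite_score(person_a, weights))  # 50.0
print(composite_score(person_b, weights))  # 50.0
```

Both test takers receive the same composite even though their ability profiles differ, which is exactly the compensation the additive model assumes.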
Charles Spearman, who put forth the first psychometric theory, published his first major article on intelligence in 1904. Spearman noted that people who do well on one mental ability test generally do well on others and, conversely, those who do poorly on one test tend to do poorly on others. Spearman’s factor analysis enabled him to posit that there are two major factors underlying intelligence. The first and more important factor is the “general factor,” or g. The second factor is that which is specifically related to each particular test. Spearman was not sure what g was, but he did posit that it was “mental energy.”
L. L. Thurstone disagreed with both Spearman’s theory and his isolation of a single factor of general intelligence. Thurstone argued that Spearman’s misapplication of his factor method led him to find just one factor, the g factor. He argued that there are seven primary mental abilities underlying intelligence: verbal comprehension, verbal fluency, number, spatial visualization, inductive reasoning, memory, and perceptual speed.
Psychologists such as Philip E. Vernon and Raymond B. Cattell argued that, in some senses, both Thurstone and Spearman were correct. Their reasoning is that abilities are arranged in a hierarchy. General ability, or g, is at the summit. The other abilities relate to ever more specific tasks as a person descends the hierarchy. Cattell went on to suggest that there are two major categories of abilities: fluid and crystallized. Fluid abilities, such as reasoning and problem-solving, are measured by tests such as analogies, classifications, and series completions. Crystallized abilities, derived from fluid abilities, include vocabulary, general information, and knowledge about specific fields. Most psychologists agreed that a broader subdivision of abilities was needed than was provided by Spearman, but not all of them agreed that the subdivision should be hierarchical. The structure-of-intellect theory devised by J. P. Guilford, for example, postulated 120 abilities. He later increased the number to 150.
In general, it was becoming obvious to many that there were problems with psychometric theory. The number of factors had gone from 1 to more than 150, and no satisfactory account was given of how any of these factors explained overall intelligence.
Twenty-first-century developments in ability testing and psychometrics have improved these tests' reliability, validity, and potential applications. Modern tests, particularly those involving numerical, verbal, and logical reasoning, are better predictors of job performance when combined with other job-specific selection criteria. Some research indicates that testing job-specific knowledge is a better predictor of job success than cognitive ability testing. Though testing has improved to cause less adverse impact, cultural and socioeconomic disparities in test scores remain an issue. Some ability tests have begun incorporating technology, and gamification of ability testing may offer a new approach that more accurately measures abilities in specific populations.
Twin Studies
Twin studies use two methods to measure the effect of nature and nurture on overall intelligence. The first method examines identical twins reared apart, and the second looks at the differences between identical twins reared together and fraternal twins reared together. Identical, or monozygotic, twins are not totally identical because they have had different experiences and are unique social and cultural products. Fraternal twins are formed from two different fertilized eggs, just as ordinary siblings are. Unrelated children reared together are also studied.
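One widely used way to turn the second comparison into numbers is Falconer's formula, which the text above does not name and which is offered here only as a simplified sketch. It assumes that identical and fraternal twins reared together share their environments to the same degree. The correlations in the example are hypothetical, not results from any study.

```python
def falconer_estimates(r_identical: float, r_fraternal: float) -> dict:
    """Rough variance shares from twin correlations (Falconer's formula)."""
    heritability = 2 * (r_identical - r_fraternal)   # genetic share
    shared_environment = 2 * r_fraternal - r_identical
    unique_environment = 1 - r_identical             # includes measurement error
    return {
        "heritability": heritability,
        "shared_environment": shared_environment,
        "unique_environment": unique_environment,
    }


# Hypothetical IQ correlations for twins reared together (invented values).
estimates = falconer_estimates(r_identical=0.80, r_fraternal=0.55)
for name, share in estimates.items():
    print(f"{name}: {share:.2f}")
```

The three shares always sum to 1 under this model, so any estimate of the genetic contribution is only as good as the equal-environments assumption behind it.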
Although most studies of identical twins show a 50 to 80 percent genetic contribution to intelligence, a closer examination reveals identical pairs with up to a twenty-point difference in IQ scores. This occurs when the environment is drastically different. The closeness of most identical twins' scores reflects both nature and nurture, since the twins are usually raised in similar settings.
It is reasonably clear that many of the skills measured by IQ tests can be taught just as any other skill can be taught. If these skills can be taught, then at least part of what is measured by ability tests, including IQ tests, is learned and not inherent.
Specific Ability Tests
Among the more common ability tests are the School and College Ability Test (SCAT) and the Sequential Test of Educational Progress (STEP). The SCAT measures specific abilities in verbal and quantitative areas. It is used to make general, overall decisions about the level and pace of instruction. The SCAT focuses on aptitude, not specific educational goals. The STEP battery measures actual achievement in reading, written language, and mathematics. STEP measures actual mastery and is, therefore, useful in indicating skills a student is ready to master.
Both SCAT and STEP testing can be used for in-grade-level or above-grade-level testing. In-grade-level testing provides information compared with others in the same grade, while above-grade-level testing indicates probable success or failure compared with those in higher grades.
SCAT assesses both verbal and mathematical reasoning abilities using verbal analogies and quantitative comparison items. STEP mathematics computation measures a broad variety of computational skills, ranging from operations with whole numbers, fractions, and percentages to the evaluation of formulas and manipulations with exponents. STEP mathematics basic concepts measures knowledge of various concepts, including those involving numbers and operations; measurement and geometry; relations, functions, and graphs; and proofs. It also includes knowledge of probability and statistics, mathematical sentences, sets and mathematical systems, and applications. STEP reading measures the capacity to read and appreciate a variety of written materials. STEP English expression measures the ability to assess the accuracy and efficiency of sentences by requiring the student to detect mistakes in grammar and usage or to choose among rewordings of sentences.
The SAT Reasoning Test is a widely used aptitude test that attempts to measure both intelligence and ability to undertake college studies. There are verbal and mathematical components to the test. The mean score on each test is 500, and each has a standard deviation of 100. The test was standardized on a group of ten thousand students in 1941. However, when scores dropped in the 1990s, with a verbal mean of 422 and a mathematical mean of 474, there was a readjustment of means. Educators attributed these lower student scores to television and to deterioration in home and school situations.
Controversies
IQ and other ability tests have been widely criticized, especially since the 1960s. These controversies have centered on the Eurocentric nature of the tests; namely, they have been designed primarily for use with white, middle-class children. The tests, therefore, have drawn fire from critics for being culture-bound. Some have seen them as unfair to African Americans, Latinos, and members of other minority groups. However, attempts to create culturally neutral tests have failed, and the tests have withstood court challenges. In Parents in Action on Special Education (PASE) v. Hannon (1980), a US District Court case involving Chicago schools, the court ruled that the tests were not culturally biased and could be used to place children in special education courses.
These concerns over cultural bias, however, have raised another related issue. That issue goes to the heart of IQ testing and concerns exactly what the tests measure. Critics argue that the tests do not measure mental abilities. The tests, they say, do not show how children arrive at their answers, only whether they are right or wrong. Knowing how a child arrives at an answer would better allow evaluators to gauge intelligence, for those who arrive at a right answer by guessing are not necessarily more intelligent than those who get the wrong answer but whose reasoning is sound. Additionally, people from different cultural backgrounds have different but equally valid ways of approaching problems. Westernized tests do not take these skills into account.
Moreover, there is still a debate concerning the relative impact of nature and nurture on intelligence. Those who hold to the predominant role of heredity have used comparative test results to argue for the dominant role of genetic differences among the various ethnic groups. In the early 1970s, the published research of Nobel Prize–winning physicist William Shockley of Stanford University and educational psychologist Arthur R. Jensen of the University of California concluded that heredity accounts for most differences in intelligence among different racial groups. This conclusion caused a great controversy, matched by the publication of The Bell Curve (1994) by Richard Herrnstein and Charles Murray, which came to much the same conclusions: intelligence is primarily inherited, and there are different levels of intelligence among races.
Another controversy regards the tendency of most tests to take a holistic approach to intelligence. The Stanford-Binet test, for example, sees intelligence as a unified trait. In the minds of many critics, IQ tests are designed to measure a particular type of ability defined by the predominant class. Tests are culturally biased, so scores do not reflect an objective universal pattern of intelligence. Intelligence, they argue, is a social construct. Guilford devised a 180-factor model of intelligence, which classified each intellectual task according to three dimensions: content, mental operation, and product. This theory is the predecessor to Howard Gardner’s theory of multiple intelligences, developed in 1985.
Because of the influence of those social scientists who have argued for the influence of cultural differences, the tests are not the only basis for evaluating intellectual performance. There is a much greater awareness on the part of most psychologists of motivational and cultural factors in the role of .
Response to Criticism
Intelligence tests seek to measure intellectual potential by using novel items, forcing test takers to think on the spot. The point is to avoid tapping factual knowledge. It is understood by psychologists that people come from different backgrounds, so it is difficult, if not impossible, to find items that are totally novel. Therefore, test makers require test takers to use relatively common knowledge. It is impossible to control for all of a test taker’s prior knowledge. Therefore, intelligence scores represent a blend of potential and knowledge.
IQ tests have reliability correlations in the range of 0.90 and above, which is higher than most other psychological tests. This does not rule out misleading scores caused by variations in motivation or anxiety. IQ tests are also valid when used to predict success in academic work. They are, therefore, good predictors of school success but poor predictors of other types of success. People have acquired the belief that these tests measure a general sense of mental ability when they actually focus on abstract reasoning and verbal fluency, the type of skills needed for academic success. They do not measure practical or social intelligence. IQ scores do not stabilize until adulthood, and even then, they can change. There is a high correlation between high IQ scores and being in a prestigious occupation. Specific success in any given occupation, however, cannot be predicted in a meaningful way.
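A reliability figure of this kind is a correlation coefficient. The text does not say which type of reliability is meant; the sketch below assumes test-retest reliability, computed as the Pearson correlation between scores from two sittings of the same test. The scores are invented for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary in both lists")
    return covariance / sqrt(var_x * var_y)


# Hypothetical IQ scores for five people tested twice.
first_sitting = [95, 100, 110, 120, 130]
second_sitting = [97, 99, 112, 118, 131]

reliability = pearson_r(first_sitting, second_sitting)
print(f"test-retest reliability: {reliability:.2f}")
```

A coefficient near 1 means people keep roughly the same rank order across sittings; it says nothing about whether the test measures what it claims to measure, which is the separate question of validity.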
IQ tests are stable, reliable, and valid and predict academic success and occupational status. They are one good measure of giftedness and can be used with measures of creativity to aid recognition of this type of intelligence. They can also identify which children should be placed in remedial classes.
Conclusion
It is essential to note that no psychological test should be used in isolation, whether that test is diagnostic of psychological and behavioral problems or of ability. Each test result needs to be compared with and used in conjunction with results from other tests. Trained psychologists need to evaluate the test results in context, whether these are diagnostic tests, intelligence tests, tests for evaluating emotional , or tests.
Much progress has been made since the era of the dominance of psychometric theories. Then, the study of intelligence was dominated by investigations of individual differences in people’s test scores. Lee Cronbach, a major figure in testing, bewailed the segregation of those who study individual differences and those who seek regularities in human behavior. He made his plea for a union of these studies in an address to the American Psychological Association in 1957. His call helped lead to the development of cognitive theories of intelligence.
The use of cognitive theories has aided in interpreting the results of ability tests, for they give an understanding of the processes underlying intelligence. These processes allow an evaluator to understand why someone may do poorly on various tests. It may not simply be a matter of poor reasoning, for example, that leads to poor performance on an analogies test. It may be that the student does not understand the words in the analogies. The different interpretations may lead to different recommendations. Someone who is good at reasoning but does not understand basic vocabulary requires an intervention that is different from that needed for someone who is a poor reasoner.
For cognitive psychologists, intelligence is a combination of a set of mental representations and processes that can operate on them. Thus, ability tests based on these principles have sought to measure the speed of various types of thinking. There is, moreover, an assumption that processes are executed serially. There are many cognitive theories of intelligence, but all assume a mental process working on a mental representation.
Many scientists have proposed cognitive theories of intelligence, including Earl B. Hunt, Nancy Frost, and Clifford E. Lunneborg. In 1973, they demonstrated that psychometrics and cognitive modeling could be combined. They started with tests that experimental psychologists used to study perception, learning, and memory. Individual differences in these tests were related to patterns of individual differences in IQ scores. They concluded that basic cognitive processes could be the basic components of intelligence.
Other developments led psychologists to investigate the cognitive components of the skills tested on psychometric tests. When these basic components were isolated, they could be evaluated and tested in isolation to compute their relationship with intelligence. This was done for information processing and computer modeling. Computer modeling, such as that of Allen Newell and Herbert Simon, uses a means-ends analysis to determine how close a problem is to a solution. Newell and Simon proposed a general theory of problem-solving.
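The idea behind means-ends analysis can be sketched as a loop that measures the difference between the current state and the goal, then applies whichever operator reduces that difference most. The version below is a deliberately simplified, greedy illustration on whole numbers; it is not Newell and Simon's program, it omits the subgoaling their theory includes, and the operator names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Operator = Callable[[int], int]


def means_ends(
    start: int,
    goal: int,
    operators: Dict[str, Operator],
    max_steps: int = 100,
) -> Tuple[int, List[str]]:
    """Repeatedly apply the operator that leaves the smallest distance to the goal."""
    state = start
    plan: List[str] = []
    for _ in range(max_steps):
        difference = abs(goal - state)
        if difference == 0:
            break  # goal reached
        # Try every operator and keep the result closest to the goal.
        name, candidate = min(
            ((n, op(state)) for n, op in operators.items()),
            key=lambda pair: abs(goal - pair[1]),
        )
        if abs(goal - candidate) >= difference:
            break  # no operator reduces the difference, so stop
        state = candidate
        plan.append(name)
    return state, plan


# Hypothetical operators for a toy problem: get from 0 to 23.
operators = {
    "add 10": lambda s: s + 10,
    "add 1": lambda s: s + 1,
    "subtract 1": lambda s: s - 1,
}

final_state, plan = means_ends(0, 23, operators)
print(final_state, plan)
```

In this toy problem the loop takes the large steps first and switches to small steps once a large step would overshoot, which mirrors the way the distance to the solution guides the choice of the next move.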
Some psychologists hold that information processing is parallel rather than serial, meaning that the brain processes information simultaneously, not in a serial fashion. However, it proved difficult to construct ability tests to test this hypothesis. Moreover, the fact that intelligence differs between cultures, as Michael E. Cole has argued, has historically been ignored in psychometric testing. Additionally, some psychometric tests are poor indicators of job performance.
Bibliography
Binet, Alfred, and Théodore Simon. The Development of Intelligence in Children. 1916. Reprint. Ayer, 1983.
Fish, Jefferson M., ed. Race and Intelligence: Separating Science from Myth. Erlbaum, 2002.
Green, Anthony. Exploring Language Assessment and Testing: Language in Action. Routledge, 2014.
Gregory, Robert J. Psychological Testing: History, Principles, and Applications. Pearson, 2014.
Herrnstein, Richard, and Charles Murray. The Bell Curve. Simon, 1996.
Lynn, Richard. The Global Bell Curve: Race, IQ, and Inequality Worldwide. Washington Summit, 2008.
Minton, Henry L. Lewis M. Terman: Pioneer in Psychological Testing. NYUP, 1988.
Murdoch, Stephen. IQ: A Smart History of a Failed Idea. Wiley, 2007.
Naglieri, Jack A., and Sam Goldstein, eds. Practitioner’s Guide to Assessing Intelligence and Achievement. Wiley, 2009.
Plomin, Robert, et al. Behavioral Genetics in the Postgenomic Era. APA, 2003.
Shaffer, David R., and Katherine Kipp. Developmental Psychology: Childhood and Adolescence. 9th ed., Cengage, 2020.
"The Standards for Educational and Psychological Testing." American Psychological Association, 2024, www.apa.org/science/programs/testing/standards. Accessed 1 Oct. 2024.
Urbina, Susana. Essentials of Psychological Testing. 2nd ed., Wiley, 2014.
Woods, Stephen A., and Fiona Patterson. “A Critical Review of the Use of Cognitive Ability Testing for Selection into Graduate and Higher Professional Occupations.” Journal of Occupational and Organizational Psychology, vol. 97, no. 1, 2024, pp. 253–72, doi.org/10.1111/joop.12470. Accessed 1 Oct. 2024.