Testing, tests, and measurement
Testing, tests, and measurement in psychology encompass a range of methods used to assess psychological constructs such as intelligence, mood, personality, and motivation. Psychological constructs are abstract concepts that influence human behavior and cannot be directly observed. Historically, various methods have been employed to measure these constructs, with self-report measures being the most prevalent today. Self-report tests have advanced significantly, allowing for the detection of potential discrepancies in individuals' responses. Another measurement approach involves behavioral observation, where a subject's actions are monitored and recorded to infer psychological characteristics, such as impulsivity.
For a psychological test to be effective, it must demonstrate reliability, meaning that it measures a construct consistently. Validity is equally crucial, reflecting whether a test accurately captures the intended construct and produces meaningful results. Establishing test validity is a continuous process that includes demonstrating content validity, convergent and discriminant validity, and construct validity. Many psychological tests also incorporate normative data, which helps contextualize an individual's score in relation to a broader population. Different types of psychological assessments exist, including educational tests, neuropsychological tests, and personality tests, each serving unique purposes in understanding human behavior and mental processes.
Type of Psychology: Consulting, Counseling, Clinical, Developmental, Forensic, Organizational, Psychometrics, Psychopathology, Social
It takes years and sometimes decades of work to scientifically develop a test that measures a psychological construct such as intelligence or personality. To be useful, a test must have evidence that it is reliable and valid, and it also should have normative data. The most common types of tests are educational tests, neuropsychological tests, and tests that measure personality.
Introduction
All psychological tests measure psychological constructs. A construct is any psychological concept, such as intelligence, mood, personality, or motivation. We cannot see a psychological construct or describe it in physical terms, but we can agree that psychological constructs are real and are a part of human behavior and human experience. Because psychological constructs are important, history documents various attempts to measure them, some more scientifically valid than others.
Hippocrates put forth a theory of personality, classifying people as one of four personality types: sanguine (fun), choleric (ambitious), melancholic (thoughtful), or phlegmatic (relaxed). In theory, the relative amounts of several bodily fluids a person had held the key to measuring, and thus determining, personality type. Brilliant as Hippocrates was, his theory of personality did not fare well, and although his general notion that biochemistry and personality are connected is to some degree true, success in tying biochemical measures to aspects of personality, psychopathology, and mood has been elusive.
Therefore, without physical, biological measures of psychological constructs, we are left to measure these constructs by simply asking people about themselves. This form of measurement is called self-report and is the most common form of psychological measurement. Most psychological tests are self-report measures. Over the past fifty years, self-report measures have grown more sophisticated. Many self-report measures can determine whether the test taker was fibbing about himself or herself, and so can succeed in detecting people's attempts to “deceive the test.” This has led to more accurate self-report measures.
Another way to measure psychological constructs is by behavioral observation. For example, observing a child in a classroom and recording every time that the child gets out of his or her seat or interrupts the teacher may be a measure of the psychological construct of impulsivity or attentional difficulty. With behavioral observation measures, a person does not tell the examiner about himself or herself as in self-report, but simply shows how he or she behaves, and the examiner observes and records this behavior.
Most modern psychological tests, whether they are self-report measures or behavioral observation measures, must have evidence that they are scientifically valid before they are used by professional psychologists.
What is a scientifically valid psychological test? It takes years and sometimes decades of work to scientifically develop a test that measures a psychological construct. Psychologists who develop tests must establish that a test is reliable, valid, and has normative data that demonstrate that it is a useful test for the construct it intends to measure.
Test reliability
A reliable test is one that measures a construct in a way that can be replicated. One way of establishing that a test is reliable is to show that the test yields the same result every time it is given to someone. This type of reliability is called test-retest reliability. For example, if someone takes a test that measures a construct such as visual-spatial ability, he or she should get essentially the same score on that test if it is taken again a few months later. If the construct of interest is fairly stable (like visual-spatial ability), the test should yield essentially the same result every time it is administered.
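As a rough illustration of test-retest reliability, the sketch below correlates scores from two administrations of the same test. The use of the Pearson correlation coefficient as the reliability index, the function name pearson_r, and all of the scores are assumptions made for this example; they are not taken from any published test.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical visual-spatial scores for the same six people,
# tested once and then again a few months later.
first_administration = [42, 55, 38, 61, 47, 50]
second_administration = [44, 53, 40, 60, 45, 52]

test_retest_r = pearson_r(first_administration, second_administration)
print(f"test-retest correlation: {test_retest_r:.2f}")
```

A coefficient close to 1 means that people kept roughly the same rank order across the two sessions, which is what one would hope to see for a stable construct.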
But what about a construct that changes over time, like mood? Test-retest reliability is not as useful for a varying construct like mood. Tests of constructs that vary over time can nevertheless be demonstrated as reliable if they show internal consistency. A test can be shown to be internally consistent if all the items on that test (such as test questions) correlate with one another and are shown to be measuring the same construct. For example, if someone takes a test to determine how anxious he or she feels in different situations, every item (or question) on that test should be related to every other item so that they are all measuring the construct of “situational anxiety” together, in a way that is consistent. If someone were to take a test assessing situational anxiety while relaxing at home, the test items should all work together in a consistent way to show a low degree of situational anxiety. But if someone takes a test of situational anxiety while waiting for an important job interview, all the test items should work together to show a relatively higher degree of situational anxiety. Whether a construct is stable over time or not, the test to measure that construct should show internal consistency.
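Internal consistency is often summarized with a single coefficient. The sketch below uses Cronbach's alpha, one widely used index that the text above does not name, so its choice here is an assumption. The function name cronbach_alpha and the response data (five people answering four situational-anxiety items on a 1 to 5 scale) are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per person)."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Transpose rows into per-item columns and take each item's sample variance.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each person's total score across all items.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: each row is one person, each column one item.
anxiety_responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 5, 4, 4],
    [5, 4, 5, 5],
]

alpha = cronbach_alpha(anxiety_responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

When the items rise and fall together across people, as they do in this made-up data, the total-score variance is large relative to the summed item variances and alpha approaches 1.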
For a test to be useful, it must be reliable. But it is possible to have a perfectly reliable test that is totally useless. How useful a test is as a measure of a construct is called test validity.
Test validity
The usefulness and importance of a test, or test validity, is something that is demonstrated over time, from repeated use of the test, and from repeated demonstrations that the test really does, in fact, measure the construct that it's supposed to measure in a way that yields important information. In this sense, although some degree of test validity needs to be demonstrated before a test is published and used by professional psychologists, establishment of a test's validity is a process that goes on indefinitely. For example, consider a test that purports to measure the construct of intelligence. Intelligence tests, and the construct of intelligence itself, continue to be developed, studied, and understood in an evolving way. Intelligence tests, and the construct of intelligence, have evolved together, and theories of intelligence and tests to measure intelligence have contributed to one another's development.
Establishment of a test's validity tends to proceed in stages: first, by showing that the test has content validity; second, by showing that it has some convergent, discriminant, and predictive validity; and finally, by showing that it has evidence of construct validity.
Content Validity. The first indication that a test is valid is that its content represents the construct that the test is supposed to measure. The items on a test that make up its content should be drawn from an accepted theory about the construct of interest. The test should cover a representative sample of all aspects of the construct, and should decidedly not represent ideas that are unrelated to the construct.
Convergent, Discriminant, and Predictive Validity. Once it can be established that a test has internal content that reflects a construct, then it's time to take the test into the real world and see how it behaves. A test should converge, or correlate statistically, with other measures of that construct (or a similar construct), and should diverge from measures of constructs that are unlike it, thereby discriminating between the construct of interest and unrelated ones. For example, an intelligence test should correlate with other intelligence tests, and with criteria that are related to intelligence, like academic achievement. An intelligence test should not correlate strongly with unrelated constructs, such as mood or personality. In addition to convergent and discriminant validity, a test also should predict future behavior. Therefore, an intelligence test should be able to predict whether or not someone will behave intelligently in the future, for example, predict someone's level of academic achievement.
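The pattern of correlations described above can be sketched with a small calculation. In the example below, a new intelligence test is correlated with an established intelligence test (convergent evidence) and with a mood scale (discriminant evidence). All scores, variable names, and the helper correlation function are hypothetical and exist only to illustrate the idea.

```python
from math import sqrt
from statistics import mean


def correlation(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cross / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical scores for the same eight people on three measures.
new_intelligence_test = [95, 110, 102, 88, 120, 105, 99, 115]
established_intelligence_test = [97, 108, 100, 90, 118, 107, 101, 112]
mood_scale = [12, 15, 9, 14, 11, 8, 16, 13]

convergent_r = correlation(new_intelligence_test, established_intelligence_test)
discriminant_r = correlation(new_intelligence_test, mood_scale)

print(f"convergent (new vs. established test): {convergent_r:.2f}")
print(f"discriminant (new test vs. mood):      {discriminant_r:.2f}")
```

Evidence for validity would be a high convergent correlation alongside a discriminant correlation near zero; how high or how low counts as acceptable is a judgment call that depends on the constructs involved.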
Construct Validity. Construct validity is like the holy grail of establishing test validity. Discussions of construct validity tend to focus on how the test elucidates psychological theories. A test also can demonstrate construct validity if it can be used to support or refute hypotheses about the construct of interest. In the process of establishing that a test has construct validity, novel ways of gathering validity data can be established that can further the theoretical development of the construct and, ideally, the development of the testing enterprise itself.
Normative data
Most of the time, when a psychologist actually uses a test, he or she has some very practical concerns and reasons for using it. Usually, the psychologist wants to know, “Is this person displaying normal levels of the construct I'm measuring?” Assuming that a psychologist is using a test that has established an acceptable level of reliability and validity, it is then very useful that the publisher of the test also supply some normative data. To collect normative data, the test should be administered to a large number of individuals who represent the general population. Scores on most psychological tests are distributed along a normal (bell-shaped) curve, such that relatively few scores occur at the highest and lowest extremes, and most scores rest somewhere in the middle of the score distribution. With normative data, any given individual's score can be compared to the normal distribution of scores in the general population.
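Comparing one person's score with normative data can be sketched as a standard-score calculation. The example below assumes that scores in the normative sample are normally distributed and uses an illustrative norm mean of 100 and standard deviation of 15; the function name standing_in_norms and the scores are hypothetical.

```python
from statistics import NormalDist


def standing_in_norms(raw_score, norm_mean, norm_sd):
    """Return (z_score, percentile) for a raw score against normative data.

    Assumes the normative scores follow a normal distribution.
    """
    z_score = (raw_score - norm_mean) / norm_sd
    # Percentage of the normative population expected to score at or below this score.
    percentile = NormalDist().cdf(z_score) * 100
    return z_score, percentile


# Illustrative norms (assumed values): mean 100, standard deviation 15.
z, pct = standing_in_norms(raw_score=115, norm_mean=100, norm_sd=15)
print(f"z = {z:.2f}, percentile = {pct:.1f}")
```

A z-score near 0 places the person in the middle of the distribution, while scores far above or below 0 fall in the sparsely populated tails of the bell curve.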
Sometimes, normative data can be collected from special populations that are subsets of the general population. For example, commonly used intelligence tests have different sets of normative data for people of different ages. This way, younger people can be compared to one another, and older people (who generally have a different type of cognitive profile) can be compared to one another.
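The effect of age-specific norms can be sketched by looking up a different normative mean and standard deviation for each age band. The age bands, norm values, and function name below are invented for illustration and do not come from any real test manual; scores within each band are assumed to be normally distributed.

```python
from statistics import NormalDist

# Hypothetical norm table: (min_age, max_age) -> (mean, standard deviation).
AGE_NORMS = {
    (16, 39): (50.0, 10.0),
    (40, 64): (46.0, 10.0),
    (65, 89): (40.0, 10.0),
}


def percentile_for_age(raw_score, age, norms=AGE_NORMS):
    """Percentile of a raw score relative to the norm group covering this age."""
    for (min_age, max_age), (norm_mean, norm_sd) in norms.items():
        if min_age <= age <= max_age:
            return NormalDist(norm_mean, norm_sd).cdf(raw_score) * 100
    raise ValueError(f"no norm group covers age {age}")


# The same raw score is interpreted differently for a 25-year-old and a 70-year-old.
younger_percentile = percentile_for_age(50, age=25)
older_percentile = percentile_for_age(50, age=70)
print(f"age 25: {younger_percentile:.1f}, age 70: {older_percentile:.1f}")
```

Under these made-up norms, a raw score that is exactly average for the younger group sits well above average for the older group, which is why the choice of comparison group matters.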
Types of psychological tests
The most common type of test that people take, by far, is an educational test, most often a test to determine how much learning has been achieved. Teachers routinely make up and give tests to see if their students really do understand what they've been taught, such as a high school math teacher making up and administering a test to determine if students understood the principles of algebra that had been taught. Most of these tests have not gone through any formal establishment of reliability or validity. But this doesn't matter because the teacher knows what he or she has taught, and need only address the issue of content validity: that is, whether or not the content of the test represents what was taught. On the other hand, some educational tests, such as the SAT (originally the Scholastic Aptitude Test), which is used in college admissions, have demonstrated a high level of reliability and validity, and have extensive normative data.
Neuropsychological tests measure brain-behavior relationships, such as the ability to use different aspects of memory. Many intelligence tests are used as neuropsychological tests because they measure different aspects of cognitive functioning. Imaging techniques, such as a computed tomography scan (CT scan), provide excellent information on brain structure, but do little to elucidate how the brain actually functions. Neuropsychological tests are used to understand brain function. For example, a CT scan can show that someone has suffered a stroke that has affected the structure of the brain, but neuropsychological tests can determine the extent to which the stroke has affected memory, verbal ability, and other aspects of how the brain functions. Neuropsychological tests also can show how functional ability can recover over time after a stroke.
Personality tests measure a person's beliefs, attitudes, ways of viewing the world, and general ways of behaving. Simply put, most theories of personality, and personality tests, want to describe “who a person really is.” Even Hippocrates was trying to understand and classify general types of people and how they viewed the world and generally behaved. Personality tests measure normal aspects of personality, such as introversion or extroversion. Personality tests also measure aspects of personality that can be problematic, such as depression, anxiety, and types of mental illness. Personality tests can be used to diagnose problems that people have, and can give clues about what kinds of treatments might offer help for those problems.