Psychometrics

Psychometrics is a field of study devoted to tests and measurements. Common forms of psychometric tests include personality tests, IQ tests, and standardized tests like college entrance exams. Psychologists are trained in psychometrics and typically conduct many of the tests affiliated with the field. However, employers, educators, and school administrators use the data collected by the tests to create prevention and intervention strategies in the workplace and in schools. The tests can be used to identify social problems within a school, such as bullying, and develop profiles of students with learning or developmental disabilities. With a long history in the field of psychology, psychometrics is not without controversy. IQ tests, for instance, have come under increasing scrutiny as psychologists realize a more complex definition of intelligence.

Keywords Ability Tests; American College Test (ACT); Aptitude Test; Graduate Record Examination (GRE); Intelligence Quotient (IQ) Test; Numerical Reasoning; Personality Tests; Psychometric Tests; Psychometrics; Scholastic Aptitude Test (SAT); Spatial Reasoning; Standardized Testing; Stanford-Binet IQ Test; Verbal Reasoning

Overview

Psychometric tests are tools used by employers, psychologists, and educators to assess test-takers' various capacities. Psychometric tests generally fall into two categories: ability tests and personality tests. Ability tests are designed to measure a person's ability to complete a task. In the case of education, ability tests are created to assess a student's capability for (or probability of) academic success. Ability tests are generally multiple-choice and have a right or wrong answer for each question based on a "normative answer," the average of past responses given by test takers. Personality tests, however, have no right or wrong answers; they are designed to predict how a person may behave in a certain situation. These tests generally come in two forms: a questionnaire and an interview. From an educational perspective, personality tests can reveal a great deal about a student's ability to follow rules or potential for generating conflict.

Ability tests measure a variety of skills including

• Verbal reasoning, or how a student interprets written passages and uses vocabulary,

• Numerical reasoning, or how a student interprets charts and graphs, and

• Spatial reasoning, or how a student interprets abstract conditions.

Most people know the most common tests by name. The well-known ability tests include variations of intelligence quotient (IQ) tests, the Scholastic Aptitude Test (SAT), the American College Test (ACT), the Graduate Record Examination (GRE), and the statewide standardized tests. Ability tests are sometimes subdivided into achievement tests and aptitude tests. Achievement tests are intended to assess a person's present capability, and aptitude tests are intended to assess a person's potential for acquiring new capabilities. Most ability tests provide just a single score, indicating how the person has performed on the task in question in comparison to the established normative response, which is created from the scores of previous test-takers.

Ability tests are administered under exam conditions (generally with a proctor in a private room), and are strictly timed. A typical test might allot test-takers thirty minutes to answer thirty or more questions, but some tests take several hours, as they can be composed of numerous sections. There are frequently more questions than can possibly be completed within the timeframe, and, in most cases, it does not matter if the test is completed; what matters is the number of correct answers provided. The test taker's score is then compared with other, past test takers' scores, enabling teachers to assess a student's skills and to make predictions about his or her potential for developing other capacities.
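The comparison of one test taker's score with past test takers' scores can be illustrated with a short calculation. The following is only a minimal sketch in Python, not the scoring method of any particular test: the norm-group scores, the student's score, and the function names are hypothetical, and the percentile rank and z-score shown here are two common ways of expressing a norm-referenced comparison.

```python
from statistics import mean, pstdev


def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores below `score`, counting ties as half."""
    below = sum(1 for s in norm_scores if s < score)
    ties = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)


def z_score(score, norm_scores):
    """Distance of `score` from the norm-group mean, in standard deviations."""
    return (score - mean(norm_scores)) / pstdev(norm_scores)


# Hypothetical numbers of correct answers from ten past test takers.
norm_group = [12, 15, 18, 18, 20, 21, 22, 24, 25, 27]

# Hypothetical student who answered 22 questions correctly.
student_score = 22

rank = percentile_rank(student_score, norm_group)
z = z_score(student_score, norm_group)
print(f"Percentile rank: {rank:.1f}")
print(f"z-score: {z:.2f}")
```

In this sketch the student's raw score of 22 has no meaning on its own; it becomes interpretable only once it is placed against the distribution of earlier scores.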

Personality tests, on the other hand, are less objective. Comparison from one student to another does not necessarily give a tester (usually a psychologist) a concrete interpretation of a student's personality type or traits. For example, a psychologist could glean from an interview with a student that the student is introverted and may respond in an inhibited way in a specific situation. When comparing that response to peers within the same age group, a determination beyond the obvious (that the child is introverted) is difficult to establish without a full-fledged investigation into other aspects of the student's life. Whether or not the child has siblings could be a factor in his inhibition; his choices of activity outside of school could also be a factor. For example, a student could be introverted because he has boisterous older siblings, or he could have a reserved character because he's happy playing video games by himself. Again, there is no right or wrong answer when assessing a person's personality.

In fact, personality tests assess students' typical way of behaving, thinking, feeling, or perceiving in particular situations. The test results describe how emotions may affect test-takers, what activities motivate them, and how they may react to or function under certain types of stress. Personality tests tend not to have time limits. Students are interviewed and allowed to respond in however much time they need. If a hands-on activity is presented to the student, there is typically no expectation of time for completion. However, the amount of time a student takes to complete some tasks and how he behaves while completing said tasks can offer valuable information about a predictable behavior. Regardless, since factors above and beyond those that can be noted by pencil and paper establish a student's personality, only professionals practiced in this type of assessment (psychologists mostly) conduct and evaluate them, while most tests of ability are scored by a machine.

Applications

Bullying

While many of the most widely known psychometric tests are used to gauge a student's K-12 classroom ability or aptitude for college material, many others are used more specifically to remedy a problem or present a different approach to a standing educational concern. The School-Wide Positive Behavior Support (SWPBS) assessment was designed for just these purposes.

In order to receive teacher certification in many states, teacher candidates must complete a bullying intervention workshop. Some workshops are offered online, while others are offered in traditional classroom settings. For the most part, the information presented teaches instructors to prevent bullying in the classroom and to identify students who may be bullies or the victims of bullying. Prior to the past decade, students identified as bullies were sent to the principal's office and given some form of discipline. However, Cohen, Kincaid & Childs (2007) note that this approach has had little success with regard to stopping the unwanted behavior. Teachers and administrators have figured out that rewarding positive behavior generally results in the behavior being repeated (Sugai et al., 1999) and that making the consequences of a student's actions perfectly clear to him or her can inhibit negative behavior (Baer, Manning, & Shiomi, 2006). As a response to the discovery that proactive rather than reactive approaches to discipline yield favorable results, the SWPBS was created (Cohen, Kincaid & Childs, 2007).

According to Lewis & Sugai (1999),

SWPBS is an intervention intended to improve the climate of schools using system-wide positive behavioral interventions, including a positively stated purpose, clear expectations backed up by specific rules, and procedures for encouraging adherence to and discouraging violations of the expectations (as cited in Cohen, Kincaid & Childs, 2007).

With such an emphasis on identifying bullying (thus the required workshop for educators), establishing intervention strategies to prevent the bully from acting out should be part of every school district's policy. This is not the case. In fact, the only way to determine if the SWPBS is successful is through the School-Wide Evaluation Tool, otherwise known as SET (Horner et al., 2004). This psychometric tool contains twenty-eight items that are measured through both observation and personal interviews. The results are used to determine the overall behavioral climate of a school. SET requires a great deal of time and one-on-one interaction between students and test administrators for completion; this makes the instrument difficult to implement but otherwise worthwhile for the information it provides.

Autism Spectrum Disorders

In addition to the issue of bullying in schools, educators also have to consider the inclusion of students with special needs in their classrooms. Bellini & Hopf (2007) created a psychometric measurement to help develop profiles of children who exhibit some of the behaviors linked with autism.

According to the American Psychiatric Association (2000), "Individuals with autism spectrum disorders (ASD) experience challenges and impairments in the areas of communication, social functioning, and restricted, repetitive behavior." Each of these challenges and impairments has to be considered by family, staff members, and educators when considering curriculum for ASD children. The term ASD generally covers five identifiable disorders: autism, Asperger syndrome, Rett syndrome, childhood disintegrative disorder, and a less identifiable condition referred to as pervasive developmental disorder that is not otherwise specified (PDD-NOS); these diagnoses are given when clear signs of abnormal social interactions and communication have been identified in addition to restricted interests and highly repetitive behavior (sometimes considered to be obsessive-compulsive).

Each of these deficits can be difficult to combat, especially in the classroom, but it is the social dysfunction that can cause the most problems for a teacher. For example, a child who cannot speak may not be able to report that he has to use the bathroom or that he has been hurt by another child. Hume, Bellini & Pratt (2005) have noted the need for an assessment tool to help identify children's social capabilities. According to Bellini (2006b) and Tantam (2000), children with ASD show

…difficulties initiating or joining in social activities, understanding the viewpoints of others, and expressing feelings verbally. Furthermore, many children with ASD engage in off-putting behaviors, such as making inappropriate comments or dominating conversations with topics of personal interest, which thwart positive social interactions. These social skills deficits begin to have an impact on children with ASD at an early age and, if left untreated, have the capacity to set them on harmful developmental trajectories leading to eventual social anxiety, depression, isolation, and other unfavorable outcomes (Tantam, 2000, p. 81).

It wasn't until about thirty years ago that the education community started to identify behaviors now categorized as autistic disorders. In response to the need for an assessment tool, the Autism Social Skills Profile (ASSP) was developed. Describing the instrument, Bellini & Hopf (2007) note that the ASSP is a new assessment tool that provides a comprehensive measure of social functioning for children and adolescents with ASD. The items on the ASSP represent a broad range of social behaviors typically exhibited by individuals with ASD including initiation skills, social reciprocity, perspective-taking, and nonverbal communication skills. The ASSP can be completed by a parent, a teacher, or any other adult who is familiar with the child's social behavior. It was designed for use with children and adolescents with ASD between the ages of six and seventeen. The ASSP may be administered by professionals (e.g., psychologists, psychiatrists, social workers, counselors, and speech-language pathologists) who wish to design and implement social skills interventions (p. 81).

The value of this assessment is profound, but the most relevant benefit with regard to education is the capacity for intervention. When a child is identified as being deficient in one of the behavior ranges provided by the ASSP, curricular changes can be made to offer the most appropriate and effective instruction possible (Bellini & Hopf, 2007, p. 81).

No Child Left Behind

The No Child Left Behind Act of 2001 states that schools are responsible for the academic success of all of their students, regardless of a school district's budget, the education and experience of its teachers, the materials (or lack thereof) with which its teachers are able to do their jobs, and the socioeconomic and educational status of its students. Throw psychometrics (i.e., standardized testing) into the mix, and school districts have a great deal for which to be accountable. To whom they are accountable, though, is an issue. According to a recent Phi Delta Kappa/Gallup Poll (2007),

The public is growing disenchanted with the increased amount of standardized testing … It seems more than coincidental that the growing dissatisfaction with testing comes at a time when the use of test data in guiding high-stakes decisions has exploded. School people must be prepared to explain to students, parents, and the community why each test is needed and what purpose it serves (p. 47).

Parents who see their children becoming anxious around the time of required school testing don't see the effects the scores may have on district-wide policy or classroom curricula.

Viewpoints: IQ Tests

The most controversial of all ability tests is the IQ test. To understand the controversy behind it, one needs to understand its creation. Long before inclusion classes were the norm in US schools, students had to be identified as having special needs in order to be eligible for specific educational programs. In response to a request by the French government in 1896, Alfred Binet created a test to help determine the placement of children within these special programs, and in doing so introduced intelligence testing to the field of education. The point of the test was to identify students with intellectual deficiencies. That work was considered disappointing, though it is disputed whether by the French government (which wanted to use the tests for large numbers of students) or by Binet himself (who claimed that one-on-one evaluation was the only way to perform effective assessment). Binet continued his study a decade later with the assistance of a physician, focusing on a specific population of children with intellectual disabilities in a French school. Through this work, the Binet-Simon test emerged.

The Binet-Simon test measured verbal skills, attention, and memory with questions that increased in difficulty (much like the ability and standardized tests used today). Fancher (1985) notes that, once again, Binet identified limitations of the test by pointing out that intelligence is a fluid concept that is difficult to measure regardless of the assessing instrument.

The Stanford-Binet IQ test, which descends from Binet's test, has seen several revisions over the past century. Its current form, the Stanford-Binet 5, is widely used and considered a reliable measure of five intelligence foci:

• Fluid Reasoning,

• Knowledge,

• Quantitative Reasoning,

• Visual-Spatial Processing, and

• Working Memory.

Students who have hearing deficiencies, learned English as a second language, or have some sort of communication disorder are not at a disadvantage with regard to this test as the five focal factors are tested in verbal and nonverbal forms. For example, "test items include verbal analogies to test Verbal Fluid Reasoning and picture absurdities to test Nonverbal Knowledge … test makers assure people the Stanford-Binet 5 will accurately assess low-end functioning, normal intelligence, and the highest levels of giftedness" (Riverside Publishing, 2004).

However controversial, Brown & French (1979) make it clear that "IQ tests serve one function exceptionally well … they are composed of items that are representative of the kinds of problems that traditionally dominate school curricula [,and] they predict academic success or failure" (p. 255).

Conversely, though, Lewontin, Rose & Kamin (1984) point out in their book, Not in Our Genes, that IQ tests are nothing more than commercial products.

It must be remembered that an IQ test is published and distributed by a publishing company as a commercial item, selling hundreds of thousands of copies. The chief selling point of such tests, as announced in their advertising, is their excellent agreement with the results of the Stanford-Binet test (p. 89).

One aspect of these tests has been questioned at length: their fairness to test-takers. Specifically, test-takers are asked to answer questions that are based on expectations of what they should know. Any teacher knows that what one student does know another may not. A masterful artist may have no numerical conceptualization, and an excellent writer may falter when trying to decipher an abstract problem. Equally problematic is the socioeconomic status of the students tested. For example, Lewontin, Rose & Kamin (1984) point out that with IQ tests, students

…are asked to make class judgments ("Which of the five persons below is most like a carpenter, plumber, and bricklayer? 1) postman, 2) lawyer, 3) truck driver, 4) doctor, 5) painter"); they are asked to judge socially acceptable behavior ("What should you do when you notice you will be late to school?"); they are asked to judge social stereotypes ("Which is prettier?" when given the choice between a girl with some Negroid features and a doll-like European face) … Of course, the "right" answers to such questions are good predictors of school performance (p. 89).

Without knowing what a bricklayer is, a child would have difficulty answering the first question. Likewise, most students who have been late for school have a better knowledge of how to handle the situation than a student who has never been late. Finally, aesthetics are subjective. How can a child's academic success be predicted by his response to such a question? As an example of how these tests predict success by holding students to the established norm, Lewontin, Rose & Kamin (1984) use an analogy that, while extreme, demonstrates the problems of ranking children: "If one rat kills ten mice in five minutes, and a second rat kills twelve in the same time, this does not automatically mean that the second is 20 percent more aggressive than the first" (p. 92). Nor is a student who scores 100 on an IQ test twenty-five percent more intelligent than a student who scores 80.
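The arithmetic behind this objection can be sketched briefly. The following Python example is an illustration only: it assumes the common convention that IQ scores are scaled to a mean of 100 and a standard deviation of 15, and it uses the T-score scale (mean 50, standard deviation 10) as a second, equally arbitrary scale. Neither convention comes from the sources cited above, and the function name is hypothetical.

```python
def rescale(score, old_mean, old_sd, new_mean, new_sd):
    """Express the same relative standing on a scale with a different mean and SD."""
    z = (score - old_mean) / old_sd
    return new_mean + z * new_sd


# Assumed scaling conventions (not taken from the cited sources).
IQ_MEAN, IQ_SD = 100.0, 15.0
T_MEAN, T_SD = 50.0, 10.0

# The two students from the example above.
a_iq, b_iq = 100.0, 80.0

# The same two students re-expressed as T-scores.
a_t = rescale(a_iq, IQ_MEAN, IQ_SD, T_MEAN, T_SD)
b_t = rescale(b_iq, IQ_MEAN, IQ_SD, T_MEAN, T_SD)

ratio_iq = a_iq / b_iq
ratio_t = a_t / b_t

print(f"Ratio on the IQ scale: {ratio_iq:.3f}")
print(f"Ratio on the T scale:  {ratio_t:.3f}")
```

Because the two ratios differ even though the students and their relative standing are unchanged, the "percent more" figure reflects the choice of scale rather than anything about the students. The scores support statements about order and distance from the norm, not about proportions of intelligence.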

As a child, Mike Rose, Professor of Social Research Methodology at UCLA, was accidentally placed in a remedial class when a clerical error switched his file with that of another student with the same last name. A year passed before the error was discovered. He knows what it is like to be placed into a category of remediation and as a result, he has made a career out of working with students who are considered remedial. He believes that, with encouragement and motivation, any student can achieve academically. However, he also thinks that the education system works against weak students from the beginning. In his book, Lives on the Boundary (1989), Rose states that as a society

We have provided elementary education for virtually all American children for some time now, and we fret more than many societies do about meeting the diverse needs of these young people. We test them and assess them - even kindergartners are given an array of readiness measures - in order to determine what they know and don't know, can and can't do. The supreme irony, though, is that the very means we use to determine those needs - and the various remedial procedures that derive from them - can wreak profound harm on our children, usually, but by no means only, those who are already behind the economic and political eight ball (p. 127).

Rose, not just because of his own experience of being labeled academically deficient, faults the testing process for causing students to conform when they might not otherwise do so. "Even as they rebel [against their labels] they confirm the school's decision … [and] gradually internalize the definition the school delivers to them" (p. 128).

Discussion

Nobody likes to be labeled, especially when a label can entrench a person in a constant struggle with academic achievement and teachers' low expectations. However, without testing, educators would have no benchmarks on which to base their instruction. They would not be able to recommend students for gifted programs, and they would not know when to spend more time on individual students' academic problems. Nor would they know how to recognize a gifted child's potential problem or a substantial improvement in a child identified as academically weaker than his peers. Moreover, regardless of how psychometrics is viewed, standardized testing is not going away. Any improvement in assessing the skills, behaviors, and potential of students, whether for prevention or for intervention, should matter more than whether the tests are considered good or bad.

Terms & Concepts

Ability Test: An assessment given to measure a person's capacity or talent for a particular skill or function.

American College Test (ACT)*: A test given to high school students to determine their eligibility for acceptance in public colleges or universities.

Aptitude Test: An assessment given to measure a person's potential to learn a skill.

Graduate Record Examination (GRE): Required test given to undergraduate students to determine their eligibility for acceptance to a graduate school.

Intelligence Quotient (IQ): A measure of a person's intelligence, generally obtained through tests conducted by members of the psychological community.

Numerical Reasoning: The ability to interpret and apply numerical concepts, generally to solve a specific problem.

Personality Tests: Psychological assessments given to determine specific personality characteristics.

Psychometrics: The area of psychology that specifically deals with measuring intellectual, mental, and behavioral capacities/capabilities of people.

Scholastic Aptitude Test (SAT)*: A test given to high school students to determine their eligibility for acceptance in colleges or universities.

Spatial Reasoning: The ability to interpret and apply abstract concepts, generally to solve a specific problem.

Standardized Testing: Exams designed to assess a student's potential for academic success as compared to an established "norm"; tests are regulated by federal, state and/or local government agencies.

Stanford-Binet IQ Test: A specific intelligence test generally given to children.

Verbal Reasoning: The ability to interpret and apply words, generally to solve a specific problem.

* Some colleges require both SAT and ACT scores or one test over the other for admission purposes.

Bibliography

American Psychiatric Association (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC: Author.

Baer, G. G., Manning, M. A., & Shiomi, K. (2006). Children's reasoning about aggressions: Differences between Japan and the United States and implications for school discipline. School Psychology Review, 35, 62-77.

Bellini, S. (2006b). The development of social anxiety in high functioning adolescents with autism spectrum disorders. Focus on Autism and Other Developmental Disabilities, 21, 138-145.

Bellini, S., & Hopf, A. (2007). The development of the Autism Social Skills Profile: A preliminary analysis of psychometric properties. Focus on Autism and Other Developmental Disabilities, 22, 81. Retrieved October 8, 2007 from EBSCO Online Database Academic Search Complete. http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=25241855&site=ehost-live

Brown, A. L., & French, L. A. (1979). The zone of potential development: Implications for intelligence testing in the year 2000. Intelligence, 3, 255-271. Retrieved October 8, 2007 from EBSCO Online Database PsycINFO. http://search.ebscohost.com/login.aspx?direct=true&db=psyh&AN31220-001&site=ehost-live

Cohen, R., Kincaid, D., & Childs, K. E. (2007). Measuring school-wide positive behavior support implementation: Development and validation of the Benchmarks of Quality. Journal of Positive Behavior Interventions, 9, 203-213. Retrieved October 3, 2007 from EBSCO Online Database Academic Search Premier. http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=26579762&site=ehost-live

Ennis, R., Lane, K., & Oakes, W. (2012). Score reliability and validity of the student risk screening scale: A psychometrically sound, feasible tool for use in urban elementary schools. Journal of Emotional & Behavioral Disorders, 20, 241-259. Retrieved December 15, 2013, from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=83003075&site=ehost-live

Hart, S. R., Stewart, K., & Jimerson, S. R. (2011). The Student Engagement in Schools Questionnaire (SESQ) and the Teacher Engagement Report Form-New (TERF-N): Examining the preliminary evidence. Contemporary School Psychology, 15, 67-79. Retrieved December 15, 2013, from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=63010123&site=ehost-live

Horner, R. H., Todd, A. W., Lewis-Palmer, T., Irvin, L. K., Sugai, G., & Boland, J. B. (2004). The school-wide evaluation tool (SET): A research instrument for assessing school-wide positive behavior support. Journal of Positive Behavior Interventions, 6, 3-12.

Hume, K., Bellini, S., & Pratt, C. (2005). The usage and perceived outcomes of early intervention and early childhood programs for young children with autism spectrum disorder. Topics in Early Childhood Special Education, 25, 195-207.

Kincaid, D., Childs, K., & George, H. (2005). School-wide benchmarks of quality. Unpublished instrument, University of South Florida.

Lewis, T. J., & Sugai, G. (1999). Effective behavior support: A systems approach to proactive school-wide management. Focus on Exceptional Children, 31, 1-17. Retrieved October 8, 2007 from EBSCO Online Database Academic Search Complete. http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=1859974&site=ehost-live

Lewontin, R. C., Rose, S. & Kamin, L. J. (1984). Not in our Genes. Pantheon Books: New York.

PDK/Gallup Poll (2007). Policy implications. Phi Delta Kappan, 89, 46-48. Retrieved October 4, 2007 from EBSCO Online Database Educational Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=26553437&site=ehost-live

Rose, M. (1989). Lives on the Boundary. Penguin Books: New York.

Sugai, G., Horner, R. H., Dunlap, G., Hieneman, M., Lewis, T. J., Nelson, C. M., et al. (1999). Applying positive behavioral support and functional behavioral assessment in schools. Technical Assistance Guide 1, Version 1.4.4. Eugene, OR: University of Oregon, Center on Positive Behavioral Interventions and Support. (ERIC Document Reproduction Service No. ED443244). Retrieved October 8, 2007 from EBSCO Online Education Research Database. http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/16/51/0e.pdf

Tantam, D. (2000). Psychological disorder in adolescents and adults with Asperger syndrome. Autism, 4, 47-62. Retrieved October 8, 2007 from EBSCO Online Academic Search Complete. http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=5435532&site=ehost-live

Valois, R. F., & Zullig, K. J. (2013). Psychometrics of a brief measure of emotional self-efficacy among adolescents from the United States. Journal of School Health, 83, 704-711. Retrieved December 15, 2013, from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=90167789&site=ehost-live

Suggested Reading

Cangelosi, J. S. (1992). Systematic teaching strategies. Longman Publishing Group: New York.

Gardner, H. (1983). Frames of mind. Harper Collins: New York.

Superfine, B. M. (2004). At the intersection of law and psychometrics: Explaining the validity of clause of No Child Left Behind. Journal of Law & Education, 33, 475-513. Retrieved October 1, 2007 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=14852319&site=ehost-live

Superfine, B. M. (2005). The politics of accountability: The rise and fall of Goals 2000. American Journal of Education, 112, 10-43. Retrieved October 1, 2007 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=18674917&site=ehost-live

Vygotsky, L. (Ed. Kozulin, A.) (1986). Thought and language. The MIT Press: Massachusetts.

Essay by Maureen McMahon, M.S.

Maureen McMahon received her bachelor's degree from the State University of New York at Plattsburgh where she studied English. Her master's degree in curriculum development and instructional technology was earned from the University of Albany. Ms. McMahon has worked in higher education administration for eight years and taught composition and developmental writing for the past six. She resides in Plattsburgh, New York with her husband and two children.