Achievement Tests

This article provides a general overview of standardized achievement tests. Beginning with a working definition of achievement tests and providing comparisons to Criterion Referenced Tests, the article seeks to provide a foundation from which to build an understanding of the different perspectives on achievement tests. The article explores, in depth, perspectives related to the actual design elements of standardized achievement tests and the effects on curriculum and instructional methodologies as related to "teaching to the test." It further provides an analysis of socioeconomic status as a contributor to the achievement gap and highlights a common accommodation provided for students with learning differences when taking standardized achievement tests.

Keywords Achievement Gap; Achievement Tests; Criterion Referenced Tests (CRT); High Stakes Tests; No Child Left Behind Act of 2001 (NCLB); Norm Referenced Tests (NRT); Standardized Achievement Test

Overview

Definition of Achievement Tests

Achievement tests are used to measure student achievement. These tests are often referred to as norm referenced tests (NRT). Achievement tests are also referred to as standardized tests and therefore must meet a specific set of criteria in order to be classified as such. Ediger (2003) indicates that in order for a test to qualify as a standardized achievement test, it must enforce time limits, must ensure that the subject matter tested is the same for all students taking the test, and must provide the same directions for all students.

Questions on standardized achievement tests are written by groups of professional educators and writers. Ediger (2003) indicates that pilot tests are given to specific groups of students in order to determine if test items are too easy or too difficult. For example, if students in a pilot study all answer the same question correctly, test writers then know the test item is too easy and needs to be rewritten. Ediger (2003) further indicates that numerous pilot studies are conducted to ensure that test items are appropriate and that student scores spread on a continuum from low to high.
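The pilot-testing step described above can be illustrated with a short calculation. The following is a minimal sketch, not a procedure taken from Ediger (2003): it computes the proportion of pilot students who answer each item correctly and flags items that nearly everyone gets right or wrong. The function names, the sample data, and the 0.9 and 0.25 thresholds are assumptions chosen only for illustration.

```python
def item_difficulty(responses):
    """Return the proportion of pilot students answering each item correctly.

    `responses` holds one row per student; each row holds one entry per
    test item, 1 for a correct answer and 0 for an incorrect answer.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_students for i in range(n_items)]


def flag_items(difficulties, too_easy=0.9, too_hard=0.25):
    """Label each item using illustrative (assumed) thresholds."""
    labels = []
    for p in difficulties:
        if p >= too_easy:
            labels.append("too easy")
        elif p <= too_hard:
            labels.append("too hard")
        else:
            labels.append("keep")
    return labels


# Hypothetical pilot data: 5 students, 3 items.
pilot = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]

difficulties = item_difficulty(pilot)
labels = flag_items(difficulties)
print(difficulties)
print(labels)
```

In this made-up data set every student answers the first item correctly, so it is flagged as too easy, which mirrors the example in the paragraph above. Only one student in five answers the third item correctly, so it is flagged as too hard.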

Marchant (2007) highlights popular national standardized achievement tests, including the Terra Nova and the Stanford-9. Although some people believe the SAT is an achievement test, it is actually considered an aptitude test because it measures the likelihood that a student will achieve in college. Marchant (2007) further highlights the fact that many states have turned to developing their own standardized achievement tests in order to better align test questions with state standards. As achievement tests continue to become more high stakes in nature, states aim to ensure that tests reflect the curriculum taught in the classroom as closely as possible.

Marchant (2007) indicates that most standardized achievement tests are considered norm referenced tests because students are compared as individuals to a large group of test takers. Test developers define a cut-off point at which students either pass or fail a standardized test. The cut-off point is used to provide a means of comparison between individuals and groups of students. Much careful consideration is given to the placement of the cut-off point.

Difference between Standardized Tests & Criterion Referenced Tests

Popham (2007) points out that many people believe all academic achievement tests are interchangeable; one test is the same as another. However, this is definitely not the case, as quite a variety of tests with different objectives exist. Since standardized achievement tests are designed to spread scores out among participating students in order to differentiate between students, Criterion Referenced Tests (CRT) were developed to provide another means to measure student achievement. Criterion Referenced Tests differ from Standardized Tests in that CRTs are designed to assess how well an individual student meets a defined standard or objective. Ediger (2003) indicates that CRTs provide objectives for teachers to use as guidelines to assist with instruction. Additionally, Ediger (2003) highlights that some states use CRTs as a means to determine which students are passed on to the next grade and which students are held back.

Among the differences highlighted by Ediger (2003) is that CRTs do not spread students across a continuum of achievement. Therefore, every student who is capable of reaching the highest possible score can do so without being compared to others. Furthermore, Ediger (2003) indicates that many test items on CRTs tend to be more open-ended as compared to standardized tests.
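The contrast between the two kinds of score interpretation can be shown with a small worked example. This is only a sketch under stated assumptions: the percentile-rank definition used here (percent of the norm group scoring strictly below the student), the 80 percent mastery threshold, the 40-point test, and all scores are hypothetical and do not come from the sources cited above.

```python
def percentile_rank(score, norm_group):
    """Norm referenced view: percent of the norm group scoring below `score`."""
    below = sum(1 for s in norm_group if s < score)
    return 100.0 * below / len(norm_group)


def meets_criterion(score, max_score, mastery=0.8):
    """Criterion referenced view: has the student reached the set standard?"""
    return score / max_score >= mastery


# Hypothetical norm group of ten raw scores on a 40-point test.
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 36]
student_score = 27

rank = percentile_rank(student_score, norm_group)
passed = meets_criterion(student_score, max_score=40)
print(rank)
print(passed)

# On a criterion referenced test, every student in a class can meet
# the standard at once, because no one is ranked against anyone else.
class_scores = [34, 36, 38]
all_pass = all(meets_criterion(s, max_score=40) for s in class_scores)
print(all_pass)
```

The same raw score of 27 places the student above 60 percent of this norm group, yet it falls short of the assumed 80 percent mastery standard. The two interpretations answer different questions, which is the point Popham (2007) makes about tests not being interchangeable.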

Further Insights

Standardized Achievement Tests Transform into High Stakes Tests

Tests originally designed to measure student achievement and to help diagnose areas where curriculum improvements can be made are transforming into high stakes tests used to evaluate success of students, teachers, schools, and school districts. Marchant (2007) indicates this trend is occurring because states are being pressured to implement accountability measures, especially with pressures felt from the implementation of the No Child Left Behind Act of 2001 (NCLB).

Marchant (2007) defines high stakes tests as any test that carries a serious consequence for students or educators. He clarifies that such consequences can include grade retention for students and rewards or punishments for schools or school districts that either meet, exceed, or fail to meet set objectives. Furthermore, Marchant highlights that such tests can determine whether or not a student is eligible for a particular program, whether or not a student can graduate from high school, and even to which college a student is admitted.

Viewpoints

Alternative Perspectives on Standardized Tests

Popham (1999) indicates that many standardized achievement tests aim to differentiate students according to the number of questions they answer correctly and the number they do not answer correctly. Therefore, many achievement tests, by nature, do not ask a large number of basic skills questions. To some educators and researchers, this fact alone poses a significant issue simply because achievement tests cannot be used to determine how many children have mastered specific basic skills. Furthermore, Popham (1999) asserts that, given the limitations of time on standardized achievement tests, the number of questions asked to assess a particular skill or concept may, in fact, be too few to accurately determine if students have achieved mastery. Marchant (2007) further highlights the fact that many achievement tests use multiple-choice questions as a primary form of assessment. Given time limitations and the range and scope of questions, many educators and researchers believe achievement tests provide a narrow perspective on student learning and therefore are limited. If questions on achievement tests were more closely aligned with what is actually taught in the classroom and more reflective of the skills and knowledge required for basic mastery instead of primarily focused on differentiating students, some educators believe achievement tests would provide a more sound analysis of student achievement.

Delayed Feedback

Marchant (2007) discusses the idea that standardized tests do little to improve student knowledge, skills, and learning outcomes. In particular, Marchant highlights the fact that immediate feedback is critical to provide necessary scaffolding for optimal learning to occur. In the case of standardized tests, feedback is often provided weeks, if not months, after a student completes the final question on the test. Learning, therefore, does not take place via standardized tests. Rather, standardized tests provide one snapshot of one particular moment in a student's learning journey.

Increased Test Anxiety

Due to the high stakes nature of many standardized achievement tests, Paris (2000) asserts that rather than viewing such tests as a learning tool, students are actually beginning to fear standardized tests and therefore are beginning to shift their focus from actually learning to determining whether or not they need to know a particular concept for a specific achievement test. Paris (2000) highlights that anxiety and fear escalate as students become older. Naturally, young students do not place as much emphasis on standardized tests and scores simply because they have not internalized the high stakes associated with such tests. Marchant (2007) discusses a study conducted by Hill (1984) in which researchers concluded that as many as 10 million elementary and secondary students performed below their ability level due to anxiety. Marchant (2007) asserts it is likely this number has increased over the years due to the increased emphasis placed on standardized tests.

Effects on Curriculum & Teaching Methods

Another concern expressed by educators and researchers is that standardized achievement tests, in many ways, affect the actual curriculum taught in classrooms and the ways in which teachers are teaching. Marchant (2007) cites studies by Brown (1992, 1993) and Romberg (1989) indicating that standardized achievement tests affect curriculum and instructional methodologies in two ways. First, teachers often feel pressure to narrow the scope of their curriculum to cover solely what is tested. Second, teachers often feel they need to avoid innovative teaching practices such as cooperative learning, project based learning, and other differentiated teaching strategies, to simply ensure they cover concepts and skills they know will be tested. Furthermore, Marchant (2007) discusses the fact that areas of the curriculum not usually covered on standardized achievement tests such as social studies, science, art, music, writing, etc. are often neglected to ensure that reading and math skills are emphasized. Marchant (2007) further asserts that standardized achievement tests encourage teachers to assess students in traditional, rote form via multiple choice quizzes and exams thereby further discouraging them from using more progressive, innovative authentic assessments.

Of even greater concern to some educators and researchers is the amount of time that teachers spend actually preparing students to take standardized achievement tests. Many people assert that test preparation distracts from authentic learning experiences in the classroom and further deemphasizes the importance of teaching skills and knowledge. Also, the question of validity often becomes an issue as some researchers assert that standardized achievement tests must accurately reflect what is actually taught in the classroom in order to provide a valid measurement of student learning and achievement (Marchant, 2007). In order for classrooms, schools, and school districts to arrive at valid conclusions regarding the curriculum taught, the assessment used must align with the curriculum objectives.

Additionally, an increased emphasis on standardized achievement tests has brought an even greater emphasis on teacher accountability with regard to student achievement. Policy makers have drawn a direct causal link between student achievement and effective teaching practices. Therefore, if a teacher teaches well it is expected that students will achieve and perform well on standardized achievement tests (Ediger, 2003). Ediger (2003) asserts this causal link is flawed, to some degree, because the teacher is not the only individual and the school is not the only institution with influence over a student and his/her learning outcomes. Ediger (2003) highlights that the home, the community, religious institutions, etc. have just as much effect on individual students as teachers and schools. Therefore, a single achievement test that views a student's academic performance through one particular lens is not sufficient to accurately reflect learner achievement. Rather, a variety of evaluative techniques need to be used to assess student achievement (Ediger and Rao, 2003; cited in Ediger, 2003).

Marchant (2007) summarizes the main concerns expressed by teachers and researchers in the field of education. First and foremost, Marchant (2007) highlights that there is little evidence that teacher evaluations of student performance are flawed and that achievement tests are more accurate reflections of student performance. Second, there is no evidence that standardized achievement tests improve instruction and/or student learning. Finally, there are negative outcomes associated with high stakes achievement tests in terms of effects on teachers and students, the development of flawed policies, and financial burdens that take away from teaching and learning (Marchant, 2007).

Research Regarding Extended Time for Students with Disabilities

One issue that always arises with regard to standardized tests concerns how to create equitable, yet standardized, conditions for all students regardless of ability level or learning differences. If the main objective of standardized achievement tests is to measure student achievement, then students need to be provided with optimal conditions to ensure they perform as well as they can on the given test. One common accommodation provided to students with learning differences is extended time on such standardized achievement tests.

Elliot and Marquart (2004) indicate that students who do not have identified learning differences and/or disabilities exhibit similar performance levels regardless of timed or untimed testing conditions. However, students with learning differences and/or disabilities tend to significantly improve their scores when extended time is provided. Results from many different studies regarding timed versus untimed testing conditions for students with and without learning differences indicate that untimed testing conditions are ideal for optimal student performance (Alster, 1997; Centra, 1986; cited in Elliot and Marquart, 2004). Elliot and Marquart (2004) highlight that students with learning disabilities who are provided with extended time on standardized achievement tests report they are able to answer every question, review graphs and/or charts, and feel more relaxed when they have extra time to work on the test.

Elliot and Marquart (2004) discuss how and when teachers should accommodate students with learning differences by providing extended time on achievement tests. First and foremost, the researchers advocate that teachers need to determine what the extended time is intended to do on an individual basis. Next, teachers need to determine how they will measure the effects of extended time and how they will know that the accommodation served its purpose. Elliot and Marquart (2004) assert that if extended time is provided without a clear purpose and without an expected result with regard to student performance, the accommodation may actually have a negative effect.

Are Race & Socioeconomic Status Indicators of Success on Achievement Tests?

The term achievement gap is often used in the field of education to describe the gap in various educational measurements between different groups of students when studied with regard to race, ethnicity, gender, and socioeconomic status. Duncan and Magnuson (2005) discuss the fact that national achievement tests consistently illustrate significant gaps in achievement between young Caucasian children and young African American and Hispanic students. In particular, the researchers discuss results from the 1998 Early Childhood Longitudinal Study in which both African American and Hispanic students scored significantly below Caucasian students in mathematics. Duncan and Magnuson (2005) set out to determine what might be causing such gaps in achievement and school readiness. One specific finding they note indicates that socioeconomic status of families is closely tied to performance and gaps in test scores. However, they emphasize that a causal link between socioeconomic status and achievement is very difficult to establish. Therefore, they are unable to unequivocally state that elimination of socioeconomic gaps will directly result in elimination of the achievement gap.

Duncan and Magnuson (2005) do indicate that it is not necessarily a stretch of the imagination to understand how and why higher family incomes may create the conditions necessary for certain children to be better prepared for school and to have a slight academic advantage. Students from families with moderate to high incomes have access to good prenatal care, health care, nutrition, positive learning environments, etc. (Duncan and Magnuson, 2005). However, the researchers indicate that policy implications regarding socioeconomic status are unclear.

Popham (2007) indicates that many achievement tests do assess what students have learned in school. However, Popham also emphasizes that many achievement tests also test students according to socioeconomic status and academic aptitudes. The author asserts that those students born into pre-existing advantages tend to perform better on aptitude-linked questions than those students who are not. Popham (2007) asserts that the greater the number of questions on an achievement test linked to socioeconomic status and academic aptitude, the less suitable the test will be for helping to determine the interventions necessary to close the achievement gap.

In 2012, civil rights and educational groups filed a federal complaint in New York alleging that African Americans and Hispanics were unfairly excluded from several of New York City’s high schools due to a single admission test that they believed was racially discriminatory (Baker, 2012). The proposed reasons vary, but lower income, African American, and Latino students across the United States consistently score below higher income, white, or Asian students on achievement tests and college admission tests (Rooks, 2012).

Critics also point out that standardized tests tend to measure superficial thinking rather than critical thinking. One study classified elementary students “as 'actively' engaged in learning if they asked questions while they read, or tried to connect what they were learning to past lessons. The study then classified those students who just copied down answers, guessed a lot, or skipped over difficult parts” as being only 'superficially' engaged (Kohn, 2000, p. 1). However, it was the superficial learners who tended to score highest on the Comprehensive Test of Basic Skills and the Metropolitan Achievement Test. Other discoveries have proved to be similar, having come from studies of middle and high school students in which middle school students took the Comprehensive Test of Basic Skills and high school students took the College Board's SAT test. So while there are many students who are actively engaged in learning who do well on standardized tests, there is still a positive correlation between standardized test results and a shallow, less engaged approach to learning (Kohn, 2000).

Rabiner et al. (2004) discuss the effects of attention problems on student achievement. Primarily, the researchers provide insight into how a substantial amount of the achievement gap may be explained by higher rates of attention difficulties among African American students as compared to Caucasian students. Although some researchers have indicated in the past that hyperactivity and oppositional behavior may contribute to the achievement gap, Rabiner et al. (2004) indicate that attention difficulties play a more significant role and need to be further studied. If schools can figure out how to boost academic achievement for students with attention difficulties, perhaps the achievement gap would diminish.

Terms & Concepts

Achievement Gap: The achievement gap is often referred to in the field of education with relation to the gap in various educational measurements between different groups of students when studied with regard to race, ethnicity, gender, and socioeconomic status.

Achievement Tests: Achievement tests are used to measure student achievement. They are often referred to as standardized tests and therefore must meet a specific set of criteria in order to be classified as such.

Criterion Referenced Tests (CRT): Criterion Referenced Tests differ from Standardized Tests in that CRTs are designed to assess how well an individual student meets a defined standard or objective.

High Stakes Tests: High stakes tests include any test that carries a serious consequence for students or educators. Such consequences can include grade retention for students and rewards or punishments for schools or school districts that either meet, exceed, or fail to meet set objectives. Furthermore, such tests can determine whether or not a student is eligible for a particular program, whether or not a student can graduate from high school, and even to which college a student is admitted.

No Child Left Behind Act of 2001 (NCLB): A broad and comprehensive bi-partisan education reform that addresses the issue of performance in American elementary and secondary schools. The Act focuses on accountability for schools and districts, choice for parents regarding low performing schools, and requirements for use of Federal education dollars.

Norm Referenced Tests (NRT): Most standardized achievement tests are considered norm referenced tests because students are compared as individuals to a large group of test takers.

Standardized Achievement Test: In order for a test to qualify as a standardized achievement test, it must enforce time limits, must ensure that the subject matter tested is the same for all students taking the test, and it must provide the same directions for all students.

Bibliography

Baker, Al. (2012, September 27). Charges of bias in admission test policy at eight elite public high schools. New York Times. Retrieved December 12, 2013, from http://www.nytimes.com/2012/09/28/nyregion/specialized-high-school-admissions-test-is-racially-discriminatory-complaint-says.html

Chua, Y., & Don, Z. (2013). Effects of computer-based educational achievement test on test performance and test takers’ motivation. Computers in Human Behavior, 29, 1889–1895. Retrieved December 13, 2013, from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=89112835

Duncan, G., & Magnuson, K. (2005). Can family socioeconomic resources account for racial and ethnic test score gaps? The Future of Children, 15, 35-54. Retrieved November 1, 2007 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=16335067&site=ehost-live

Ediger, M. (2003). Philosophy and measurement of school achievement. Journal of Instructional Psychology, 30, 231-236. Retrieved November 1, 2007 from EBSCO Online Database Academic Search Premier. http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=10955027&site=ehost-live

Elliot, S., & Marquart, A. (2004). Extended time as a testing accommodation: Its effects and perceived consequences. Exceptional Children, 70, 349-367. Retrieved November 1, 2007 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=12409143&site=ehost-live

Ford, D. Y., & Helms, J. E. (2012). Overview and introduction: Testing and assessing African Americans: "unbiased" tests are still unfair. Journal of Negro Education, 81, 186–189. Retrieved December 13, 2013, from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=83853077

Kohn, A. (2000). Standardized testing and its victims. Education Week, 20, 60. Retrieved July 24, 2007 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=3730038&site=ehost-live

Marchant, G. (2004). What is at stake with high stakes testing? A discussion of issues and research. The Ohio Journal of Science, 104, 2-7.

McNair, D. J., & Curry, T. L. (2013). The forgotten: Formal assessment of the adult writer. Journal of Postsecondary Education & Disability, 26, 5-19. Retrieved December 13, 2013, from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=90141854

Popham, J. (2007). A test is a test is a test-not! Educational Leadership, 64, 88-89. Retrieved November 1, 2007 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=23453046&site=ehost-live

Rabiner, D., Murray, D., Schmid, L., & Malone, P. (2004). An exploration of the relationship between ethnicity, attention problems, and academic achievement. School Psychology Review, 33, 498-509. Retrieved November 1, 2007 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=15598439&site=ehost-live

Rooks, Noliwe M. (2012, October 11). Why it’s time to get rid of standardized tests. Time. Retrieved December 12, 2013, from http://ideas.time.com/2012/10/11/why-its-time-to-get-rid-of-standardized-tests/

Suggested Reading

Arffman, I. (2013). Problems and issues in translating international educational achievement tests. Educational Measurement: Issues & Practice, 32, 2–14. Retrieved December 13, 2013, from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=88229739

Bracey, G. (2002). Research - raising achievement of at-risk students--or not. Phi Delta Kappan, 83, 431. Retrieved November 1, 2007 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=6033745&site=ehost-live

Haladyna, T., Nolen, S., & Haas, N. (1991). Raising standardized achievement scores and the origins of test score pollution. Educational Researcher, 20, 2-7.

Hoffman, J., Assaf, L., & Paris, S. (2001). High stakes testing in reading and its effects on teachers, teaching, and students: Today in Texas, tomorrow? Reading Teacher, 54, 482-492. Retrieved November 1, 2007 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=4046890&site=ehost-live

Morison, P. (1992). Testing in American schools: Issues for research and policy. Social Policy Reporter, 6, 1-24.

Popham, W. (1999). Why standardized test scores don't measure educational quality. Educational Leadership, 56, 8-15. Retrieved November 1, 2007 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=1660698&site=ehost-live

Turkyilmaz, M. (2013). The effect of the juvenile fiction on the reading skills of junior high school students. Reading Improvement, 50, 135–143. Retrieved December 13, 2013, from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=90419118

Essay by John W. Loeser, M.Ed.

John Loeser is an assistant head of an elementary school in San Mateo, California. He received his master's of education in school leadership from Harvard University. His research interests include differentiated instruction, improving instructional practice, and strategic change and leadership in schools. He is a member of the National and California Association of Independent Schools and the Association for Supervision and Curriculum Development. He resides in San Mateo, California with his wife.