Computer-Adapted Testing

Abstract

This article focuses on computer-assisted and computer-adaptive testing. A computer-adaptive test is able to estimate a student's ability level during the test and can adjust the difficulty of the questions accordingly. The computer program selects the test items deemed appropriate for the student's ability level, the student selects an answer using either the mouse or keyboard, and the answer is automatically scored. The article compares computerized testing with paper-and-pencil testing, discusses its relationship to student achievement, and outlines the advantages and disadvantages of computerized testing as well as considerations for implementation.

Overview

Paper-and-pencil tests have long been the standard for testing students. A computerized test can do the same thing, but students use a mouse and keyboard to mark their responses instead of a sheet of paper and a pencil or pen. A computer-adaptive test (CAT) format is able to estimate a student's ability level during the test and can adjust the difficulty of the questions accordingly. This means that students can be taking different versions of the same test and can be answering a different number of questions based on their ability levels.

Many organizations and associations that provide certification and licensure give their tests on computers. Computers are used for college course placement testing, professional certification exams, vocational interest assessments, aptitude tests, and workplace skills assessments (Greenberg, 1998). States have also begun using computer-adaptive testing for their statewide assessments and have found that the tests not only provide results more quickly but also measure student knowledge more reliably, especially for students at the very low and the very high ends of the ability spectrum (Davis, 2012).

Item Response Theory. Computer-adaptive testing is based on item response theory. Item response theory uses mathematical functions to predict or explain a student's test performance using a set of factors called latent traits. The relationship between student item performance and these traits can be described by an item characteristic function. The item characteristic function specifies that students with higher scores on the traits have higher expected probabilities for answering an item correctly than students with lower scores on the traits (Hambleton, 1989). Computer-adaptive testing selects test items to match each student's ability. The computer program selects the test items deemed appropriate for the student's ability level, the student then selects an answer using either the mouse or keyboard, and then the answer is automatically scored.
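To make the item characteristic function concrete, the brief Python sketch below implements a two-parameter logistic (2PL) model, one common form of the item characteristic function in item response theory. The function name and parameter values are illustrative assumptions rather than details drawn from the sources cited here.

    import math

    def item_characteristic(theta, difficulty, discrimination=1.0):
        # Probability that a student with latent trait level `theta` answers
        # an item correctly under a two-parameter logistic (2PL) model.
        return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

    # A student one unit above an item's difficulty (with discrimination 1.0)
    # is expected to answer correctly about 73 percent of the time.
    print(item_characteristic(theta=1.0, difficulty=0.0))  # ~0.73

Students with higher values of the latent trait receive higher expected probabilities of answering correctly, which is exactly the property the item characteristic function specifies.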

Computer-adaptive testing consists of two steps. The first step is selecting the difficulty level of the first test question to match the student's achievement level (Wainer, 2000, as cited in Latu & Chapman, 2002). In the second step, the question is scored, and the ability level is updated using the new information. If the student correctly answers the question, then a more difficult question or a question at the same level will be asked next. If the student incorrectly answers the question, then an easier question or question at the same level will be asked next. This process allows computer-adaptive testing to adjust the difficulty level of the assessment based on each student's current achievement level (Latu & Chapman, 2002).
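The select-score-update cycle described above can be sketched as a short loop. The example below is a minimal illustration assuming a fixed-length test and a simple shrinking-step update to the ability estimate; operational systems typically use maximum-likelihood or Bayesian estimation, which the cited sources do not detail.

    def run_adaptive_test(item_bank, answer_item, start_ability=0.0, num_items=10):
        # item_bank: list of dicts, each with an "id" and a "difficulty" value.
        # answer_item: callback that presents an item and returns True if the
        # student answers correctly, False otherwise. Both are illustrative.
        ability = start_ability
        step = 1.0
        unused = list(item_bank)
        for _ in range(min(num_items, len(unused))):
            # Step 1: select the unused item whose difficulty best matches
            # the current ability estimate.
            item = min(unused, key=lambda i: abs(i["difficulty"] - ability))
            unused.remove(item)
            # Step 2: score the response and update the ability estimate,
            # moving up after a correct answer and down after an incorrect one.
            correct = answer_item(item)
            ability += step if correct else -step
            step = max(step * 0.5, 0.1)  # smaller adjustments as evidence accumulates
        return ability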

How Does Computer-Adaptive Testing Work? Computer-adaptive testing begins with the first question based on the student's achievement level. This can be based on previous computer-adaptive tests or school reports (Hambleton, Zaal & Pieters, 2000, as cited in Latu & Chapman, 2002). If there is no personalized starting point for students, then the testing process begins at a predefined achievement level that can be set by the test administrator (Wise & Kingsbury, 2000, as cited in Latu & Chapman, 2002). There is some debate about the importance of the starting point. Some people believe that it does not matter what level the test begins at as long as it is short (Hambleton et al., 2000, as cited in Latu & Chapman, 2002), and others believe that beginning at an inappropriate level increases test anxiety for students (Wainer, 2000, as cited in Latu & Chapman, 2002), which can affect test performance.
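A hypothetical sketch of choosing the starting point, consistent with the options described above: use a prior estimate if one is on record, otherwise fall back to a default level set by the test administrator. The record format here is invented for illustration.

    def starting_ability(student_id, prior_estimates, administrator_default=0.0):
        # Returns a personalized starting level when prior test data exist,
        # otherwise the predefined level set by the test administrator.
        return prior_estimates.get(student_id, administrator_default)

    # Example: one student has a previous estimate on file, another does not.
    prior_estimates = {"student_42": 1.3}
    print(starting_ability("student_42", prior_estimates))  # 1.3
    print(starting_ability("student_77", prior_estimates))  # 0.0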

Since test validity depends on the test items used, computer-adaptive testing relies on its bank of test items. A large item bank can help minimize test security issues because there are so many items, and it also allows the test to be tailored to more diverse student skill levels. However, increasing the size of the item bank can also increase the possibility of including flawed test items (Hambleton et al., 2000, as cited in Latu & Chapman, 2002). Since each test item is selected based on previous student responses, flawed test items can have a large impact on students' scores (Potenza & Stocking, 1994; Wainer, 2000, as cited in Latu & Chapman, 2002). Paper-and-pencil tests allow for the removal of flawed test items because the person scoring can see that there is an issue with a question. This cannot be done with computer-adaptive testing because scores are calculated automatically and no scorer is present to notice that an item may be flawed (Latu & Chapman, 2002).

Computer-adaptive testing also requires a decision on when to stop the test. Several methods can be used, including testing until a specified level of measurement consistency is reached or setting a fixed number of test items for each student (Wainer, 2000, as cited in Latu & Chapman, 2002). The fixed-length test is considered reasonable because it ensures that all students are given the same number of items and keeps testing times short. However, there is a possibility that students will fail to answer some of the test items because they are too difficult (Mills & Stocking, 1995, as cited in Latu & Chapman, 2002). Testing until a level of measurement consistency is reached can also present challenges that require the judgment of the test administrator. For example, students taking a college course placement test in mathematics have had issues with computer-adaptive testing placing them in lower-level classes when they feel they should be placed higher. Their contention was that they were asked several questions on the same concept, such as scientific notation, which they did not know, so the testing session stopped and placed them in a lower-level mathematics class. In cases like these, allowing students to retest has generally shown that they were incorrectly placed, and most students were placed in higher-level mathematics classes based on their retest results.
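The two stopping rules discussed above can be expressed as a single check, sketched below: stop after a fixed number of items, or stop once the ability estimate is consistent enough, approximated here by a standard-error threshold. The specific limits are illustrative assumptions rather than values from the cited sources.

    def should_stop(items_administered, standard_error, max_items=30, se_threshold=0.3):
        # Fixed-length rule: stop once the item limit is reached.
        if items_administered >= max_items:
            return True
        # Measurement-consistency rule: stop once the estimate is precise enough.
        if standard_error <= se_threshold:
            return True
        return False

    print(should_stop(items_administered=12, standard_error=0.25))  # True
    print(should_stop(items_administered=12, standard_error=0.60))  # False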

Computerized testing allows tests to be “linked to the district or state administering the test. This allows standards and performance outcomes to be linked to assessment measures and can provide” the district or state with valuable information about student progress (Olson, 2001, as cited in McHenry, Griffith & McHenry, 2004, ¶ 7) and whether or not the school is meeting the educational reforms set forth in President Obama’s 2010 Blueprint for Reform as well as the state Common Core Standards. This information can also be used by instructors to help align their curriculum with mandated standards and outcomes (McHenry et al., 2004).

Further Insights

Computer-adaptive testing came to prominence nationally in 1992 and continues to grow. The Graduate Record Examinations (GRE) was the first national test to be available as a computer-assisted test, and it came in a computer-adaptive form in 1993. Since the late 1990s, the Graduate Management Admission Test (GMAT) has administered the multiple-choice quantitative and verbal sections in a computer-adaptive format. The test is administered on paper in areas of the world with limited computer access. Scores from the quantitative and verbal sections are combined to provide an overall score, which is what business schools tend to focus on.

Nursing licensure examinations have been administered in CAT format since 1994, when they became available only in a computer-based format (Bugbee, 1996). Computerized testing has become big business; in 1998, online testing products generated an estimated $750 million in revenues.

Types of Testing Packages. There are many software packages available that allow instructors to create their own exams or use prepackaged tests. An instructor can create a quiz or examination using a software program that allows him or her to write and upload original questions, or the instructor can select questions from the program's test-item bank. After students complete the test, it can be scored instantaneously and downloaded to the instructor's computer. Some software packages allow instructors to use the test scores for diagnostic purposes, such as analyzing the test data, charting the progress of each student or the entire class, and showing the instructor which test questions the majority of students answered correctly or incorrectly. The latter can help instructors identify areas that they perhaps did not cover as thoroughly as they should have, which can help them adjust their instruction for the current class and shape their lesson plans for the next class. The more sophisticated programs even offer tutorials that identify individual or class skill deficiencies and recommend remediation for those skills (Greenberg, 1998).
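As an illustration of the kind of diagnostic analysis such packages perform, the sketch below computes each question's percent-correct rate and each student's total score from a small table of responses. The data layout is invented for illustration; commercial packages use their own formats and produce far richer reports.

    # responses[student][question] is True for a correct answer, False otherwise.
    responses = {
        "Ana":  {"Q1": True,  "Q2": False, "Q3": True},
        "Ben":  {"Q1": True,  "Q2": False, "Q3": False},
        "Cara": {"Q1": False, "Q2": False, "Q3": True},
    }

    questions = sorted({q for answers in responses.values() for q in answers})

    # Per-question percent correct: flags items most of the class missed (e.g., Q2).
    for q in questions:
        correct = sum(answers[q] for answers in responses.values())
        print(f"{q}: {100 * correct / len(responses):.0f}% correct")

    # Per-student totals: useful for charting each student's progress.
    for student, answers in responses.items():
        print(f"{student}: {sum(answers.values())}/{len(answers)} correct")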

Computerized Writing Assessments. One area where computerized testing and instantaneous grading have lagged behind is writing assessments. Programs continue to be refined, but schools, school districts, and states need to decide whether they will allow students to use all the traditional word-processing functions, such as autocorrect, spell check, grammar check, and the thesaurus. These are important decisions when the assessments are used to report adequate yearly progress under high-stakes testing. Many states choose not to use computer-adaptive testing for the essay portions of their tests, believing these formats require human judgment when grading a student's work (Davis, 2012).

Detractors of computer-assisted testing believe that students who come from lower-income households and poorer school districts will not have the same amount of access to computers as other students, which can affect their ability to complete a writing assessment on a computer as opposed to the traditional paper-and-pencil test. Before moving to computerized writing assessments, states must also look at whether all schools have the equipment they need to administer the tests, a concern that applies to all efforts to fully adopt computer-assisted testing (Cavanagh, 2007).

Accommodating Special Needs. Students with disabilities may also need special consideration and accommodations when testing on a computer. Accommodations can include purchasing a special testing program, allowing students to test in a quieter room with fewer students and computers, and allowing extra time to complete the test. Since computers are usually spaced so closely together, it may be necessary to have test proctors in the room to help keep students focused on their own monitors if they are taking a test that is not adaptive or does not have more than one version. For students who have to go to a different school to take their tests, it can be a good idea to make a special trip with them to the testing site to make sure that the chairs and desks will accommodate them and to help alleviate any anxiety that may arise from testing in an unfamiliar place (McHenry et al., 2004).

Viewpoints

Advantages of Computerized Testing. There are many documented advantages of computerized testing. It can reduce testing time (Bunderson, Inouye & Olsen, 1989; English, Reckase & Patience, 1977; Green, 1988; Olsen, Maynes, Slawson & Ho, 1986; Wise & Plake, 1989, as cited in Bugbee, 1996), and it can provide more detailed information about those taking the test (Wise & Plake, 1989, as cited in Bugbee, 1996). Computerized testing can also provide increased test security (Grist, Rudner & Wise, 1989, as cited in Bugbee, 1996) and provide instant results (Bugbee, 1992; Bugbee & Bernt, 1990; Kyllonen, 1991; Mazzeo & Harvey, 1988, as cited in Bugbee, 1996).

Computerized testing can add a great deal of flexibility when it comes to test management because students can take a test whenever the computers and a proctor or test administrator are available. Computerized testing also allows for instantaneous results; in the past, it could take weeks or even months to get student results on large standardized tests (Greenberg, 1998). Computerized testing also enables students who finish sooner than others to move on to the next section instead of having to wait for everyone to finish or for time to expire. Computerized testing can also offer a variety of choices for timing a test: tests can be self-paced (not timed), sections can be timed, or even each item can be timed. Self-paced, untimed testing allows students who need additional time to feel less like they are racing against the clock, which can help reduce test anxiety. Computerized testing also allows for the assessment of perceptual and psychomotor skills, which can be impossible to assess with a paper-and-pencil test (Grist et al., 1989).

One aspect of testing error, test administrator differences, is eliminated because all students receive the exact same instructions from the computer as opposed to different interpretations depending on who is delivering the instructions (Grist et al., 1989). Computerized tests also eliminate another source of testing error because the questions are presented one at a time, which negates “the possibility of filling out an answer sheet incorrectly” (Olson, 2001, as cited in McHenry et al., 2004, ¶ 6). Test security can be increased because hard copies of the testing materials cannot be compromised (Grist et al., 1989). However, some security issues remain with the advent of camera phones, mini digital cameras, and other advances in electronics technology.

While any type of computerized testing has the above-mentioned advantages, computer-adaptive testing has advantages specific to it. Computer-adaptive testing can increase the efficiency of testing since less time is needed to take a computer-adaptive test; fewer test items are needed to provide an accurate assessment of student achievement. In some cases, computer-adaptive testing can reduce testing time by more than half and yet provide the same level of test reliability. Reducing the time it takes to finish a test can also lessen one source of error: fatigue. Computer-adaptive testing can more accurately assess students with a wide range of abilities by including both easier and more difficult questions in the item bank (Grist et al., 1989). Computer-adaptive testing also saves instructors time because the marking and scoring are completed by the computer, a task that used to take hours (Latu & Chapman, 2002).

Disadvantages of Computerized Testing. The benefits of computerized testing can be negligible if a school has few computers. Computer-adaptive testing also requires response data from a large number of students to achieve precise estimates of test item difficulty, so most local school districts and schools would be unable to develop their own computer-adaptive tests and need to rely on nationally developed ones (Grist et al., 1989). Some critics feel the higher cost of computerized testing could unfairly discriminate against poor students and poor school districts that may not have the funds to supply students with computer access (Greenberg, 1998). Student access to computers in school has grown with the increase in computerized testing: there is now one computer for every 3.8 students, an improvement from 1999, when there was approximately one computer for every 5.7 students (Cavanagh, 2007). The predominant disadvantages of computer-assisted testing tend to wane as the technology becomes less expensive to implement and as most students grow up with computer and Internet access.

Computerized tests can be scheduled more easily than paper-and-pencil tests (Bugbee, 1992; Hambleton, Zaal, & Pieters, 1991; Wise & Plake, 1989, as cited in Bugbee, 1996), but they may not be equivalent to paper-and-pencil tests. Research has shown that identical tests given on the computer and in a paper-and-pencil format do not always produce the same results: some studies show no difference, others show that students perform better on paper-and-pencil tests, and still others show that students perform better on computer-based tests (Bugbee, 1996). Two surveys studying the difference between paper-and-pencil and computerized college admissions examinations found the scores to be comparable (Greenberg, 1998). Paper-and-pencil tests are considered by some to waste students' time because there will always be questions that are either too difficult or too easy for a given student's ability level, and they are unable to properly gauge each student's current ability level (Grist et al., 1989).

For students taking a computerized test, the advantages may not be as obvious. Many students like that they can see their results instantly and that the tests are easier to schedule (Bugbee, 1992; Bugbee & Bernt, 1990, as cited in Bugbee, 1996). However, some students simply do not like computers (Olsen & Krendl, 1990, as cited in Bugbee, 1996). For students who are not very familiar with or comfortable using a computer, test anxiety can increase when they face computerized testing. Additionally, students who have manual dexterity issues may find it difficult to use the computer, which can also cause anxiety (Latu & Chapman, 2002). While computerized testing can allow students to mark questions they are unsure of or leave questions blank to return to later, computer-adaptive testing cannot allow students to return to previous questions to review or change an answer, because each response is scored as it is given and used to judge the student's skill level and select the difficulty of the next item (Greenberg, 1998).

There can also be computer issues and other obstacles. Students can be in the middle of a test when the server goes down and therefore be unable to continue with the test. Elementary schools may bus their students to the local high school to use its computers, only to discover that the tables are too high for the students to reach the keyboards, or that the computer lab is set up to accommodate 50 students but only 40 of the computers are working when they arrive. Some of these issues can be mitigated if communication between schools and test administrators is open and frequent and if proper planning and equipment testing are completed to avoid potential problems (McHenry et al., 2004). It is also a good idea to have a back-up plan in place in case a worst-case scenario occurs.

Conclusion

For computerized testing to be effective, test developers must make sure that the instructions for the tests are specific to testing on a computer. Those administering computerized tests need to make sure that students are comfortable using a computer and know how to take a test on a computer before they take their first assessment. Test administrators also need to know how to troubleshoot the hardware and software so they can take care of any computer-related glitches that may occur during test time (Bugbee, 1996); if not, they should make sure that someone is on call who can troubleshoot if necessary. There are also practical issues in dealing with the data. Since there is a wealth of student data available, it should be predetermined which personnel can have access to the system and student records. There is always some risk that the server may go down or the data may become corrupted (McHenry et al., 2004). Data should always be backed up and saved in a separate location. Test results could also be printed out, but this takes a lot of paper and would be the least desirable option because the data would need to be keyed back in if it were lost.

Terms & Concepts

Adequate Yearly Progress: Adequate yearly progress, for No Child Left Behind Act purposes, means that test data must be collected and analyzed in relation to student learning to report student and school proficiency, and the standards that determine proficiency must be raised over time with an increased number of students meeting the standards.

High-Stakes Testing: High-stakes testing is the use of test scores to make decisions that have important consequences for individuals, schools, school districts, and/or states and can include high school graduation, promotion to the next grade, resource allocation, and instructor retention.

Latent Trait: A latent trait is the underlying trait or ability that is assumed to be measured by the exam items. Latent trait models can only be tested indirectly, as the latent abilities are not observed outright.

No Child Left Behind Act of 2001 (NCLB): The No Child Left Behind Act of 2001 is the latest reauthorization and a major overhaul of the Elementary and Secondary Education Act of 1965, the major federal law regarding K-12 education.

Standardized Testing: Standardized testing is the use of a test that is administered and scored in a uniform manner, and the tests are designed in such a way that the questions and interpretations are consistent.

Bibliography

Bugbee, A., Jr. (1996). The equivalence of paper-and-pencil and computer-based testing. Journal of Research on Computing in Education, 28, 282. Retrieved November 19, 2007 from EBSCO online database Academic Search Premier http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=9605221096&site=ehost-live

Cavanagh, S. (2007). On writing tests, computers slowly making mark. Education Week, 26, 10. Retrieved November 19, 2007 from EBSCO online database Academic Search Premier http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=24097039&site=ehost-live

Davis, M. R. (2012, October 15). Adaptive testing evolves to assess common-core skills. Education Week. Retrieved December 21, 2013, from http://www.edweek.org/dd/articles/2012/10/17/01adaptive.h06.html

Eyal, L. (2012). Digital assessment literacy—the core role of the teacher in a digital environment. Journal of Educational Technology & Society, 15, 37–49. Retrieved December 13, 2013, from EBSCO online database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=76559253

Gierl, M. J., Lai, H., & Li, J. (2013). Identifying differential item functioning in multi-stage computer adaptive testing. Educational Research & Evaluation, 19(2/3), 188–203. Retrieved December 20, 2013, from EBSCO online database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=86179089

Greenberg, R. (1998). Online testing. Techniques: Making Education & Career Connections, 73, 26. Retrieved November 19, 2007 from EBSCO online database Academic Search Premier http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=716969&site=ehost-live

Grist, S. et al. (1989). Computerized adaptive tests. Retrieved November 19, 2007 from Education Resources Information Center http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/1f/e0/96.pdf

Hambleton, R. (1989). Item response theory: Introduction and bibliography. Retrieved July 30, 2007 from Education Resources Information Center http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/1f/4f/29.pdf

Latu, E., & Chapman, E. (2002). Computerised adaptive testing. British Journal of Educational Technology, 33, 619–622. Retrieved November 19, 2007 from EBSCO online database Academic Search Premier http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=7717302&site=ehost-live

McHenry, B., Griffith, L., & McHenry, J. (2004). The potential, pitfalls and promise of computerized testing. T H E Journal, 31, 28–31. Retrieved November 19, 2007 from EBSCO online database Academic Search Premier http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=14088646&site=ehost-live

Sari, H. İ., & Huggins-Manley, A. C. (2017). Examining content control in adaptive tests: Computerized adaptive testing vs. computerized adaptive multistage testing. Educational Sciences: Theory & Practice, 17(5), 1759–1781. doi:10.12738/estp.2017.5.0484. Retrieved January 26, 2018, from EBSCO online database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=125250287&site=ehost-live&scope=site

Tanveer, A., Azeem, M., Maqbool, S., & Tahirkheli, S. (2011). Attitude of mathematics teachers related to the use of computer technology in the classroom. International Journal of Learning, 18, 279–290. Retrieved December 20, 2013, from EBSCO online database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=73027741

Suggested Reading

Bartram, D., & Hambleton, R. (2006). Computer-based testing and the Internet: Issues and advances. Hoboken, NJ: Wiley.

Fengyang, S. (2012). Exploring students' anxiety in computer-based oral English test. Journal of Language Teaching & Research, 3, 446–451. Retrieved December 20, 2013, from EBSCO online database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=90356278

Hambleton, R., Zaal, J., & Pieters, J. (2000). Computerized adaptive testing: Theory, applications, and standards. Reston, MA: Kluwer.

Rossano, V., Pesare, E., & Roselli, T. (2017). Are computer adaptive tests suitable for assessment in MOOCs? Journal of E-Learning & Knowledge Society, 13(3), 71–81. doi:10.20368/1971-8829/1393. Retrieved January 26, 2018, from EBSCO online database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=126533883&site=ehost-live&scope=site

Wainer, H., Dorans, N., Flaugher, R., & Green, B. (2000). Computerized adaptive testing: A primer. Florence, KY: Lawrence Erlbaum Associates, Inc.

Yovanoff, P., Squires, J., & McManus, S. (2013). Adaptation from paper-pencil to web-based administration of a parent-completed developmental questionnaire for young children. Infants & Young Children: An Interdisciplinary Journal Of Special Care Practices, 26, 318–332. Retrieved December 20, 2013, from EBSCO online database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=90361842

Essay by Sandra Myers, M.Ed.

Sandra Myers has a master's degree in adult education from Marshall University and is the former director of academic and institutional support at Miles Community College in Miles City, Montana, where she oversaw the college's community service, developmental education, and academic support programs. She has taught business, mathematics, and computer courses; and her other areas of interest include adult education and community education.