Assessment in education
Assessment in education encompasses a variety of processes aimed at testing, measuring, and evaluating student performance. It involves methods such as standardized tests, alternative assessments, and self-assessments, each serving different purposes. Standardized assessments, like the SAT, are designed to measure performance across a broad population under uniform conditions, allowing for comparative analysis among peers. In contrast, alternative assessments focus on student engagement and learning processes, while self-assessment encourages students to reflect on their own progress.
Different types of assessments exist to meet various educational needs, including placement assessments to determine skill levels, diagnostic assessments to identify strengths and weaknesses, formative assessments to monitor ongoing learning, and summative assessments that evaluate overall achievement at the end of a learning period. The effectiveness of these assessments relies heavily on their validity and reliability; validity ensures that the assessment measures what it claims to measure, while reliability refers to the consistency of results over time. Ultimately, a well-designed assessment informs teaching strategies and supports student growth, reflecting a comprehensive understanding of educational achievement.
- TYPE OF PSYCHOLOGY: Psychological methodologies
"Assessment" is a general term for a broad range of processes for testing, measuring, and evaluating performance. Standardized, alternative, and self-assessment methods are used for the purposes of replacement, diagnosis of performance, and provision of formative and summative evaluation. The quality of an assessment depends on its validity and reliability.
Introduction
Every person has experienced some type of assessment. Those who have had public education in the United States are familiar with the SAT Reasoning Test (SAT) and the Iowa Tests of Basic Skills (ITBS) administered in schools. Those whose native language is not English and who have pursued education in the United States likely know about the Test of English as a Foreign Language (TOEFL). Those who have applied to graduate schools are familiar with the Graduate Record Examination (GRE) or the Graduate Management Admission Test (GMAT). Assessment is used quite often, for different purposes, in daily life.
[Image: Personality assessment. Based on information from Tapu, C. S., Hypostatic Personality: Psychopathology of Doing and Being Made, Premier, 2001.]
Assessment is a general term for a broad range of processes that includes testing, measuring, and evaluation. Testing is simply one part of assessment, usually a set of questions that participants must answer in a fixed time period and under certain conditions. Measuring is a process that assigns numbers to assessment results, such as the number of correct or incorrect answers on a test, project, or performance. A rubric or rating scale is usually created to record quantitative or qualitative results. Evaluation is a process of assessment that attaches a value or judgment to the results, matching them against the objectives of a project, an instruction, or a performance to see how well the project or performance is done.
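As a minimal illustration of measuring, the Python sketch below turns rubric ratings into a single score; the criteria, scale, and numbers are invented for illustration rather than drawn from any actual rubric.

```python
# Hypothetical rubric: each criterion is rated from 1 (poor) to 4 (excellent)
rubric_ratings = {
    "organization": 3,
    "evidence": 4,
    "clarity": 2,
}

# Measuring assigns numbers to the assessment: sum the ratings and
# express them against the maximum possible score
total = sum(rubric_ratings.values())
maximum = 4 * len(rubric_ratings)
print(f"{total}/{maximum} = {total / maximum:.0%}")  # prints "9/12 = 75%"
```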
Standardized, Alternative, and Self-Assessment
There are different types of assessment, each using specific tests, measurements, and methods of evaluation, and each serving different purposes. Generally, assessment can be classified as standardized assessment, alternative assessment, or self-assessment.
Standardized assessment adopts standardized tests to measure and evaluate a performance. Standardized tests are developed by a major test publisher for a large population and administered under the same conditions and time limits to all participants. The SAT Reasoning Test is a typical example. Standardized assessment results are usually norm-referenced for interpretation; that is, an individual's performance is compared with the performance of his or her peers. For example, a person's SAT score can be ranked as a percentile relative to others of the same age or grade. If a person's percentile rank is 84, that means that 84 percent of all of the scores are lower than this individual's score.
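A percentile rank of this kind can be computed directly from a distribution of scores. The following Python sketch uses invented scores and a simple definition (the percentage of scores strictly below the given score); real testing programs rely on published norm tables rather than raw calculations like this.

```python
def percentile_rank(score, norm_group):
    """Percentage of scores in the norm group that fall below the given score."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)

# Invented norm group of test scores
norm_group = [400, 450, 500, 520, 550, 580, 600, 640, 700, 750]

# 640 exceeds 7 of the 10 scores, so its percentile rank is 70
print(percentile_rank(640, norm_group))  # 70.0
```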
According to educator James H. McMillan, alternative assessment and self-assessment are weighted more toward assessment of the process than assessment of a product or a performance. Individually created tests, portfolios, exhibitions, journals, and other forms of assessment are commonly used. Alternative assessment is intended to engage an individual in the process of learning and thinking and in demonstrating learning through a performance. For example, a teacher who adopts a performance-based assessment will observe and make a judgment about the student's demonstration of a skill or a competency in creating a product, constructing a response, or making a presentation. Self-assessment is a part of the learning process aimed at seeing where one is and how one is doing. Instead of relying on feedback from others, the person is expected to self-assess: to think about and change what he or she is doing while doing it. Self-assessment is also a reflective practice that brings past events to a conscious level and devises appropriate ways to think, feel, and behave in the future, through techniques such as an annual review portfolio or a self-checklist.
Other Assessments
According to educator Donald Orlich, assessment can also be classified by how tests are used: placement assessment, diagnostic assessment, formative assessment, and summative assessment. Either standardized or teacher-made tests can be adopted in the process of these assessments.
Placement assessment determines whether an individual has the required knowledge and skills to begin a new position. In education, placement-assessment instruments are pretests used to determine whether a student can be accepted into or placed in a certain grade or course of instruction. Spontaneous, informal observations and interviews are also commonly used in placement assessment.
Diagnostic assessment aims to identify an individual's strengths and weaknesses. For example, the Kaufman Assessment Battery for Children (K-ABC) and the Woodcock-Johnson Psychoeducational Battery-Revised (WJ-R) are two specific diagnostic-assessment instruments. The K-ABC is used to diagnose the learning potential and learning styles of children between 2.5 and 12.5 years old. The WJ-R is used to assess the intellectual and academic development of individuals from preschool to adulthood.
Formative assessment monitors a person's learning or working progress to provide feedback that enriches knowledge and skills. It is believed that formative assessments and feedback can play an important role in supporting a performance. The portfolio is one of the most commonly adopted formative-assessment instruments in education today. The data collected while evaluating students' learning progress are used to help teachers provide feedback to students, adjust teaching strategies, and inform parents of the support students need from home to succeed.
Summative assessment assesses final results, achievements, or projects for decision making. It occurs at the conclusion of instruction, such as at the end of a teaching unit or of an academic year. Rather than monitoring students' developing proficiency in particular knowledge and skills, it provides an overview of achievement across the knowledge base and skill set. Term papers, chapter achievement tests, final exams, and research projects are often adopted for summative assessment in schools.
Validity and Reliability
The quality of an assessment depends on its validity and its reliability. Validity refers to the appropriateness of the inferences, uses, and consequences that result from the assessment. It is the degree to which a test measures what it is supposed to measure. A specific test may be valid for a particular purpose and for a particular group. Therefore, the question is not whether a test is valid or invalid, but rather what it is valid for and for which group; what matters is that a test be valid for the situation and the group of students it is used to measure or evaluate. John Salvia and James E. Ysseldyke classify validity as content validity, criterion-related validity, or construct validity.

Content validity is the degree to which a test's items actually represent the content to be measured. Test items cannot cover each and every content area, but they are expected to sample the content domain adequately. If a test does not measure what students are supposed to learn, the test score will not reflect a student's achievement.

Criterion-related validity is the degree to which an individual's performance on a criterion measure can be estimated from the assessment procedure being validated. Concurrent criterion-related validity and predictive criterion-related validity are commonly described. Concurrent criterion-related validity refers to how accurately a test score relates to scores on another test administered, or to some other valid criterion available, at the same time. Predictive criterion-related validity refers to how accurately a test score can predict how well an individual will do in the future. Since the validity of a test is judged against a criterion, the criterion itself, whether concurrent or predictive, must be valid.
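Criterion-related validity is typically quantified as a correlation between test scores and criterion scores. The Python sketch below computes a Pearson correlation from scratch on invented data; the scores and the admissions-test framing are illustrative assumptions, not real results.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Invented data: admissions-test scores and later first-year grade averages
test_scores = [520, 580, 600, 640, 700]
first_year_gpa = [2.6, 3.0, 2.9, 3.4, 3.7]

# A strong positive correlation would support predictive criterion-related validity
print(round(pearson_r(test_scores, first_year_gpa), 2))  # about 0.97
```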
Construct validity is the degree to which a test measures an intended theoretical construct or characteristic. A construct is "invented" to explain behavior; it cannot be seen, but its effects can be observed. For example, it is hypothesized that there is something called intelligence that is related to learning achievement; therefore, the higher one's intelligence, the better one's learning achievement should be. A test is developed to measure how much intelligence an individual has. If the individual's test score and learning achievement were both high, that would be evidence supporting the construct validity of the test. However, if a higher score on the test did not indicate higher learning achievement, it would not necessarily mean that the test failed to measure intelligence; the hypothesis linking intelligence to learning achievement might itself be incorrect.
Reliability refers to the dependability, trustworthiness, or consistency of test results. It is the degree to which a test consistently measures whatever it measures. If a test is not reliable, its results can be expected to differ each time the test is administered.
There are three types of reliability: internal-consistency reliability, test-retest reliability, and inter-scorer reliability. When an individual is given a test made up of two similar but different sets of questions, the results from the two parts should be about the same; if they are, the test has internal-consistency reliability, and the results from one set of items can be generalized to the other set. If a test is administered at two different times, as in a test-retest procedure, the results should be quite stable, meaning the scores from the test and the retest are highly correlated. In that case, one test result can be generalized to the same test administered at a different time, for example, a week later. Inter-scorer reliability indicates that when two or more different scorers score a test, their judgments are nearly the same and highly correlated. With high inter-scorer reliability, one scorer's judgment can be generalized to other scorers.
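Both internal-consistency and test-retest reliability are commonly estimated as correlations. The Python sketch below (which requires Python 3.10+ for statistics.correlation) uses invented scores; the split-half estimate is stepped up to full-test length with the standard Spearman-Brown correction.

```python
from statistics import correlation  # available in Python 3.10+

# Invented data: the same five students' scores on two halves of one test
odd_item_scores = [10, 14, 9, 16, 12]    # total on odd-numbered items
even_item_scores = [11, 13, 10, 15, 12]  # total on even-numbered items

# Internal consistency via the split-half method: correlate the two halves,
# then apply the Spearman-Brown correction to estimate full-test reliability
r_half = correlation(odd_item_scores, even_item_scores)
r_full = (2 * r_half) / (1 + r_half)
print(f"split-half r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")

# Test-retest reliability: correlate scores from two administrations
first_administration = [78, 85, 62, 90, 71]
second_administration = [80, 83, 65, 88, 74]
print(f"test-retest r = {correlation(first_administration, second_administration):.2f}")
```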
An interesting relationship exists between validity and reliability. A valid assessment is always reliable because when an assessment measures what it is supposed to measure, it will be reliable every time the assessment is administered. However, a reliable assessment is not necessarily valid, since an assessment may consistently measure the wrong thing.
Bibliography
Bates, John A., and Brian A. Lanza. "Conducting Psychology Student Research via the Mechanical Turk Crowdsourcing Service." North American Journal of Psychology, vol. 15, no. 2, 2013, pp. 385–94.
Brookhart, Susan M. "Educational Assessment Knowledge and Skills for Teachers Revisited." Education Sciences, vol. 14, no. 7, 2024, p. 751, doi.org/10.3390/educsci14070751. Accessed 26 Aug. 2024.
Marienau, C. "Self-Assessment at Work: Outcomes of Adult Learners' Reflections on Practice." Research Methods 01/02, edited by Mary Renck Jalongo, Gail Gerlach, and Wenfan Yan, McGraw-Hill, 2001.
McMillan, James H. Classroom Assessment: Principles and Practice for Effective Instruction. 7th ed. Pearson, 2017.
Orlich, D., R. Harder, and R. Callahan. Teaching Strategies. Houghton, 2006.
Ruiz-Primo, M. A., S. E. Schultz, and M. Li. “Comparison of the Reliability and Validity of Scores from Two Concept-Mapping Techniques.” Journal of Research in Science Teaching, vol. 38, no. 2, 2001, pp. 260–78.
Salvia, John, and James E. Ysseldyke. Assessment in Special and Inclusive Education. 13th ed. Cengage Learning, 2016.
Schmitt, Neal. "Research in Consulting Psychology Journal: Practice and Research: Reactions and Suggestions." Consulting Psychology Journal: Practice & Research, vol. 65, no. 4, 2013, pp. 278–83.
Zeliff, N. D. “Alternative Assessment.” National Business Education Yearbook, National Business Education Association, 2000, pp. 91–102.