Race, Gender and Testing

Because high-stakes testing has taken on such a major role in education in the early twenty-first century, it is important that tests provide all students an equal opportunity to show their academic skills and knowledge. Race and gender can sometimes factor into a student's opportunity to demonstrate his or her abilities. Male and female students tend to thrive in different classroom environments, and their cognitive abilities tend to develop at different rates. An instructor's gender can also influence his or her teaching style and impact students. Tests can contain subtle biases that favor one racial, cultural, or gender group over another.

Keywords Adequate Yearly Progress; Advanced Placement; Cultural Bias; Gender Bias; Gender Gap; High-Stakes Tests; Language Bias; No Child Left Behind Act of 2001 (NCLB); Norm-Referenced Test; Racial Bias; Standardized Tests; Test Anxiety; Test Bias

Overview

High-stakes testing has taken on such an important role in education that tests must offer all students an equal opportunity to show their academic skills and knowledge. Race and gender can sometimes factor into a student's opportunity to demonstrate their abilities.

As learners, students tend to differ by gender. Studies have shown that boys tend to learn more when competition (such as grading) is involved, and girls tend to learn more in a cooperative learning environment (Marshall & Reinhartz, 1997, as cited in Gool, Carpenter, Davies, Ligos, MacKenzie, Schilp, & Schips, 2006). The cognitive abilities of young males develop later than those of females, so boys usually do not do as well as girls in language arts in their earlier years. However, young males tend to do well in the areas of mathematics and science (Frawley, 2005, as cited in Gool et al., 2006). Of course, there are always exceptions, but when instructors and parents know about the results of studies such as these, that knowledge may affect the way they interact with their students and children. For example, they may not expect girls to do well in mathematics or the sciences and may, perhaps subconsciously, steer them away from careers and classes that involve these skills.

How Teacher Gender Affects Learning

Studies have also shown proof of a difference between male and female instructors and the teaching styles they employ. Male instructors are often more direct, dominant, and authoritative and lean toward a lecture style of instruction. Female instructors are often more nurturing, provide cooperative learning opportunities, and ask questions to guide student learning (Marshall & Reinhartz, 1997, as cited in Gool et al., 2006). These differences in teaching styles and gender can affect the way instructors interact with their students. For example, it is typical for instructors to interact more with their male students than their female students, even if those interactions are related to disciplinary matters (Gool et al., 2006). Studies have shown that instructors tend to be more lenient toward boys. Instructors have been shown to be more likely to accept an answer that is called out from boys but to remind girls that there are rules (Hulley, 2001, as cited in Gool et al., 2006). Instructors tend to call on boys more than girls, give them more time to answer the questions, and give them more feedback than they do girls. Conversely, instructors tend to help girls more when they ask for assistance instead of guiding them and encouraging them to find the answer on their own (Sadker & Zittleman, 2005; Gool et al., 2006). Studies such as these show that while the more obvious forms of gender bias have been addressed, there are still attitudes and unintended behaviors that should be taken into consideration and rectified to assure that all students are being treated equally and are receiving an equal education.

With more attention being placed on such discrepant behaviors, there has been a real push in the U.S. to address the gender inequities that still seem to be so prevalent. However, despite efforts to address such bias, boys continue to outscore girls on most high-stakes tests, including both the verbal and mathematics sections of the SAT (Sadker & Zittleman, 2005; Gool et al., 2006). If nothing else, this shows that more research is needed to determine what the cognitive differences are between boys and girls and how they can be mitigated.

Racial & Cultural Test Bias

Cultural bias in testing can come in many different forms. Some common examples include when the content of a test uses references or language that favors one racial group over others, or when the people or historical figures referenced in the testing instrument are all or mostly of one race. Test developers must attempt to have equal representation of all races, use famous historic figures of various races, and not use role stereotypes, such as Native American warrior and white, male business executive (Nitko, 1983, as cited in Zurcher, 1998). Tests can also be racially biased if the language that is used is not familiar to a subgroup. While the common assumption is that language bias favors white, middle class students, it is possible to develop a test that is biased against any subgroup. For example, a test was developed containing 100 vocabulary words that were thought to be a better predictor of learning ability for African American students than students in other subgroups of the population. African American students did, indeed, perform better on the test than white students did (Williams, 1975, as cited in Zurcher, 1998).

Inappropriate Standardization Samples

With more attention being given to racial bias in testing during the past three decades, the types of bias noted above are not as prevalent as they used to be. However, one type of racial bias can still cause difficulty for test developers: inappropriate standardization samples. Inappropriate standardization samples occur when the norm groups are not racially diverse or do not include enough members of each subgroup to match that subgroup's percentage in the population. When this happens, the assessment's results may not reflect the abilities and achievement levels of students in the underrepresented subgroups (Overton, 1996, as cited in Zurcher, 1998). This problem is difficult to address because, even if each subgroup's share of the norm group matches its share of the overall population, the test can still be biased against smaller subgroups: the majority will always dominate the norm group simply because it is the majority.
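The comparison between a norm group and the population it is meant to represent can be sketched in a few lines of Python. This is only an illustration: the function name `representation_gaps`, the subgroup labels, and all of the counts and shares are hypothetical and are not drawn from any study cited here.

```python
def representation_gaps(norm_counts, population_shares):
    """For each subgroup, return (share of norm sample) - (share of population).

    A negative value means the subgroup is underrepresented in the norm sample.
    """
    total = sum(norm_counts.values())
    if total == 0:
        raise ValueError("norm sample is empty")
    return {
        group: norm_counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Hypothetical figures, for illustration only.
norm_counts = {"group_a": 800, "group_b": 150, "group_c": 50}
population_shares = {"group_a": 0.70, "group_b": 0.20, "group_c": 0.10}

gaps = representation_gaps(norm_counts, population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2%}")
```

A check like this only detects mismatched proportions. As the paragraph above notes, a norm group with perfectly matched proportions still contains far fewer members of small subgroups, so it cannot by itself rule out bias against them.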

Closing the Gender Gap by Subject

The Educational Testing Service conducted a review “of gender differences in elementary and secondary education within racial and ethnic groups” which showed that the gender gap varied only slightly in certain instances (Coley, 2001, as cited in Gender Differences in Educational Achievement, 2001, ¶ 3):

• In grades four, eight, and twelve, girls scored higher than boys across all racial and ethnic groups in both reading and writing.

• In mathematics at grade four, white males scored higher than white females, but there were no gender differences within other racial groups. At grades eight and twelve there were no gender gaps for any group.

• In science at age nine, there were no differences within any racial group. At age thirteen, white males scored higher than white females, but there were no gender differences within the other racial groups. At age seventeen, white and Hispanic males scored higher than white and Hispanic females, but there were no gender differences for African American and Asian/Pacific Islanders ("Gender Differences in Educational Achievement," 2001, ¶ 4-6).

The gender gap is also closing in mathematics course-taking for all races except Hispanics, among whom boys still take more mathematics classes than girls. While girls are now even with boys in mathematics courses, there are still more boys than girls taking four years of science. Based on 1999 information, participation in the Advanced Placement Program has increased tremendously across racial groups, with the number of Hispanic females showing the greatest increase, 308 percent. Females in general are participating in the program, which allows students to enroll in high school classes that can earn college credit, at greater rates than males. However, males made up a vast majority in computer science and physics courses. The gender differences across races varied depending on the subject. “There was little difference in English literature and composition exam scores, but there was considerable difference in biology and calculus, with males scoring higher. The greatest gender gap was for Hispanics in biology” (Gender Differences in Educational Achievement, 2001, ¶ 11).

Applications

Gender Bias in Tests

A test is considered gender biased if male students and female students with the same ability obtain different scores on the same testing instrument. Many factors can contribute to testing errors, including the conditions under which a test is administered, how each test item is worded, and students' attitudes toward the test. However, these factors affect males and females at random. Systematic error, by contrast, arises when fixed characteristics of students, such as their gender or race, are inadvertently measured along with the intended skill; it is a type of testing error that has received much attention. While test developers do their best to ensure that tests are not biased, instructors should still examine a test's questions for possible bias. This can be done by checking the material for references that may be offensive to members of one gender, looking for references to objects and ideas that are likely to be more familiar to males or to females, and determining whether one sex is featured more frequently in the questions or whether each sex appears only in stereotypical roles, such as male auto mechanics and female office assistants (Childs, 1990).
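The definition above (same ability, different scores) suggests a simple check: group test takers by an independent measure of ability and compare average scores within each group. The Python sketch below illustrates that idea only. It assumes an external ability rating is available, which is a strong assumption in practice, and the function name `score_gap_by_ability`, the band labels, and the scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def score_gap_by_ability(records):
    """records: iterable of (ability_band, gender, score) tuples.

    Returns {ability_band: mean male score - mean female score} for every
    band that contains at least one student of each gender.
    """
    by_band = defaultdict(lambda: defaultdict(list))
    for band, gender, score in records:
        by_band[band][gender].append(score)

    gaps = {}
    for band, groups in by_band.items():
        if "M" in groups and "F" in groups:
            gaps[band] = mean(groups["M"]) - mean(groups["F"])
    return gaps


# Hypothetical scores, for illustration only.
records = [
    ("low", "M", 52), ("low", "F", 48), ("low", "M", 50), ("low", "F", 46),
    ("high", "M", 81), ("high", "F", 80), ("high", "M", 79), ("high", "F", 80),
]

gaps = score_gap_by_ability(records)
print(gaps)
```

A gap that persists within an ability band across many test takers points toward systematic rather than random error, although a real analysis would need much larger samples and a formal statistical test.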

Aptitude tests are intended to predict students' future success. College entrance exams are a type of aptitude test; the SAT (Scholastic Assessment Test), which students take in high school, is one example. On these tests, females tend to earn lower scores than their male counterparts, yet females tend to have higher grade point averages during their first year of college, which could mean that these tests are biased against females (Childs, 1990).
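One way to make this reasoning concrete is to fit a single prediction line from test score to first-year grade point average for all students and then look at the average prediction error for each group. The Python sketch below is a minimal illustration of that idea, not the method used by Childs (1990); the helper names `fit_line` and `mean_residual_by_group` and all of the data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_residual_by_group(rows):
    """rows: list of (group, test_score, first_year_gpa) tuples.

    Fits one line to everyone, then returns each group's mean of
    (actual GPA - predicted GPA). A positive mean means the test
    underpredicts that group's college performance.
    """
    slope, intercept = fit_line([r[1] for r in rows], [r[2] for r in rows])
    residuals = {}
    for group, score, gpa in rows:
        residuals.setdefault(group, []).append(gpa - (intercept + slope * score))
    return {group: mean(values) for group, values in residuals.items()}


# Hypothetical students, for illustration only.
rows = [
    ("F", 500, 3.2), ("F", 600, 3.6),
    ("M", 550, 3.0), ("M", 650, 3.4),
]

result = mean_residual_by_group(rows)
print(result)
```

In this made-up data the common line underpredicts the grades of the female students and overpredicts those of the male students, which is the pattern the paragraph above describes.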

Racial Bias in Tests

Stereotyping and inadequate representation of minorities is another form of test bias. While this type of bias may not make the test item any more difficult for the test taker, it may cause an emotional reaction, which can affect students and prevent them from doing their best on the test. Examples of this type of bias include:

• Test items that refer to high suicide and alcoholism rates among Native Americans;

• Implying that a certain subgroup is inferior in any way; and

• Using terms that are now considered unacceptable, such as housewife, Chinaman, colored people, red man, and lower class.

Other terms that should be avoided include job designations that end in 'man.' Instead of 'policeman,' test item writers should use 'police officer,' and instead of 'fireman,' they should use 'firefighter.' Depicting members of designated subgroups of interest in stereotypical occupations, such as Chinese launderer, should also be avoided (Hambleton & Rodgers, 1995).

Cultural Bias in Psychological Tests

Critics who object to psychological tests on grounds of cultural bias contend that all ethnic or racial group differences on these tests stem from biases embedded in the tests through flawed psychometric methodology; in this view, group differences reflect characteristics of the tests and are unrelated to differences in the psychological trait in question. Bias in psychological testing can be life changing because it could mean students are inaccurately placed into remedial programs, are denied entrance into their college of choice, are denied admission into graduate school, or are denied employment. There are many contradictory statements regarding cultural bias. Those who believe the tests are inherently biased argue that the content of the tests is unfamiliar to and inappropriate for minority students; that the standardization samples include too few minorities for them to make a significant difference in test item selection; that examiner and language bias is present, since most psychologists are Caucasian and speak only standard English, which can be intimidating and confusing for minority students; that the tests measure different attributes when used with minority students; and that the tests do not predict outcomes or future behaviors for minority children because they are not valid (Reynolds, 1983).

Bias in Norm-Referenced Tests

Studies have shown that norm-referenced tests are not culturally biased against African American students, but they have also shown that lower scores for minorities are more closely related to their economic status (Roberts & DeBlassie, 1983, as cited in Castenell & Castenell, 1988). For example, low-income African American students spend less time than middle-income African American and white students on the common curriculum that most standardized, norm-referenced tests are based on (Spring, 1982, as cited in Castenell & Castenell, 1988). Therefore, if they are underrepresented in the norm groups, it can be impossible to detect and eliminate any test items that may be biased against them (Castenell & Castenell, 1988).

Further Insights

Schools, districts, and states all need to be aware of the ramifications of using a testing instrument to determine a student's fate, as there can be legal issues and lawsuits. For example, the New York State Education Department was sued for discriminating against girls by using SAT scores as the only criterion for awarding its state merit scholarships. The basis of the suit was that although girls tended to have higher grades than boys competing for the scholarships, they had lower scores on the SAT and ended up receiving fewer scholarships. Since the scholarship program was intended to reward high school achievement, and the SAT is intended to predict future success rather than measure that achievement, the parents who brought suit against the state won; New York could no longer use SAT scores as the sole basis for awarding its scholarships (Childs, 1990).

In prior years, courts have tended to rule that scores from standardized testing are culturally biased. In one case in California from 1970, it was contended that Mexican American and Chinese American students were placed in special education programs because the standardized test was biased against them. This resulted in having an unbiased test developed and used to retest and reevaluate all Mexican American and Chinese American students who were placed in special education (Zurcher, 1998).

In a 1979 case in California, African American children were classified as "educable mentally retarded" and placed in special classrooms. The suit stated that they were inappropriately classified and that the misplacement resulted in great harm to the students' social, educational, and future economic status. They won their case, which resulted in schools being prohibited from administering any standardized intelligence test to African American children for the purpose of identifying educable mental retardation without prior approval from the court (Rothstein, 1995, as cited in Zurcher, 1998).

Even though courts may concede that test items are culturally biased against certain racial groups, they sometimes still rule that the number of biased questions is not great enough to affect the overall scores of those being discriminated against (Rothstein, 1995, as cited in Zurcher, 1998). However, others contend that students who are confronted with biased questions may be offended and may not give the test their full effort afterward; if so, even one biased question could affect a student's overall test score.

Test Anxiety

Test anxiety affects girls more than boys. It can begin “as early as elementary school where studies show girls report being more worried than boys about their school performance” (Pomerantz, Altermatt, & Saxon, 2002, as cited in Altermatt & Kim, 2004, p. 8). This trend continues in high school and college (Feingold, 1994, as cited in Altermatt & Kim, 2004). Students who worry about academic performance do not do any better than those who worry less, but they do show lower levels of academic confidence and are not sure how to be successful in school. Many studies have found that among students with similar levels of ability, those “with high levels of test anxiety perform worse on cognitive tasks than students with low levels of test anxiety” (Eccles, Wigfield, & Schiefele, 1998, as cited in Altermatt & Kim, 2004).

The question of why girls worry more than boys has still not been definitively answered. Claude Steele and his colleagues propose that stereotype threat can cause anxiety in girls. According to this theory, girls are aware of the stereotype that boys are better at mathematics; they may feel pressure to do well in testing environments when confronted with the expectation that they will not; and the resulting worry can impair their performance, leading them to confirm the negative stereotype (cited in Altermatt & Kim, 2004). Research is ongoing, but one study did find “that females performed worse on a mathematics test when they took the test in a room where males outnumbered females than in a room with only females” (Inzlicht & Ben-Zeev, 2000, as cited in Altermatt & Kim, 2004, p. 9).

This theory can also hold true for minority students. One study showed that African American “students who were asked to report their race before taking a portion of a GRE (Graduate Record Examination) test did not do as well as students who were not asked to report their race” (Steele & Aronson, 1995, as cited in Altermatt & Kim, 2004, p. 9).

Viewpoints

When testing instruments are used as the basis to make decisions that can have far-reaching consequences, it is important to ensure that all forms of bias have been removed from the instrument. However, it is impossible to create the perfect representative group on which to test the bias of an assessment. Minority representation in any category — gender, ethnicity, religion, economic status, etc. — may still mean there is bias against the minority and bias for the majority simply because the minority is in the minority. Therefore, it is important that anyone involved in high-stakes testing consider using the chosen instrument in conjunction with other forms of assessment or criteria in order to mitigate any possible bias that may exist in the testing instrument.

A recent study has shown that by the third grade, 51 percent of boys and only 37 percent of girls have used a microscope in class. Boys are also five times more likely than girls to consider a technology-related career (Sadker & Zittleman, 2005). States, school districts, schools, and instructors need to be aware of these facts and do all they can to ensure that boys and girls are treated equally and that girls are encouraged and allowed to participate in class. Instructors need to examine their own behavior for bias, and schools and districts can devote in-service time to the issue. Principals should make time to observe their instructors in the classroom and bring any race or gender biases they may see to the instructor's attention. Colleges and universities can help by making sure gender and racial biases are addressed in their curriculum so that the new instructors they graduate begin their careers aware of the issues.

With respect to gender and racial differences, studies show things have changed for the better. Students are taking more science and mathematics courses, performing better in these classes, and closing the educational gap. Instructors, school counselors, and parents play a vital role because they can encourage and guide students to take these classes — especially in the later grades. With heightened awareness, appropriate attention given to these issues, and the narrowing education gap, race and gender can soon be minimally intrusive issues in the classroom.

Terms & Concepts

Adequate Yearly Progress: Adequate yearly progress means that test data must be collected and analyzed in relation to student learning to report student and school proficiency, and the standards that determine proficiency must be raised over time with an increased number of students meeting the standards.

Advanced Placement: Advanced Placement classes are taken during high school as a way to prepare for challenging courses that will be taken in college. High school students may take Advanced Placement exams, and qualifying scores on such exams may result in college credits being granted at the discretion of the individual college or university.

Gender Bias: Gender bias in testing can occur if males and females with the same ability levels achieve different scores on the same testing instrument.

High-Stakes Tests: High-stakes tests are those whose test scores are used to make decisions that have important consequences for students, schools, school districts, and/or states and can include high school graduation, promotion to the next grade, resource allocation, and instructor retention.

Language Bias: Language bias in testing occurs when test items may favor or discriminate against a particular subgroup of the testing population based on the language used on the test. Language bias can also occur if the language used by the test administrator is different from the test taker's native language or type of language they use — such as African American vocabulary, standard English, English vernacular, etc.

No Child Left Behind Act of 2001 (NCLB): The No Child Left Behind Act of 2001 is the latest reauthorization and a major overhaul of the Elementary and Secondary Education Act of 1965, the major federal law regarding K–12 education.

Norm-Referenced Test: Norm-referenced tests are assessments administered to students to determine how well they perform in comparison to other students taking the same assessment.

Standardized Tests: Standardized tests are tests that are administered and scored in a uniform manner, and the tests are designed in such a way that the questions and interpretations are consistent.

Racial Bias: Racial bias in testing can occur if students of different races with the same ability levels achieve different scores on the same testing instrument.

Test Anxiety: Test anxiety is when students experience nervousness or apprehension before, during, or after an assessment to the degree that it leads to poor test performance and interferes with their learning.

Test Bias: Test bias occurs when provable and systematic differences in the results of students taking the test are discernable based on group membership, such as gender, socioeconomic standing, race, or ethnic group.

Bibliography

Altermatt, E., & Kim, M. (2004). Can anxiety explain sex differences in college entrance exam scores? Journal of College Admission, 183, 6-11. Retrieved September 6, 2007 from EBSCO Online Database Academic Search Complete. http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/29/e6/30.pdf

Castenell, L., Jr., & Castenell, M. (1988). Norm-referenced testing and low-income blacks. Journal of Counseling & Development, 67, 205. Retrieved September 6, 2007 from EBSCO Online Database Academic Search Premier. http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=4962098&site=ehost-live

Childs, R. (1990). Gender bias and fairness. (Report EDO-TM-90-9). Washington, DC: Office of Educational Research and Improvement. (ERIC Document Reproduction Service No. ED328610). Retrieved September 6, 2007 from EBSCO Online Education Research Database. http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/22/c5/e4.pdf

Fischer, F. T., Schult, J., & Hell, B. (2013). Sex-specific differential prediction of college admission tests: A meta-analysis. Journal of Educational Psychology, 105, 478-488. Retrieved December 13, 2013 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=87508966

Ford, D. Y., & Helms, J. E. (2012). Overview and introduction: Testing and assessing African Americans: "Unbiased" tests are still unfair. Journal of Negro Education, 81, 186-189. Retrieved December 13, 2013 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=83853077

Gender differences in educational achievement within racial and ethnic groups. (2001). (Report EDO-UD-01-3). Washington, DC: Office of Educational Research and Improvement. (ERIC Document Reproduction Service No. ED455341). Retrieved September 6, 2007 from EBSCO Online Education Research Database. http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/17/34/2d.pdf

Gool, J., Carpenter, J., Davies, S., Ligos, T., MacKenzie, L., Schilp, R., & Schips, J. (2006). Teacher bias of gender in the elementary classroom. Education Today, 5, 27-30. Retrieved September 6, 2007 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=22836345&site=ehost-live

Hambleton, R., & Rodgers, J. (1995). Item bias review. (Report EDO-TM-95-9). Washington, DC: ERIC Clearinghouse on Assessment and Evaluation. (ERIC Document Reproduction Service No. ED398241). Retrieved July 24, 2007 from EBSCO Online Education Research Database. http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/14/a5/8d.pdf

Kline, R. B. (2013). Assessing statistical aspects of test fairness with structural equation modelling. Educational Research & Evaluation, 19(2/3), 204-222. Retrieved December 13, 2013 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=86179090

Reynolds, C. (1983). Test bias: In God we trust; all others must have data. Journal of Special Education, 17, 241-258. Retrieved July 24, 2007 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=4727754&site=ehost-live

Sadker, D., & Zittleman, K. (2005). Gender bias lives, for both sexes. Education Digest, 70, 27-30. Retrieved September 6, 2007 from EBSCO Online Database Academic Search Premier. http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=16782584&site=ehost-live

Skiba, R., Knesting, K., & Bush, L. (2002). Culturally competent assessment: More than nonbiased tests. Journal of Child & Family Studies, 11, 61-78. Retrieved July 24, 2007 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=6768458&site=ehost-live

Zurcher, R. (1998). Issues and trends in culture-fair assessment. Intervention in School & Clinic, 34, 103. Retrieved September 6, 2007 from EBSCO Online Database Academic Search Premier. http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=1242315&site=ehost-live

Suggested Reading

Ford, D. (1996). Reversing underachievement among gifted black students: Promising practices and programs. New York: Teachers College Press.

Goldhaber, D., & Hansen, M. (2010). Race, gender, and teacher testing: How informative a tool is teacher licensure testing? American Educational Research Journal, 47, 218-251. Retrieved December 13, 2013 from EBSCO Online Database Education Research Complete. http://search.ebscohost.com/login.aspx?direct=true&db=ehh&AN=50447191

Klein, M., & Chen, D. (2000). Working with children from culturally diverse backgrounds. Clifton Park, NY: Thomson Delmar Learning.

Osterlind, S. (1983). Test item bias. Thousand Oaks, CA: Sage Publications.

Thernstrom, A., & Thernstrom, S. (2004). No excuses: Closing the racial gap in learning. New York: Simon & Schuster.

Essay by Sandra Myers, M.Ed.

Sandra Myers holds a master's degree in adult education from Marshall University and is the former director of academic and institutional support at Miles Community College in Miles City, Montana, where she oversaw the college's community service, developmental education, and academic support programs. She has taught business, mathematics, and computer courses; her other areas of interest include adult education and community education.