Thursday, September 10, 2015

Are High Stakes Assessments Testing Students or Our Patience?

No matter how well a high stakes test is designed, it is never going to fully encompass a student’s knowledge of any given subject matter. Rather, test items reflect only a sample of any given domain (Price & Koretz, 2013, p. 36). Of course, test-makers strive to construct test items to match what students should be learning. However, this is extremely challenging, especially when assessing fluid and cognitively complex skills such as reading comprehension. To even attempt to test a skill such as comprehension, test-makers may resort to a watered-down multiple-choice format or take out more cognitively complex items. In its 2011 report, the National Research Council disclosed, “It is generally more difficult to design items at higher levels of cognitive complexity and to have such items survive pilot testing” (p. 3-2). Regardless of the test items that are eventually placed on a test, it is naïve to assume that a two- to three-hour test can accurately gauge what a student has learned over the course of an entire year. It is for this reason that Price and Koretz (2013) warn, “Significant decisions about a student should not be made on the basis of a single score” (p. 42). And yet they are. The Miami-Dade County Public Schools (MDCPS) places students into intervention classes based on their performance on high stakes tests (MDCPS, 2012, p. 2). For example, if a student’s reading score is below the “proficient” level, he or she will be enrolled in an intensive reading class. What this means is that this student must forgo an elective such as art or music to receive an extra period of instruction in reading. The rationale is that this extra course will help struggling students get back on grade level. I will let my readers decide how effective it is to put all struggling readers together in one class that only they, the struggling readers, have to take while their more successful peers get to enjoy physical education or art. Perhaps such classes can be successful.
However, intervention classes for the 2015-2016 school year are suspect for an additional reason. Since the scores from last year’s high stakes test (the FSA) were not available, MDCPS students in grades 6-8 have been placed into intervention classes based on test scores from the 2013-2014 school year. That means that current eighth graders are now suffering through an extra reading class because of how they performed on a test they took in sixth grade. Students are punished on the basis of one test that they took years ago despite experts’ warnings that single test scores should not be trusted in isolation.

Students are not the only ones to be punished according to this new system. Teachers are also feeling the pain of a system that makes decisions based on single test scores. According to the evaluation system used by MDCPS, 35% of a teacher’s final evaluation for the year must be based on high stakes testing data (MDCPS, 2015, p. 40). However, test data may not be an accurate reflection of a teacher’s worth. As any teacher knows, some groups of students are significantly stronger than other groups. And some years, students are significantly weaker. Though teachers endeavor to address the needs of all students, it is unrealistic to assume that a teacher can take a class full of below grade-level students and, by the end of the year, miraculously transform all of them into students performing at grade level. However, the way the system currently works, a teacher is essentially punished for having a lower-performing group of students. Price and Koretz write, “The cohort of students in any one year is often very different from those in previous years, and these differences in student cohorts cause scores to fluctuate substantially from one year to the next, even if the effectiveness of the school remains unchanged” (pp. 42-43). Price and Koretz emphasize, “This inconsistency tends to be particularly large when the performance of classrooms or small schools is described” (p. 42). This situation is exacerbated when the tests being used do not look at progress within performance categories. In explaining standards-referenced tests, Price and Koretz note that “a student who improved from near the bottom of the ‘needs improvement’ range—a very large difference in many cases—would show no improvement” while “a student who progressed a very small amount but crossed a performance standard would be shown as having improved” (pp. 46-47). That doesn’t make much sense.
And yet, performance assessed in precisely this way can impact a teacher’s evaluation, which can, in turn, affect a teacher’s future job prospects and even salary. These are monumental consequences for results that are statistically suspect.

Because so much is at stake for both students and teachers when it comes to high stakes tests, teachers inevitably respond the way any rational being placed in the same situation would: they research the test and change their instructional practices so that their students will get the highest test scores they possibly can. This is called “gaming the system” and is one major reason researchers say that test data should be interpreted with caution. When teachers teach to the test in this way, they are not actually expanding students’ knowledge or skill in the subject area. Rather, they are merely helping students do better on the particular test in question. The truth of this statement can be seen in the fact that “typically gains on high-stakes tests have been three to five times as large as gains on other tests with low (or lower) stakes. In numerous cases, large gains on high-stakes tests have been accompanied by no gains whatever on lower-stakes tests” (Price & Koretz, 2013, p. 61). A case in point can be seen by exploring data from the National Assessment of Educational Progress (NAEP). Although this data is collected every two years across the United States, the results do not directly impact individual students or teachers in the same way that state-mandated high stakes tests do. Hence, the NAEP is a low stakes test. Although the NAEP website reveals that there have been gains in reading since 1971, these gains are quite small: nine-year-olds improved 13 points and thirteen-year-olds improved 8 points. Of note, the reading performance of seventeen-year-olds did not improve at all (NAEP, 2015). It would seem that the No Child Left Behind Act (NCLB) has done little to improve student performance. If students were getting left behind before NCLB, they are still getting left behind now. Internationally, the United States ranks 17th among the 34 OECD member nations (PISA, 2013). This is completely average.
As a world power, the United States is not content with average; we want to see our students’ scores rise. However, the data reveals that “there has been no significant change in these performances over time” (PISA, 2013). Shanghai can rest easy knowing that it is in no real danger of being overtaken by the US anytime soon. Again, NCLB seems to have failed in helping spur any real change or improvement.

Perhaps no one should really be surprised since “there is no research that links increased testing with increased reading achievement” (Afflerbach, 2004, p. 6). Let me repeat that astounding sentence written as part of the 2004 National Reading Conference policy brief: “There is no research that links increased testing with increased reading achievement.” And yet testing continues unabated. More disturbingly, time spent testing may mean less time spent actually engaging in activities that would benefit students.

Part of the problem is that high stakes test scores are difficult for teachers to interpret and use when crafting instruction. They may also seem far removed from the classroom. As such, “Rather than give up what they consider to be valid, instructionally useful assessment practices such as running records of students’ oral reading, most teachers have continued with their own procedures while ‘adding on’ what is externally imposed. The resulting redundancy is staggering” (Invernizzi, Landrum, Howell, & Warley, 2005, p. 617). Even if teachers understood the results of tests perfectly, there is always a disturbing delay between the point at which a student takes a test and the point at which test scores are reported back. Test scores from the FSA Reading test taken last spring are still forthcoming.

One idea put forth by Caldwell (2008) is that classroom tests, which can be created by teachers themselves, take the place of high stakes tests. However, Caldwell is not proposing that NCLB and its high stakes tests merely disappear. Rather, to fill the void and offer a similar level of accountability, classroom assessments would have to be constructed with validity and reliability in mind. That is, they would have to be conscientiously constructed to test what is actually taught in class, what teachers really want students to learn. Doing so could result in a much richer picture of student performance, as individual teachers could use assessment formats well beyond the multiple-choice or short-answer items usually seen on a high stakes test. Teachers might use such rich sources of information as, for example, portfolios in combination with projects, essays, speeches, and more traditional quizzes or tests. However, although there is great potential in using classroom assessments for everyone’s benefit, these assessments would have to be created and used by all teachers so that performance across teachers and grades could be determined. If everyone designed their own test, then no one could ever really compare one student’s score to another’s. After all, one teacher’s tests may be significantly more difficult than those of the teacher down the hall. Moreover, classroom assessments should not be clouded by such factors as participation or effort. But currently, many if not all teachers factor behavior and effort into a student’s final grade. If actual achievement is truly what we want to track, then for better or for worse, participation and effort are beside the point.

Assessments designed by teachers themselves might prove to be part of the solution. However, it is important to keep in mind the overarching goals of education and ponder the extent to which a test, any test, can assess students. The NRC report reminds us that “less tangible characteristics—such as curiosity, persistence, collaboration, or socialization—are not tested. Nor are subsequent achievements, such as success in work, civic, or personal life, which are examples of the long-term outcomes that education aims to improve” (National Research Council of the National Academies, 2011, p. 3-1). However, what more could we ask of the educational system than that it help our students meet these “intangible goals”? If the goal of education is to prepare students to be successful and productive citizens, high stakes tests may have failed. Perhaps it is time to intervene and begin a new era of classroom-based assessment that is designed and approved by those who work in classrooms on a daily basis. Perhaps it is time for policymakers to remind themselves that the end goal of education is not to get high scores, even if these scores beat those of our fiercest political rivals, but to help students fit into society and succeed both personally and professionally.


References

Afflerbach, P. (2004). National Reading Conference policy brief: High stakes testing and reading assessment.

Caldwell, J. S. (2008). Reading assessment: A primer for teachers and coaches (2nd ed.). New York, NY: The Guilford Press.

Invernizzi, M. A., Landrum, T. J., Howell, J. L., & Warley, H. P. (2005). Toward the peaceful coexistence of test developers, policymakers, and teachers in an era of accountability. The Reading Teacher, 58(7), 610-618.

Miami-Dade County Public Schools (2012). Technical assistance for identification, placement, and scheduling of students in grades 5-12 in reading classes. Miami, FL: Curriculum and Instruction.

Miami-Dade County Public Schools (2015). IPEGS procedural handbook. Miami, FL: Office of Professional Development and Evaluation.

National Assessment of Educational Progress. (2015). Fast facts. Retrieved from http://nces.ed.gov/fastfacts/display.asp?id=38

National Research Council of the National Academies. (2011). Incentives and test-based accountability in education (S. W. Elliott & M. Hout, Eds.). Washington, DC: National Academies Press.

Price, J., & Koretz, D. M. (2013). Building assessment literacy. In K. Boudett, E. City, & R. Murnane (Eds.), Data wise: A step-by-step guide to using assessment results to improve teaching and learning (pp. 35-61). Cambridge, MA: Harvard Education Press.

Programme for International Student Assessment (PISA). (2013). What students know and can do: Student performance in mathematics, reading, and science. Retrieved from http://www.oecd.org/pisa/keyfindings/PISA-2012-results-snapshot-Volume-I-ENG.pdf

