Thursday, September 24, 2015

Informal Assessments: An Authentic and Informative Perspective of Student Work

My last blog (see “Are High Stakes Assessments Testing Students or Our Patience” below) explored the limitations of formal assessments. In particular it questioned the practice of attaching such high stakes to tests that may not even accurately characterize students’ abilities—especially when instructional time and content are negatively impacted by these assessments.

This week, my attention shifts to informal assessments. Unlike their formal counterparts, informal assessments are typically selected/created and administered by teachers themselves in an authentic classroom setting. Thus, they have a lot of potential to capture a student’s true abilities. Consider, for example, the difference between a high-stakes test like the FSA and observing a student systematically as that student works. There is really no comparison. In the case of observation, the teacher pays careful attention to what a student is doing as the student performs an actual classroom task. Far from simply knowing whether a student is getting a test question right or wrong (as is the case for most formal assessments), in observation, a teacher gains insight into what is helping a child succeed or what factors may be causing the child to get stuck. For instance, in an example provided by Fisher and Frey (2010), the teacher observed that a child was able to instantly start working on a writing prompt four out of five times. However, that same child was only able to finish one of the five prompts. Because of this careful observation, the teacher then decided to work individually with the child while that child was completing another prompt and discovered that the child was not planning the writing at all—which prevented the child from finishing the task (p. 100). A failing score on a state writing exam would never reveal such important information, nor would it suggest how to remedy the problem in the way that this observation did.

Other types of informal observation seem similarly fruitful. Consider, for example, the idea of creating a checklist of skills or strategies that a child needs to master in a particular year. As the child masters the skill, a teacher could check off that skill. The teacher could also note when or if a child was struggling with a particular skill. This is invaluable information that could help guide a teacher’s instruction. The information could also be shared with the child and the child’s parents so that they were aware of the student’s progress.

Or consider the idea of having a child use self-assessment. In self-assessment, a student would have to understand what was being asked of him or her in order to self-assess in a meaningful way. However, shouldn’t children always have a clear understanding of what is expected of them? It is just that for self-assessment, there is no avoiding this understanding. What’s more, self-assessment “can serve as a student motivator” since with self-assessment, control is given to the student him- or herself (Fisher & Frey, 2010, p. 101). Fisher and Frey explain, “Students who are taught to plan and monitor their work habits begin to take ownership of their work” (p. 102).

Rubrics, the fourth type of informal assessment outlined by Fisher and Frey, can be used with self-assessment. Even when they are not, rubrics “provide students with criteria for each level of achievement and can be used to determine needed instruction as well as mastered content” (p. 101).

Portfolio assessment is yet another way to informally assess students. To create portfolios, either the student, or the teacher and student working together, select representative samples of student work on which the child then reflects. Because portfolios include several pieces, they give a much richer picture of what a student can actually achieve. Moreover, piecing together a portfolio can help the student understand the deeper themes linking individual lessons and tasks. With portfolios, students are also able to see the connection between their effort and the grades they receive, which can be a powerful motivator. Alvermann, Phelps, and Gillis (2011) write, “When they [i.e., students] are asked to select, polish, arrange, and analyze their own work, they have a chance to see that learning is not haphazard or incidental to any efforts of their own. They also have more direct input into what ultimately goes on their report cards” (p. 157).

Caldwell (2008) writes, “It would be difficult to imagine a reading classroom where teachers or coaches did not listen to their students read and did not evaluate their comprehension” (p. 49). Administering an Informal Reading Inventory (IRI) requires a teacher to do just that. In the process, the teacher acquires a wealth of information about a student’s reading abilities. After all, so much can be learned by sitting one-on-one with a student and experiencing a text with the child as that child reads. No formal assessment could possibly be sensitive enough to capture every miscue a child makes and to note patterns or judge fluency. Although undoubtedly the most formal of informal assessments, an IRI can provide teachers with a student’s independent, instructional, and frustration levels of reading. And if a teacher is not interested in systematically determining these levels, an IRI process can still be used to determine whether or not a specific passage or text is appropriate for a child. In this case, a teacher would be able to determine if classroom texts are so challenging that they are preventing the child from succeeding in that class. This is especially relevant in a content area, such as science or history, where a student’s reading level is usually not considered (but should be).

Thus far, the picture I have given of informal assessment should assure you that this type of assessment is far superior to a short, high-stakes, end-of-year exam precisely because it reflects what a child does each and every day. Moreover, these assessments can help to shape what kind of instruction happens next. Black and Wiliam (1998) write, “Assessment becomes formative assessment when the evidence is actually used to adapt the teaching to meet student needs” (p. 140). That is, rather than just entering the results of an assessment onto a report card or into a calculation of whether or not the student should advance to the next grade, the teacher uses the results of formative assessment to decide what kind of instruction should be given to help the student. Any of the informal assessments described earlier could and should be used as formative assessments. The fact that these informal assessments are so authentic, so aligned to what is actually happening in the classroom, means that the flow between assessment and instruction should be natural.

Part of the power of formative assessment should lie in how a teacher responds to any assessment, even more traditional assessments such as essays or quizzes. Black and Wiliam (1998) note, “Marking is usually conscientious but often fails to offer guidance on how work can be improved” (p. 141). But this need not be the case. They write that “feedback has been shown to improve learning when it gives each pupil specific guidance on strengths and weaknesses, preferably without any overall marks” (p. 144). While avoiding grades may not be possible, Torgesen and Miller (2009) write, “Communication to the student based on formative assessments needs to energize and empower improved performance and not be discouraging” (p. 37). Such guidance is essential if students are to actually make progress. It can also prevent low-achieving students from feeling powerless to change their classroom fate. This is extremely important since “pupils who come to see themselves as unable to learn usually cease to take school seriously. Many become disruptive; others resort to truancy” (Black & Wiliam, 1998, p. 141). The sad truth of the matter is that students have “become accustomed to receiving classroom teaching as an arbitrary sequence of exercises with no overarching rationale” (Black & Wiliam, 1998, p. 143). But using informal assessments as formative assessments can reverse this negative trend. They can give students, even struggling students, a sense of control over their learning. Ideally, students would feel empowered to continually learn and progress towards their goals as a result of the feedback formative assessments provide to them.



References
Alvermann, D.E., Phelps, S.F., & Gillis, V.R. (2011). Content area reading and literacy (6th ed.). Boston: Allyn & Bacon.
Black, P., & Wiliam, D. (1998). Inside the black box: Raising standards through classroom assessment. The Phi Delta Kappan, 80(2), 139-144, 146-148.
Caldwell, J.S. (2008). Reading assessment: A primer for teachers and coaches (2nd ed.). NY: The Guilford Press.
Fisher, D., & Frey, N. (2010). Enhancing RTI: How to ensure success with effective classroom instruction & intervention. Alexandria, VA: ASCD.

Torgesen, J.K., & Miller, D.H. (2009). Assessments to guide adolescent literacy instruction. Florida Center for Reading Research Center on Instruction.



Thursday, September 10, 2015

Are High Stakes Assessments Testing Students or Our Patience?

No matter how well a high stakes test is designed, it is never going to fully encompass a student’s knowledge of any given subject matter. Rather, test items reflect only a sample of any given domain (Price & Koretz, 2013, p. 36). Of course, test-makers strive to construct test items to match what students should be learning. However, this is extremely challenging, especially when assessing fluid and cognitively complex skills such as reading comprehension. To even attempt to test skills such as comprehension, test-makers may resort to a watered-down multiple-choice format or take out more cognitively complex items. In its 2011 report, the National Research Council disclosed, “It is generally more difficult to design items at higher levels of cognitive complexity and to have such items survive pilot testing” (p. 3-2). Regardless of the test items that are eventually placed on a test, it is naïve to assume that a two- to three-hour test can accurately gauge what a student has learned over the course of an entire year. It is for this reason that Price and Koretz (2013) warn, “Significant decisions about a student should not be made on the basis of a single score” (p. 42). And yet they are. The Miami-Dade County Public Schools (MDCPS) places students into intervention classes based on their performance on high stakes tests (MDCPS, 2012, p. 2). For example, if a student’s reading score is below the “proficient” level, he or she will be enrolled in an intensive reading class. What this means is that this student must forgo an elective such as art or music to receive an extra period of instruction in reading. The rationale is that this extra course will help struggling students get back on grade level. I will let my readers decide how effective it is to put all struggling readers together in one class that only they, the struggling readers, have to take while their more successful peers get to enjoy physical education or art. Perhaps such classes can be successful.
However, intervention classes for the 2015-2016 school year are suspect for an additional reason. Since the scores from last year’s high stakes test (the FSA) were not available, MDCPS students in grades 6-8 have been placed into intervention classes based on test scores from the 2013-2014 school year. That means that current eighth graders are now suffering through an extra reading class because of how they performed on a test they took in sixth grade. Students are punished on the basis of one test that they took years ago despite experts’ warnings that single test scores should not be trusted in isolation.

Students are not the only ones to be punished according to this new system. Teachers are also feeling the pain of a system that makes decisions based on single test scores. According to the evaluation system used by MDCPS, 35% of a teacher’s final evaluation for the year must be based on high stakes testing data (MDCPS, 2015, p. 40). However, test data may not be an accurate reflection of a teacher’s worth. As any teacher knows, some groups of students are significantly stronger than other groups. And some years, students are significantly weaker. Though teachers endeavor to address the needs of all students, it is unrealistic to assume that a teacher can take a class full of below grade-level students and, by the end of the year, miraculously transform them, all of them, into performing at grade level. However, the way the system currently works, a teacher is essentially punished for having a lower-performing group of students. Price and Koretz (2013) write, “The cohort of students in any one year is often very different from those in previous years, and these differences in student cohorts cause scores to fluctuate substantially from one year to the next, even if the effectiveness of the school remains unchanged” (pp. 42-43). Price and Koretz emphasize, “This inconsistency tends to be particularly large when the performance of classrooms or small schools is described” (p. 42). This situation is exacerbated when the tests being used do not look at progress within performance categories. In explaining standards-referenced tests, Price and Koretz note that “a student who improved from near the bottom [to near the top] of the ‘needs improvement’ range, a very large difference in many cases, would show no improvement” while “a student who progressed a very small amount but crossed a performance standard would be shown as having improved” (pp. 46-47). That doesn’t make much sense.
And yet, performance assessed in precisely this way can impact a teacher’s evaluation, which can, in turn, affect a teacher’s future job prospects and even salary. These are monumental consequences for results that are statistically suspect.

Because so much is at stake for both students and teachers when it comes to high stakes tests, teachers inevitably respond the way any rational being placed in the same situation would: they research the test and change their instructional practices so that their students will get the highest test scores they possibly can. This is called “gaming the system” and is one major reason researchers say that test data should be interpreted with caution. When teachers teach to the test in this way, they are not actually expanding students’ knowledge or skill in the subject area. Rather, they are merely helping students do better on the particular test in question. The truth of this statement can be seen in the fact that “typically gains on high-stakes tests have been three to five times as large as gains on other tests with low (or lower) stakes. In numerous cases, large gains on high-stakes tests have been accompanied by no gains whatever on lower-stakes tests” (Price & Koretz, 2013, p. 61). A case in point can be seen by exploring data from the National Assessment of Educational Progress (NAEP). Although this data is collected every two years across the United States, the results do not directly impact individual students or teachers in the same way that state-mandated high stakes tests do. Hence, the NAEP is a low stakes test. Although the NAEP website reveals that there have been gains in reading since 1971, these gains are quite small: nine-year-olds improved 13 points and thirteen-year-olds improved 8 points. Of note, the reading performance of seventeen-year-olds did not improve at all (NAEP, 2015). It would seem that NCLB has done little to improve student performance. If students were getting left behind before NCLB, they are still getting left behind now. Internationally, the United States ranks 17th among the 34 OECD member nations (PISA, 2013). This is completely average.
As a world power, the United States is not content with average; we want to see our students’ scores rise. However, the data reveals that “there has been no significant change in these performances over time” (PISA, 2013). Shanghai can rest easy knowing that it is in no real danger of being overtaken by the US anytime soon. Again, NCLB seems to have failed in helping spur any real change or improvement.

Perhaps no one should really be surprised since “there is no research that links increased testing with increased reading achievement” (Afflerbach, 2004, p. 6). Let me repeat that astounding sentence written as part of the 2004 National Reading Conference policy brief: “There is no research that links increased testing with increased reading achievement.” And yet testing continues unabated. More disturbingly, time spent testing may mean less time spent actually engaging in activities that would benefit students.

Part of the problem is that high stakes test scores are difficult for teachers to interpret and use when crafting instruction. They may also seem far removed from the classroom. As such, “Rather than give up what they consider to be valid, instructionally useful assessment practices such as running records of students’ oral reading, most teachers have continued with their own procedures while ‘adding on’ what is externally imposed. The resulting redundancy is staggering” (Invernizzi, Landrum, Howell, & Warley, 2005, p. 617). Even if teachers understood the results of tests perfectly, there is always a disturbing delay between the point at which a student takes a test and the point at which test scores are reported back. Test scores from the FSA Reading test taken last spring are still forthcoming.

One idea put forth by Caldwell (2008) is that classroom tests, which can be created by teachers themselves, take the place of high stakes tests. However, Caldwell is not proposing that NCLB and its high stakes tests merely disappear. Rather, to fill the void and offer a similar level of accountability, classroom assessments would have to be constructed with validity and reliability in mind. That is, they would have to be conscientiously constructed to test what is actually taught in class, what teachers really want students to learn. Doing so could result in a much richer picture of student performance, as individual teachers could use assessment formats well beyond the multiple-choice or short answer usually seen on a high stakes test. Teachers might use such rich sources of information as, for example, portfolios in combination with projects, essays, speeches, and more traditional quizzes or tests. However, although there is great potential in using classroom assessments for everyone’s benefit, these assessments would have to be created and used by all teachers so that performance across teachers and grades could be determined. If everyone designed their own test, then no one could ever really compare one student’s score to another’s. After all, one teacher’s tests may be significantly more difficult than those of the teacher down the hall. Moreover, classroom assessments should not be clouded by such factors as participation or effort. But currently, many if not all teachers factor behavior and effort into a student’s final grade. If actual achievement is truly what we want to track, then for better or for worse, participation and effort are beside the point.

Assessments designed by teachers themselves might prove to be part of the solution. However, it is important to keep in mind the overarching goals of education and ponder the extent to which a test, any test, can assess students. The NRC report reminds us that “less tangible characteristics (such as curiosity, persistence, collaboration, or socialization) are not tested. Nor are subsequent achievements, such as success in work, civic, or personal life, which are examples of the long-term outcomes that education aims to improve” (National Research Council of the National Academies, 2011, p. 3-1). However, what more could we ask of the educational system than that it help our students meet these “intangible goals”? If the goal of education is to prepare students to be successful and productive citizens, high stakes tests may have failed. Perhaps it is time to intervene and begin a new era of classroom-based assessment that is designed and approved by those who work in classrooms on a daily basis. Perhaps it is time for policymakers to remind themselves that the end goal of education is not to get high scores, even if these scores beat those of our fiercest political rivals, but to help students fit into society and succeed both personally and professionally.


References

Afflerbach, P. (2004). National Reading Conference policy brief: High stakes testing and reading assessment.

Caldwell, J.S. (2008). Reading assessment: A primer for teachers and coaches (2nd ed.). NY: The Guilford Press.

Invernizzi, M. A., Landrum, T. J., Howell, J. L., & Warley, H. P. (2005). Toward the peaceful coexistence of test developers, policymakers, and teachers in an era of accountability. The Reading Teacher, 58(7), 610-618.

Miami-Dade County Public Schools (2012). Technical assistance for identification, placement, and scheduling of students in grades 5-12 in reading classes. Miami, FL: Curriculum and Instruction.

Miami-Dade County Public Schools (2015). IPEGS procedural handbook. Miami, FL: Office of Professional Development and Evaluation.

National Assessment of Educational Progress. (2015). Fast facts. Retrieved from http://nces.ed.gov/fastfacts/display.asp?id=38

National Research Council of the National Academies. (2011). Incentives and test-based accountability in education (S. W. Elliott & M. Hout, Eds.). Washington, DC: National Academies Press.

Price, J., & Koretz, D.M. (2013). Building assessment literacy. In K. Boudett, E. City, & R. Murnane (Eds.), Data wise: A step-by-step guide to using assessment results to improve teaching and learning (pp. 35-61). Cambridge, MA: Harvard Education Press.

Programme for International Student Assessment (PISA). (2013). What students know and can do: Student performance in mathematics, reading, and science. Retrieved from: http://www.oecd.org/pisa/keyfindings/PISA-2012-results-snapshot-Volume-I-ENG.pdf