Many teachers in the schools we support received Teacher Data Reports last week. The report is a DOE initiative to give teachers and school administrators information about an individual teacher’s impact on student test scores as compared to that of their peers.
Reactions on the ground … Ouch!
To put it plainly … many teachers were really upset.
“There are so many things that a teacher does beyond preparing students to pass a math or reading test that can’t be measured … How can this possibly measure a teacher’s value add!?” That was the reaction of one of our own professional developers at Teaching Matters.
But I find this report to be pretty complex. So before we reject it, we need to understand what it is and then discuss the merits and the potential improvements.
Teacher Data Report – What does it measure?
In two words: teacher contribution.
This is an attempt to measure the teacher’s impact on students’ test scores. It does this by trying to eliminate the effects on test scores caused by measurable factors that are out of the teacher’s control. For example:
- Student’s prior year reading and math scores, attendance, free lunch status, race, special ed status
- Unusual items like “student is new to school”
- And even something called “classroom effects” like the “percentage of class who attended summer school” or “percentage retained”
The report predicts what test score a student is likely to receive given all these effects, using ten years’ worth of historical data. For example, Jenny with this history and within this classroom would be predicted to get the following score _______.
Then, it compares the score Jenny actually received with the predicted score. The difference is called the Teacher Contribution.
Predicted score + Teacher contribution = Actual Test score
The teacher’s contribution is then calculated across all of that teacher’s students and compared to that of other similar teachers (same number of years of experience, teaching the same subject). You are then told where you stand: compared to your peers, are you in the top 20%, the middle 60%, or the bottom 20% of teacher contribution?
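To make the arithmetic concrete, here is a minimal sketch in Python. It is not the DOE’s actual model. It assumes the per-student differences are simply averaged into one number per teacher, and that the peer comparison is a plain percentile rank. The function names and all the scores are made up for illustration.

```python
from statistics import mean


def teacher_contribution(actual_scores, predicted_scores):
    """Average of (actual - predicted) across one teacher's students.

    Averaging is an assumption made for this sketch; the report's real
    method for combining students is not described in the post.
    """
    if len(actual_scores) != len(predicted_scores):
        raise ValueError("need one predicted score per actual score")
    return mean(a - p for a, p in zip(actual_scores, predicted_scores))


def percentile_among_peers(contribution, peer_contributions):
    """Percent of peer teachers whose contribution is lower than this one."""
    below = sum(1 for c in peer_contributions if c < contribution)
    return 100.0 * below / len(peer_contributions)


# Hypothetical scale scores for three students in one class.
actual = [660, 646, 687]
predicted = [650, 650, 675]
contribution = teacher_contribution(actual, predicted)

# Hypothetical contributions for ten peer teachers with similar
# experience teaching the same subject.
peers = [-8, -3, 0, 2, 5, 9, 12, 4, -1, 7]
rank = percentile_among_peers(contribution, peers)

print(f"contribution: {contribution}, percentile among peers: {rank}")
```

With these invented numbers the three differences are +10, -4 and +12, so the contribution works out to 6 points, which is higher than 7 of the 10 peers.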
Why is the DOE doing this?
Research has now proven that teachers have a significant impact on how well students actually do on tests. This is ironic, because many teachers feel the test doesn’t capture what students are learning in their classrooms, yet the tests do capture the fact that teaching and teachers matter most to student learning and performance on tests. So while teachers may feel the tests don’t adequately measure what they teach, the tests do capture that the teacher has a big impact on student performance. In fact, the difference between the lowest and highest performing teachers (measured by tests) is almost an entire grade level of movement: students move only half a year versus 1.5 years.
So teacher impact is important, yet there are few easy, reliable measures of teacher quality available.
New York is trying to be a thought leader in developing these metrics.
Couple of Questions …
So if I may, I would like to ask a couple of questions.
To move to teacher value-added assessments, shouldn’t we move to a testing system in which the test comes at the end of the cycle with the same teacher? Right now, we know many teachers are drilling for the first part of the year and “teaching” for the second part. What if they could really plan for the test over the course of an entire year … and then be measured on their success?
Is it likely that teachers of students with mostly threes and fours on the test will find it difficult to make that top 20%? Might these teachers find their “teacher contribution” is not showing up? If children can read adequately, teachers should be spending more time on higher order skills that are not currently measured by the test than worrying about moving high 3s to 4s on a test of reading. (That is my view anyway.)
Will teachers who spend time working to impact those factors “outside of their control” get penalized by this model? Remember, the goal was to eliminate the impact of factors outside a teacher’s control from the calculation of teacher contribution. Of course higher attendance will increase performance, but if a teacher successfully raises attendance, it will also raise the predicted score, and with it the bar for what the student needs to achieve. Just a thought. Maybe that is not significant.
The data also breaks down performance by subgroups. How does this work if a teacher has only a few students in a particular group? Is the data reliable? I am hoping someone more statistically inclined can speak to that and help me better understand the issues here.
A final thought….
No one should look at data without first getting their own assumptions on the table. If you as a principal find that you were not able to accurately predict your teachers’ rankings … all you have learned is that you need a lot more information.
Maybe you haven’t had time to visit the classroom or look at the student work being produced in it this year … maybe that is part of the problem.
What do you think about this initiative? What are the potential benefits and/or pitfalls for our students?

[...] Don’t miss this analysis by Lynette Guastaferro of the new DOE teacher data reports. [...]
Surely it’s imperfect (and some suggestions in the article could help!), but finally a way to start quantifying the impact of teacher A over teacher B. This is needed if we are ever going to eliminate the achievement gap.
Some people are just not cut out for teaching, and having data over the course of years would clearly indicate that problem. Teaching is as demanding as high-paying professions in law and medicine; why shouldn’t it be just as difficult to get recognition?
Sample size remains a big issue here. 30 students really is not that many, for statistical purposes. Even 150 is not a huge number. (For example, think about political polling. How many of them have just 150 respondents?)
Because the samples are so small, the confidence intervals — think of them as being like margins of error — are huge! This is one of the major issues with NCLB, one that the testing=accountability=reform folks don’t like to acknowledge. This is basic statistics, not some obscure and/or new-fangled idea.
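To put rough numbers on the polling comparison, here is a small Python sketch. It uses the textbook 95% margin of error for a poll percentage, with the worst case of a 50/50 split, which is only an analogy. The teacher reports use a more complicated model, and the function name here is made up for illustration.

```python
import math


def poll_margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample.

    Uses the standard formula z * sqrt(p * (1 - p) / n). This is an
    illustration of how sample size drives uncertainty, not the
    calculation used in the teacher reports.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# One class, one teacher's full load, and a typical national poll.
for n in (30, 150, 1000):
    print(f"n = {n:4d}: about +/- {100 * poll_margin_of_error(n):.1f} points")
```

Under these assumptions, a sample of 30 gives a margin of roughly 18 percentage points, 150 gives about 8, and 1,000 gives about 3.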
Of course, there’s a humanistic idea missing, as well. Namely, individual students are more than just a statistical combination of various factors. There are real world issues in student lives that impact student learning that are not captured in these models. Given a large enough sample, these issues can begin to cancel out, with the larger statistical trends taking over. But we are not talking about big sample sizes, so the human issues are not accounted for.
That’s just the tip of the iceberg. There are plenty of other issues, too many to get into in one place.
The problem with this data, as well as assessment data across the board, is that it assumes that the drive-by “test” is a credible means of measuring student achievement. I could, however, see this working when schools adopt a more authentic assessment that measures the growth of the whole child, not just what they can regurgitate on test day once a year. By the way, there was an interesting article on teacher impact in the New Yorker in Dec. by Malcolm Gladwell (see: http://www.newyorker.com/reporting/2008/12/15/081215fa_fact_gladwell ) that compares the scouting process (measurement design included) of potential NFL quarterbacks, and its prediction accuracy, to the relatively non-existent vetting of new teachers entering the classroom. It is definitely eye opening on many levels and relates directly to the topic of this discussion.
[...] their work. The executive director of Teaching Matters, Lynette Guastaferro, called New York “a thought leader” for creating the reports. Others have been wary, including a teacher who wrote about his experience [...]