What to Think about the MET Project Results
What can you do with $45 million and three years? Well, if you're the Bill & Melinda Gates Foundation, you can confirm, empirically, what educators have always known implicitly: great teaching matters, it can be measured, and it improves student learning.[1]
That was one of the many findings released last week in the final report from the MET (Measures of Effective Teaching) Project. MET has generated buzz in education and popular media alike, so I won't provide a full synopsis here. For a basic summary, check out one of the news rundowns; for more thoughtful commentary, turn to posts from several education bloggers. Instead, I want to call attention to two big takeaways from the MET Project.
What teacher evaluations measure is just as important as how they measure it.
Much has been made of the finding that classroom observations are the worst predictor of student learning, compared to state test scores and student surveys. Some have questioned whether observations are worth the significant time and personnel costs involved in doing them well. Tim Daly of TNTP even suggested that MET shows "the way that most teachers have been evaluated forever is completely unreliable."
It's easy to jump to that conclusion: MET used proven, high-quality observation tools, and observers were trained and certified on their knowledge of them. This isn't the case with many of the classroom observations used across the country. Still, observations are a critical component of teacher evaluations, particularly for teachers in untested grades and subjects. And using observations typically receives greater support from teachers compared to test scores. Finally, MET's research found that although classroom observations didn't improve the predictive power of the evaluation measure, they did improve its reliability, or stability, from year to year.
Test scores also don't have the same diagnostic power as classroom observations: as one observer put it, "test scores can reveal when kids are not learning; they can't reveal why." Observations can provide teachers with valuable, timely, and clear feedback on their practice. Given their complexity and the timing of state testing, value-added measures are far less teacher-friendly, not to mention limited in scope. Surely, great teaching involves more than improving student scores on multiple-choice tests in two subjects.
To this end, it's laudable that MET's researchers also used higher-order tests (the SAT 9 Open-Ended Reading Assessment and the Balanced Assessment in Mathematics) to measure student learning. These assessments are more similar to the Common Core-aligned assessments many states will offer in 2014-15. Presumably, states should want teacher evaluations that function well not only with today's tests, but also with those of the future.
Still, the tests MET used only consider English Language Arts and math skills. If the ultimate goal of evaluations is to measure whether teachers create learning environments where students achieve a broader set of outcomes (say, the knowledge, skills, and attributes it takes to be college- and career-ready), then there is still a long way to go in developing these systems. In the meantime, many states will be simultaneously implementing new teacher evaluations and the Common Core assessments. But the best evaluation systems today do a far better job identifying teachers who improve student learning as measured by state test scores than teachers who improve college and career readiness. MET's findings suggest that states should carefully consider whether their evaluation systems are measuring the teacher attributes needed to meet the Common Core's objectives.
How teacher evaluations are used is just as important as what they measure.
Part of the demand for research like the MET Project comes from the push to use teacher evaluation systems to make human resources decisions. Hiring, retention, placement, compensation, and tenure can all be affected. Some of the push can be attributed directly to the Obama administration: developing and using teacher evaluation systems like the ones in the MET study for HR decisions was a major component of its signature education initiatives.
But there is still uncertainty surrounding teacher evaluation systems; the MET Project doesn't provide a definitive roadmap or specific policies for states and districts looking to measure effective teaching. Many of its findings are ambiguous (with the exception that value-added measures must account for students' prior test scores). The MET report is inconclusive when it comes to:
- whether student demographics should be included as a control in value-added models;
- precisely how to weight each component within a composite effectiveness measure: value-added data, student-perception surveys, and classroom observations;[2]
- whether measures like the Content Knowledge for Teaching (CKT) tests or subject-based classroom observation tools could be useful additions to composite measures of teacher quality; and
- who should observe teachers, how long these observations should last, and how many observations should occur each year.[3]
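To make the weighting question above concrete, here is a minimal sketch of how a composite effectiveness score combines the three components. The function name, the 0-to-1 normalization, and the specific weights are illustrative assumptions for this post, not MET's prescription; footnote [2] notes only that giving state test results somewhere between 33 and 50 percent of the weight performed reasonably well.

```python
def composite_score(value_added, student_survey, observation,
                    weights=(0.33, 0.33, 0.34)):
    """Weighted average of three teacher-effectiveness components.

    Each component is assumed to be pre-normalized to a 0-1 scale;
    the default near-equal weights are a hypothetical choice.
    """
    w_va, w_survey, w_obs = weights
    return w_va * value_added + w_survey * student_survey + w_obs * observation

# Hypothetical teacher with components: value-added 0.6,
# student surveys 0.8, classroom observations 0.7.
equal = composite_score(0.6, 0.8, 0.7)
# Same teacher under a test-heavy weighting (50 percent on value-added).
test_heavy = composite_score(0.6, 0.8, 0.7, weights=(0.5, 0.25, 0.25))
```

The point of the sketch is that the weighting choice, not just the component scores, drives the final rating: shifting weight toward value-added lowers this particular teacher's composite, which is exactly why the report's silence on precise weights matters for high-stakes use.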
The teacher quality measures MET suggests are "better on virtually every dimension than the measures in use now." But does that mean similar teacher evaluation systems should be used as the deciding factor for whether a teacher is fired? Or promoted? Or receives a pay increase?
Thorny questions, indeed. Yes, the new measures of effective teaching are promising compared to most teacher evaluation systems, where nearly every teacher is rated "satisfactory." But given MET's lingering questions and the inevitable imprecision in these measures of effectiveness, wouldn't it make more sense to continue developing and refining teacher evaluation systems without rushing to use them for high-stakes decisions? Especially since most schools lack the capacity and resources to implement evaluations of the rigor and quality that the MET study used? States and districts should consider using the results from teacher evaluations in a more diagnostic manner: why not make these measures of effective teaching the first step in the process of providing professional development, determining who receives pay increases or tenure, and making decisions about hiring or firing, rather than the final step?
[1] In full disclosure, the work of our Education Policy Program is supported, in part, with funding from the Gates Foundation.
[2] However, the "data suggest that assigning 50 to 33 percent of the weight to state test results maintains considerable predictive power, increases reliability, and potentially avoids the unintended negative consequences from assigning too-heavy weights to a single measure."
[3] MET's results do show that more lessons and observers increase the reliability of observations, but there are "a range of scenarios for achieving reliable classroom observations."