Evaluating teachers

Experts worldwide seek standards to test teachers fairly

Rebecca Gibson
15 November 2015

Determining whether a teacher is effective is difficult, but essential. Leaders in the profession are exploring how a multitude of evaluation methods can be combined to measure individual teacher performance accurately – and fairly.

Laura Nurminen, labour market advisor at the Trade Union of Education in Finland (OAJ), cites an analogy when asked how teachers should be evaluated: “If a man has been involved in a serious accident and a surgeon manages to save his life, but is forced to amputate his legs, did the surgeon succeed or fail?”

Nurminen then compares her example to the world of education: “Similarly, is a teacher ‘good’ or ‘bad’ if they manage to teach a pupil with learning disabilities to read, write and comprehend a short text, if this student still can’t spell correctly at the end of the year?”

It’s a difficult conundrum faced by many governments, educational associations and schools, and finding an answer is critical. In May 2015, a report from the Organisation for Economic Co-operation and Development (OECD) titled “Universal Basic Skills – What Countries Stand to Gain,” found that more than 66% of students in nine of the 76 countries studied, including a significant proportion of those in some of the world’s richest OECD countries, leave school without necessary basic skills. The findings have prompted many countries to re-examine how they assess their teachers, especially because the report also revealed a strong correlation between the quality of a country’s education system and an increase in its gross domestic product.

“Science, Technology, Engineering and Math education (STEM) is closely linked with US economic prosperity in the modern global economy, and strong STEM skills are a central element of a well-rounded education and essential to effective citizenship,” said James Brown, executive director of the STEM Education Coalition. “Numerous studies have validated that nothing is more important to a child’s educational success than a good teacher. It’s incredibly important that the US make robust and sustained investments in preparing and retaining new teachers that are skilled in STEM pedagogical content knowledge. This will also help to excite students about pursuing STEM careers.”




From student test scores to classroom observations, peer reviews and student surveys, many countries studied by the OECD have set up structured systems to evaluate teacher performance. Yet most agree that accurately, fairly and reliably measuring a teacher’s impact on student learning remains a challenge. “Essentially, teachers should be considered effective if their students progress and deemed ineffective if pupils show no improvement,” said Andreas Schleicher, director for the Directorate of Education and Skills and special advisor on Education Policy to the OECD’s Secretary-General. “However, as there is no set definition of ‘effective teaching,’ it’s difficult to quantify it accurately. Do we equate quality with experience? Do we praise a teacher who has raised student scores, rather than those who can engage students through critical thinking and discussion? Or are teachers only effective if they can do all of the above and more?”

John Hattie, director of the Melbourne Education Research Institute (MERI) at Australia’s Melbourne Graduate School of Education, believes that only two questions should matter when trying to evaluate teacher performance: what evidence can a teacher provide to demonstrate their overall impact on students, and what actions has that teacher taken as a result?

“Every student should achieve a year’s growth for a year’s input, but what constitutes a year’s growth will differ across schools,” Hattie said. “So teacher effectiveness should always be judged in relation to the school’s expectations.”


Some countries opt to evaluate teachers against standardized criteria defined by external authorities. In 2015, for example, Mexico’s President Enrique Peña Nieto implemented a contentious mandatory standardized skills test when hiring, evaluating and promoting teachers.

Evaluating teachers by their students’ standardized test scores, often through value-added measurements, has been the norm in the United States since the No Child Left Behind (NCLB) Act was introduced in 2001. According to the OECD, more than 90% of US teachers are assessed this way, and the Obama administration’s “Race to the Top” initiative awards additional federal funding to state and local schools that use student test scores as part of their teacher evaluation programs.

The premise of such programs is simple: The more effective the teacher, the higher their students’ standardized test scores. How advocates and opponents of the practice respond to it, however, is anything but simple.

“While no one doubts that teaching is complex and success in the classroom can, and should, be measured by multiple indicators, it’s clear that effective teachers improve student achievement,” said Kate Walsh, president of the National Council on Teacher Quality in the US, where the student test-scores issue has been especially contentious. “Any meaningful and objective understanding of ‘effective’ teaching must be rooted in results for children, and focusing on student growth in teacher evaluations reflects a teacher’s primary responsibility: to improve student academic success. Consequently, student growth and/or value-added data should be the most critical part of a performance measure.”

James Liebman, Simon H. Rifkind Professor at Columbia Law School and director of the Columbia Center for Public Research and Leadership, agrees. He argues that value-added analysis of student test scores helps to provide an “apples-to-apples” comparison of teachers and schools – provided differences in student populations are taken into account.

“Students want to know how much they have learned,” Liebman said. “And now that the ‘Smarter Balanced’ and ‘Partnership for Assessment of Readiness for College and Careers’ assessments in the US have been improved, test scores are an appropriate, if incomplete, measure of that critical learning outcome.”


Evaluating teachers based primarily on student test scores has generated intense backlash, however, with critics claiming that standardized tests don’t accurately reflect the complexity of the teaching and learning process and that relying on them is unfair to teachers.

Peter Z. Schochet and Hanley S. Chiang’s “Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains” (2010), for example, reported a 35% statistical error rate when using one year of test data to measure a teacher’s average performance and a 25% error rate when using three years of data. Meanwhile, Thomas J. Kane and Douglas O. Staiger’s “Volatility in School Test Scores: Implications for Test-Based Accountability Systems” (2002) indicated that 50%-80% of any improvement or decline in a student’s score can be attributed to one-time factors; for example, a dog barking in the parking lot during the test.

Some US teachers’ unions, including the Tennessee Education Association and the Houston Federation of Teachers in Texas, have filed federal lawsuits to contest these test-based measures, claiming that, in many cases, teachers of non-state-tested subjects have been unfairly penalized if their pupils scored poorly.

“Standardized tests should only be used to provide educators, parents and schools with the information they need to help students progress, not to sanction individual teachers,” said Mary Cathryn Ricker, executive vice president of the American Federation of Teachers, AFL-CIO (AFT). “NCLB started the test-and-punish policies that caused teachers to focus on preparing students for high-stakes tests rather than providing deep instruction. However, this fixation on testing hasn’t improved the quality of teaching or students’ overall learning. We need to end the misuse and over-use of testing to provide children with the high-quality education they need to succeed.”


According to Schleicher, 65% of teachers across the OECD consider student test scores an important part of feedback on their performance. However, many of the countries with top-ranked educational systems prefer evaluations that analyze how closely a teacher aligns with learning objectives and with their individual roles within their institutions.

This year, for example, Ghana began piloting a new Pre-tertiary Professional Teacher Development and Management policy to assess and reward teachers on the basis of their commitment to, and rate of, professional development. Principals at Ark Globe Academy in London introduced weekly football-style coaching sessions with individual teachers. In Finland, which is widely acknowledged to have one of the world’s best education systems, teachers set professional development goals and are encouraged to attend annual performance appraisals with their principals.

“Finland’s teachers must hold a master’s degree, so they are considered to be pedagogical experts and are entrusted with professional autonomy,” OAJ’s Nurminen explained. “Just as students learn better without the pressure of standardized assessments, teachers with pedagogical freedom are wholeheartedly committed to understanding how to truly improve their methods rather than learning how to pass yearly evaluations.”


Most researchers and educators agree that, whatever the primary purpose and method of their evaluations, teachers can only become more effective if they are well trained and able to control their professional development.

The OECD’s report “TALIS 2013 Results: An International Perspective on Teaching and Learning,” indicated that this approach has worked well in numerous countries. For example, 80% of teachers in Japan noted ‘moderate to large’ growth in their teaching competencies after acting on feedback from formal appraisals. Similarly, in Singapore, which vied with Shanghai to top the Programme for International Student Assessment (PISA) rankings in 2012, 99% of all new teachers join formal induction programs, 40% have a mentor and more than 80% of principals mandate that teachers take responsibility for improving their own and their students’ learning.

“Evaluation models that have been built in direct collaboration with teachers and unions are the most powerful because they are focused on helping new, struggling and good teachers to improve and identifying those who continue to fail despite receiving full support,” AFT’s Ricker said. “Teachers often receive feedback from an external administrator months after their lessons. But, like our students, we’re more likely to improve if we can self-evaluate or if advice is given in real time by peers who have knowledge of our students.”


Many argue that until there is a universal definition of effective teaching, developing valid instruments that can accurately measure teacher performance will remain almost impossible.

“While we can use current evaluation systems to measure factors such as a teacher’s pedagogical content knowledge, there are myriad unquantifiable aspects that potentially contribute to a teacher’s impact on students’ learning,” said Stuart Kime, a director of evidencebased.education, a UK-based education consultancy. “For example, there is moderate evidence suggesting that classroom management is a contributory factor to learning, but measuring this reliably is difficult – even with the best available measurement systems – making assertions about its contribution to learning even more challenging.”

Kime is working with researchers from England’s Durham University, as well as Oslo (Norway), Rutgers and Harvard universities, US educational testing and assessment organization ETS, and the German Institute for International Educational Research to explore new ways of evaluating teaching quality reliably and validly. “Several systems have been tested to explore how providing teachers with diagnostic insights into their practice, and peer-coached consultations between trusted colleagues, can help them identify areas for improvement,” he said. “If we provide teachers with comprehensive, triangulated information from multiple sources and trust them to act as professionals, a sustainable, iterative process of reflective practice will develop.”

Like many education experts, OECD’s Schleicher believes that, in future, the most accurate evaluation systems will incorporate different methods and empower teachers to take an active role in their professional development.

“Ultimately, if we want to attract and retain the top teachers, we need to ensure that they all receive high-quality education and training, ongoing mentoring and career development opportunities,” Schleicher said. “We also need to empower them to teach with a reasonable degree of professional autonomy in a supportive and collaborative culture. Teacher evaluations are not a magic tool, but if they are carried out in the right way they can certainly make a crucial difference to the quality of teaching and student success.”
