Teachers, “Smart People” and Flawed Statistics: What I want to tell my Dad about PISA Scores and Economic Growth

Sullen Teachers, “Smart People”, and Free Lunch.

“How can so many smart people not understand?” my father said, exasperated, in a thinking-out-loud sort of way. “We are doing everything we can just to help these kids adjust to society and find some direction in life. For many of them, we are the only stability they have. But now the State Officials say we are ineffective because our test scores are declining!”

I was only in middle school at that time, so I did not have an answer for my Dad that day. Instead, we drove home sullenly. Dad was then already a 20-year veteran math teacher at the public high school in our small town (Galt, California). Earlier that afternoon, officials for the State Board of Education had apparently turned the monthly faculty meeting into a “blame the teachers” session. And my Dad was in disbelief that all the recrimination was being chalked up to ‘Teacher (In)Effectiveness’.

As I would later find out, at that time – sometime around 1992 – roughly 22% of the students at the high school were on what is commonly called “Free and Reduced Lunch”, a euphemism for Title I Federal support to the poorest students in the district. Children from poor families (under 130% of the federal poverty level) could come to school and get fed breakfast and/or lunch for free. The plan sought to ease the financial burden on poor households, but also to ensure pupils were not so hungry that they couldn’t concentrate on learning. A district’s percentage of students on Free and Reduced Lunch was basically a barometer of poverty in the wider community. So my Dad seemed to be on to something: performance was less about effective teachers and more about the socio-economic makeup of the wider community.

But still, why did so many “smart” people believe it was a matter of teacher quality?

The Truth of Teacher Effectiveness

I attended that same high school, then later followed in my Dad’s footsteps and became a teacher. This led many of my Yale classmates to remark – without any hint of irony – that I had ‘thrown away’ a good education. But soon after entering the classroom myself, I too could not understand why so many “smart” people failed to grasp the realities that we teachers face. I couldn’t figure out why auditors, administrators, and academics missed the complexity of educational goals, the ecology of classrooms, and the ambiguity of outcomes. This question was the major reason why I eventually turned to full-time educational research: what did policymakers and researchers think they knew that we in the classroom did not?

As I quickly learned in my graduate school courses, my exasperated Dad had been on the receiving end of the “Teacher Effectiveness” boom, a research trend that peaked in the 1990s. Worried by A Nation at Risk (1983) and newly equipped with advances in econometrics and personal computing, many “smart” researchers in the United States argued that the single most critical variable in measurable student outcomes was teachers. Representative here is the work of Eric Hanushek (Stanford University), who wrote a highly influential paper entitled The Economics of Schooling: Production and Efficiency in Public Schools (1986). It argued strongly AGAINST the “underlying assumption that poor districts (in terms of property tax bases) is the same as poor students [in terms of achievement]” (1170). In other words, poor quality was not about the context schools were enveloped in, but instead came down to individual teacher effectiveness in the classroom. This sort of research paved the way for the assertion that effective teachers could make up the difference, a position Hanushek still actively and vocally champions today (see Boosting Teacher Effectiveness, 2014).

Fast forward 30 years. Today we find that – as many readers of this blog will recognize – the assumption of “Teacher Effectiveness” is now taken as scientific truth. Bookshelves sag under works on how teacher quality can translate into higher student achievement outcomes. State Officials debate, elaborate, and endlessly roll out new teacher accountability schemes. All of this work comes out of the ‘truth’ of the Teacher Effectiveness research of the 1980s-1990s.

A New Truth in the Making? Test Scores and Economic Growth

And now all the “smart people” seem to have a new passion: focusing schools on the competencies for success in the new global Knowledge Economy. The Organization for Economic Cooperation and Development (OECD) calls this a transition to “21st Century Pedagogy”, but the general idea goes by myriad names. The point is that teachers, curriculum, and schools all need to be completely overhauled given the changing demands of a new global labor market – one purported to be increasingly driven by cognitive skills rather than industrial brawn.

The policy manifestation of the new trend is the OECD’s Programme for International Student Assessment (PISA). PISA purports to test proficiencies that are directly linked to success in the labor market, rather than knowledge covered in the official curriculum of particular countries. Notably, the World Bank has accepted this logic and is now promoting PISA for Development worldwide as well. The OECD and World Bank aim to have all countries signed up to PISA by 2030.

Why do these organizations want to test competencies? The reason is that they are convinced that higher scores on PISA-style tests will lead to enormous economic gains: achieving universal basic skills (PISA Level 1) by 2030 could boost GDP by 1302% for lower-middle income countries and by 162% for high-income countries (Hanushek, 2015). The crucial point here is that the OECD projects that the “stunning economic and social benefits” of raising test scores are automatic.

In geeky researcher language: the “smart people” are convinced that the relationship is causal, meaning it holds in all places and at all times. Under this causal assumption, raising PISA scores becomes the new magic bullet for effective schools and a brighter economic future for all.

I can already hear my Dad scratching his (now fully gray) head, probably saying something like: “Many of my best kids couldn’t even stay in school because they were caught up in drugs and violence in the neighborhoods on the westside of town. How do the smart people not get that?”

So where did all this “smart” certainty come from? Unbelievably, much of it comes from the same researcher who drove Teacher Effectiveness in the 1990s and whom we already met above: Eric Hanushek. He is now a researcher at the Hoover Institution, Stanford University, in posh Palo Alto, California. Some readers might recognize his name, as he has been a highly vocal critic of teachers’ unions in America and worldwide (see a recent March 2017 piece in the Wall Street Journal). Interestingly, Palo Alto is located less than 100 miles from my Dad’s low-income high school.

Teaming up with a researcher in Germany (Ludger Woessmann), Hanushek’s new project analyzed the relationship between previous international test results and current PISA test scores, then connected it all with economic growth worldwide from 1960 to 2000. In doing so, Hanushek and Woessmann (H&W) claimed to find a link so strong that it was deemed to be causal (Figure 1a). That is, as much as 57% of a country’s economic growth was said to result from pupils’ test scores in math and science. H&W had apparently uncovered the decisive international evidence to link test scores and economic growth. This work forms the backbone of a new ‘scientific truth’ that links test scores and economics worldwide, supporting the rapid expansion of PISA described above.

Flawed Statistics: What I want to tell my Dad

But what if those numbers were actually wrong?

My career switch into academic research had given me the tools to check the Hanushek claims. I hoped I could, at the very least, explain to my Dad – in simpler terms – what the “smart” people seemed to see that we could not.

But when I scrutinized the numbers more closely, I was disturbed by what I found. I used the exact same sample of countries, data, and methods used by H&W. The only adjustment I made was selecting a more reasonable time period for the economic growth calculation. Once I did this, however, I came up with a rather discomforting result: the relationship became so weak that it showed the Hanushek claims were invalid.

The key point is time mismatch. And one need not be an educational researcher or statistics geek to understand the logic. Hanushek compared test scores and economic growth over the same time period (1960-2000), reporting the strong relationship he found (Figure 1a). However, it logically takes at least a few decades for students to occupy a major portion of the workforce. So I compared test scores for one period with economic growth in a subsequent period (1995-2014). Surprisingly, the relationship, which had once looked so strong, now looked suspect (Figure 1b). This finding not only refuted the tight link between test scores and GDP growth peddled by H&W, but also unmasked the supposed causality as simple statistical coincidence. And coincidence certainly cannot support the notion that schools have to be aggressively reformed to raise test scores.
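For readers who like to see the logic spelled out, the comparison above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure used in the published paper: the six "countries" and all of their numbers are invented for the example, the variable names are hypothetical, and a plain squared correlation stands in for the fuller regression analysis.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def r_squared(xs, ys):
    """Share of variation in ys 'explained' by a straight-line fit on xs."""
    return pearson_r(xs, ys) ** 2


# INVENTED numbers for six imaginary countries (not real data).
test_scores = [400, 450, 480, 500, 520, 550]

# Annual growth (%) measured over the SAME period as the tests.
growth_same_period = [1.0, 2.1, 2.4, 3.1, 3.4, 4.2]

# Annual growth (%) measured over a LATER period, once the tested
# students would actually make up much of the workforce.
growth_later_period = [2.5, 1.4, 3.0, 1.8, 2.9, 2.0]

same = r_squared(test_scores, growth_same_period)
later = r_squared(test_scores, growth_later_period)

print(f"same-period R^2:  {same:.3f}")
print(f"later-period R^2: {later:.3f}")
```

With these made-up numbers the same-period fit looks very tight and the later-period fit nearly vanishes. That contrast is built into the invented data, so it proves nothing by itself; it only shows what kind of check was run on the real data.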

In plain English: the statistics were flawed. Although somewhat technical, the full paper has been published as A New Global Policy Regime Founded on Invalid Statistics? Hanushek, Woessmann, PISA and Economic Growth (Komatsu and Rappleye, 2017). Readers who persist with the technical details will see how the façade of certainty has been constructed.

So what to tell my Dad? He retired in June 2016 as Principal of that same small high school, after spending more than 40 years in public education. Over the final few years before his retirement, we talked several times about “these new international PISA tests,” with my Dad again exasperated: “They say if we don’t raise test scores then we are endangering the future of the entire country.” He continued: “But I still just don’t get it. How can the State Officials expect us to raise test scores when our kids lack stability in the home and the community?”

The year my Dad retired, the share of students at the school on Free and Reduced Lunch stood at 68%, an increase of roughly 200% since the early 1990s. Nonetheless, the school was deemed ‘failing’ under the No Child Left Behind legislation for not showing enough “Adequate Yearly Progress” in standardized test scores. Now I know Dad was on to something.

But at least now I can tell him:

“Dad, the people who come and tell you these things are not so smart after all. They are blind to the realities in classrooms, schools and communities, probably because they have spent their whole lives looking at numbers, data points, and econometric models. But when you really look at the numbers, all of their purported “truth” comes from flawed statistics. Dad, you were right all along.”

Then I would continue:

“But perhaps these people find it difficult to demonstrate they are smart or keep their jobs if they just confirm what teachers in classrooms already know. So, Dad, I hope you can use my research – which is ultimately based on your insights into education shared with me long ago – as a way to challenge these “smart people” in the future.”

Yet my Dad is retired now, so he won’t be telling anyone this. It therefore falls to the readers of this piece to do so. If we can, we might be able to fend off a replay of the Teacher Effectiveness movement. If we can resist the narrowing of education around flawed statistics and “smart” researchers, we might be able to (re)prioritize the experience of teachers on the ground: those with so many insights to share, those just like my Dad.

This article is part of a series we have been running about the impact of international assessment surveys on teachers, students and systems. Worlds of Education is a space for open and informed debate. Contributions are written by independent commentators and do not represent EI policy.
