When the Measure Becomes the Target

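Goodhart’s Law states that when a measure becomes a target, it ceases to be a good measure. The law is named after the British economist Charles Goodhart, who observed it operating in monetary policy in the 1970s. This compact phrasing is a later paraphrase, usually credited to the anthropologist Marilyn Strathern, of Goodhart’s own observation that any observed statistical regularity tends to collapse once pressure is placed upon it for control purposes.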

The Bank of England adopted the growth of the money supply as its target indicator for controlling inflation, and the financial sector promptly found ways to influence money supply figures that had no relationship to the underlying economic behaviour the figures were supposed to track.

The measure was accurate as long as nobody was optimising for it. Once it became the target, it became gameable, and once it was gamed, it stopped measuring what it was supposed to measure.

The observation generalises beyond monetary policy with uncomfortable ease. Wherever a complex human condition is reduced to a measurable indicator, and wherever the indicator becomes the criterion by which the system managing the condition is evaluated, the same dynamic tends to emerge. The people operating within the system learn, with or without explicit instruction, that their performance is being assessed against the indicator. They optimise for the indicator. The indicator improves. The underlying condition the indicator was supposed to track may or may not improve with it, and the system has progressively less ability to tell the difference, because the indicator is now what the system is measuring rather than the condition the indicator was supposed to approximate.

Unemployment is one of the most clearly documented domains in which this dynamic operates, because the indicator is visible and the gap between the indicator and the underlying condition is wide enough to be obvious when examined. The unemployment rate, as typically measured, counts people who are without work and actively seeking it. It does not count people who have stopped seeking work because they have concluded that seeking is futile. It does not count people who are working part-time but seeking full-time employment. It does not count people who are working in casual or precarious arrangements that do not provide the stable livelihood the employment concept implies. It does not count people who are working in roles so misaligned with their capacities that the employment produces neither the economic sufficiency nor the social participation that meaningful work provides.

The indicator, in other words, counts exits from the unemployment category rather than entries into meaningful and stable livelihood. The two are related but not identical, and the gap between them is the source of most of what goes wrong when the unemployment rate becomes the system’s primary target.

In Australia, the mutual obligation requirements attached to JobSeeker payments—the job search activities, the appointments, the compliance reporting—are designed around the principle that the system’s function is to move people off the payment and into employment. The design produces the expected result: the system is effective at moving people off the payment. It does this by applying compliance pressure, by reducing the payment when compliance requirements are not met, and by making the conditions of the payment sufficiently onerous that accepting any employment becomes preferable to maintaining the conditions. The indicator—the number of people receiving the payment at any given time—improves. The underlying condition—whether the people who have left the payment have entered meaningful and stable livelihood—is a different question that the system’s measurement apparatus is not primarily designed to track.

The employment that people enter to avoid the payment’s compliance requirements is often not the employment the policy’s stated purpose implied. It is casual, part-time, poorly remunerated, insecure, and sometimes short-lived enough that the person returns to the payment within weeks of leaving it, having demonstrated, for the indicator’s purposes, an exit from unemployment that was not in any meaningful sense an entry into livelihood. The indicator recorded a successful outcome. The person’s life did not change in the direction the successful outcome implied.

The United Kingdom’s experience with Employment and Support Allowance and its successors provides a particularly well-documented version of the same dynamic operating in the disability context. The Work Capability Assessment, administered by private contractors and designed to determine whether disability payment recipients were capable of some form of work, was evaluated on the number of assessments completed and the number of people found capable of work and therefore moved off the payment. The contractors were paid per assessment. The system’s internal incentives rewarded throughput and transition off the payment. The accuracy of the assessment—whether the people found capable of work were in fact capable of the work in question—was a different matter, assessable only by subsequent events: the appeals process, which overturned a substantial proportion of the initial decisions, and the health consequences for people whose conditions were judged less severe than they were and who lost the payment and the support it provided.

The assessments improved the indicator. The number of people receiving disability payments declined. Parliamentary inquiries into the consequences documented deaths, deteriorating health conditions, and the psychological effects on people whose genuine incapacity was administratively denied and whose payments were reinstated only on appeal. The indicator had become the target. The target had been hit. The human condition the indicator was supposed to track had not improved commensurately.

The metric problem extends into healthcare systems with equal consistency. Hospital waiting time targets—the requirement that patients be seen within a defined period—were introduced in the United Kingdom’s National Health Service to address genuinely concerning delays in treatment. The targets were real improvements on the weak accountability that had preceded them. They were also optimised for in ways that diverged from their purpose: patients were held in ambulances outside emergency departments so that the clock would not start until the hospital could be confident of seeing them within the target time; appointment letters were sent to incorrect addresses so that failed appointments could be recorded as patient non-attendance rather than system failure; patients were added to and removed from waiting lists in ways that kept the measured waiting time within the target without changing the actual experience of waiting.

The measured waiting times improved. The experience of waiting did not always improve commensurately. The system had been successfully optimised for the indicator, and the optimisation had consumed energy and ingenuity that would otherwise have been available for the underlying purpose.

In China, the introduction of GDP growth targets as the primary criterion for evaluating local government performance produced, over decades, a remarkably consistent pattern of indicator management. Local officials whose career advancement depended on meeting growth targets invested in infrastructure projects, property development, and industrial activity that produced GDP figures without necessarily producing the economic development the figures were supposed to track. The construction of empty apartment complexes, the duplication of infrastructure projects in forms that maximised reported activity, the statistical reporting practices that became the subject of the national government’s periodic efforts at data integrity reform—all of these were the rational responses of people operating within a system that evaluated them on a metric they had the capacity to influence.

India’s experience with food security programmes provides a different variant of the same problem. The public distribution system, designed to provide subsidised grain to people below the poverty line, was evaluated on the volume of grain distributed and the number of beneficiaries on the rolls. The indicator improved consistently over decades. The actual nutrition of the people the system was designed to serve improved more slowly and less consistently, because the grain distribution was subject to leakage—diversion to non-beneficiaries, adulteration, ghost beneficiaries on the rolls—that the volume indicator could not detect and that the system’s design did not make it easy to correct. The measure was the distribution. The target was defined in terms of the distribution. The distribution occurred. What the distribution produced in terms of actual nutritional welfare was a question the system’s primary metric could not answer.

The underlying mechanism in each of these cases is the same. The metric is chosen because it is measurable, because it is visible, because it can be reported in a form that demonstrates the system’s activity to the people who are evaluating it. The metric is a proxy for the underlying condition. The proxy is accurate enough, in the early stages, to provide genuine information about whether the underlying condition is improving. As the proxy becomes the target, the people operating within the system find ways to influence the proxy that are independent of the underlying condition. The proxy improves. The relationship between the proxy and the underlying condition weakens. The system loses the ability to distinguish between genuine improvement in the underlying condition and improvement in the proxy alone, because the proxy was the measurement instrument and the measurement instrument has been compromised by its own promotion to target.

What is lost in this process is what Goodhart’s Law was originally about: the signal. The metric was a signal about the underlying condition. When the metric becomes the target, it stops being a signal and becomes a performance. The performance is designed to be detected. The performance is detected. The underlying condition no longer registers as a signal at all, because the signal has been replaced by the performance of the signal.

The person who has been moved off the unemployment payment and into a casual job that will not last three months has been converted from a problem the system was supposed to address into a successful outcome the system can report. The successful outcome is real within the system’s measurement architecture. The person’s actual condition—the instability, the inadequate income, the failure to reach anything resembling the meaningful livelihood that the employment concept implies—is outside the measurement architecture and therefore not visible as a failure.

The measurement architecture is what the system uses to know whether it is working.

The measurement architecture says it is working.

The person on the three-month casual contract is not in the measurement architecture anymore.

The system has no further information about them.

The metric has been hit.

The target has been achieved.

The condition the metric was supposed to track continues.