Whose Science Is Economics? On measurement, benchmarks, and the politics of ‘universal’ models
Economics is persuasive when it prices things, but it becomes harder to read when it starts ranking societies.
Across different topics, different datasets, and different moral vocabularies, a familiar ordering often reappears: Nordic countries near the top, the North Atlantic as the implied baseline, and much of the Global South as the lagging case. It is not that this pattern must be false. It is that its stability is itself a data point. When many “independent” measures produce roughly the same hierarchy, the obvious question is not only what the world looks like, but what the measurement system is designed to see.
A skeptic might respond: these are correlations, not moral judgments. Fair. My point is narrower. Cross-country economics frequently turns contested social concepts into portable proxies, and those proxies can quietly reward one institutional style more than others. In that situation, the numbers can be internally consistent while the interpretation drifts. The label remains, the object changes.
“Trust” is a useful example because the measurement pipeline is easy to describe. Much comparative work relies on a standard item, often phrased like this: “Generally speaking, can most people be trusted, or do you need to be very careful?” Responses get aggregated by country and then linked to outcomes like growth, investment, institutional performance, or transaction costs. The conclusion often reads as if “trust” were a universal substance that some societies have more of.
The trouble begins before the regression. “Trust” is not a single, stable construct across languages and moral systems. In some contexts it is closer to amanah, a form of honor, obligation, and reputational stake. It is something that binds and can disgrace. In other contexts it is closer to procedural confidence, the expectation that rules hold and cheating is punished. A translated sentence does not automatically unify these into one comparable object. It produces a number that may summarize answers, while masking differences in what people thought they were answering.
Even within the same survey tradition, the trust item is not as clean as its popularity suggests. Some research finds that such questions can track perceived trustworthiness more than actual trusting behavior. That already shifts what is being measured: from willingness to risk cooperation to a general stance about how people are. It is not useless, but it is not the same thing as “trust” in the everyday sense that drives cooperation in specific settings.
The “cost of trust” framing adds another layer of slippage. Once the discussion moves from attitudes to costs, trust often becomes a proxy for the smoothness of impersonal exchange: contract enforcement, bureaucratic predictability, low-friction formal transactions. Those are meaningful variables. They are also closer to enforcement capacity and institutional reliability than to interpersonal trust. If the proxy is built around formal systems, countries built around stable formal systems will score well. Societies where coordination relies more on reputation, kin networks, informal obligation, and community enforcement will score poorly, even when those mechanisms are effective within their domain. The result is not necessarily wrong. It is often narrower than the headline suggests.
A small cultural detail makes this harder to ignore. Many societies celebrated as “high trust” teach suspicion as public prudence: “better safe than sorry,” “Vertrauen ist gut, Kontrolle ist besser,” “les bons comptes font les bons amis.” These sayings do not prove that “high trust” is a lie. They do show that everyday social wisdom in those societies often treats trust as something to verify, audit, and bound. That fits a world of enforceable contracts and institutional safeguards. It does not fit a simple moral story in which “high trust” equals a superior cultural virtue and “low trust” equals a deficit.
Sampling and interpretation then compound the problem. Cross-national surveys do not observe “a country” the way an instrument observes a physical quantity. They observe respondents reached by a method and willing to answer. The reachable and willing are not evenly distributed across social classes, regions, education levels, and relationships to the state. More importantly, even perfect response rates would not solve the construct issue. If respondents in different contexts do not interpret “most people” or “trust” in the same way, the estimate can be statistically precise while conceptually unstable.
At this point, a defender can still say: even if concepts are imperfect, the correlations are real and high-scoring countries do better. I am not claiming that the correlations are spurious in general. I am challenging a specific inference that often rides on them: that the ranking reflects a deep moral quality of populations, rather than a mixture of institutional history, enforcement capacity, and measurement choices. “Difference” becomes “deficiency” too quickly, and the proxy quietly supplies the moral tone.
This is not only a methodological argument. It has downstream effects because these measures travel. They become part of how policy reports, consulting frameworks, and management teaching narrate the world. “Low trust” becomes an explanation for why control is necessary. “High power distance” becomes an explanation for why hierarchy is inevitable. “Weak institutions” becomes a catch-all diagnosis that is easy to cite and hard to operationalize carefully.
Here I want to shift the responsibility away from a simple villain story. It is not necessary to assume racist intent by Western academics. Intellectual centers tend to universalize their categories; they build models in the settings where they have data, funding, and institutional continuity. That is predictable.
The more consequential question is what happens when universities and policy ecosystems in the Global South treat these frameworks as settled truth. Imported measures become default curriculum. Students learn to repeat “low trust,” “weak institutions,” “high power distance” as if these were clinical facts rather than contested constructs. Over time, the categories harden into identity, and they can rationalize unhelpful practices. In management, for example, the same claims can be used to justify coercive supervision on the grounds that “our people need control.” That practice can then reduce real trust and increase evasion, creating the conditions the models describe. A description becomes a program.
If the problem is measurement plus benchmark dependence, the remedy should be concrete and testable, not rhetorical. A practical standard is that claims of universality should survive two checks.
First, a meaning check. Show that the key concept is equivalent across languages and settings, not merely that the words have been translated. Treat “trust” as a construct that must be validated locally, and do not rely on one survey item to represent it. Where possible, triangulate self-reports with behavioral measures and context-specific indicators.
Second, a mirror check. Apply the same concept back onto the benchmark societies and include the costs they normalize. If “trust” is tied to smooth exchange, then measure the “trust tax” in rich states: legal overhead, compliance burden, audits, fraud-defense, and litigation. If “corruption” is measured in one region as petty bribery, then measure regulatory capture, offshore opacity, and legalized evasion in another. The goal is not to flip the hierarchy for political satisfaction. It is to prevent a single institutional style from being treated as the silent definition of the human.
None of this requires rejecting economics. It requires treating cross-country economics as an empirical project that must earn its universality, not assume it. When a measure survives translation as a construct and survives being turned back on the benchmark cases, it becomes harder to dismiss and more likely to illuminate rather than grade.
Whose science is economics, then? Too often, it belongs to those who set the definitions and to those who adopt them without auditing what they measure. The first part is predictable. The second part is a choice.