What randomisation can and cannot do: The 2019 Nobel Prize
The 2019 Nobel Prize in Economic Sciences was awarded jointly to Abhijit Banerjee, Esther Duflo, and Michael Kremer “for their experimental approach to alleviating global poverty”. This column outlines their impact on development economics research and on practical efforts to lessen poverty. It also considers a number of the critiques of randomised controlled trials as a method for development.
Abhijit Banerjee, Esther Duflo, and Michael Kremer have won the 2019 Nobel Prize. Their victory was inevitable, and for a simple reason: an entire branch of economics – development – looks nothing like what it did 30 years ago.
Development used to be essentially a branch of economic growth. Researchers studied topics like the productivity of large versus small farms, the nature of ‘marketing’ (that is, the structure of markets and how economically connected different regions in a country are), or the merits of exports versus industrialisation. Studies were almost wholly observational: deep data collections with throwaway references to old-school growth theory. Policy was largely driven by the subjective impressions of donors or programme managers about projects that ‘worked’. To be a bit too honest – it was a dull field, and therefore a backwater. And worse than dull, it was a field where scientific progress was seriously lacking.
Development economics transformed
Banerjee (2005) gives a lovely description of the state of affairs when he entered the field of development economics. Plenty of plausible best practices were funded, informed deeply by history, but with hardly any convincing evidence that highly funded projects were achieving their stated aims. The World Bank Sourcebook recommended projects ranging from scholarships for girls, to vouchers for poor children, to citizens’ report cards.
Did these really work? Banerjee quotes a programme providing computer terminals in rural areas of Madhya Pradesh which explains that, because of a lack of electricity and poor connectivity, “only a few of the kiosks have proved commercially viable”. Without irony, “following the success of the initiative”, similar programmes would be funded.
Clearly this situation was unsatisfactory. Surely we ought to be able to evaluate the projects we have already funded? And better, surely we ought to structure those evaluations to inform future projects? Banerjee again: “the most useful thing a development economist can do in this environment is stand up for hard evidence”.
And where do we get hard evidence? If by this we mean internal validity – that is, whether the effect we claim to have observed is actually the effect of a particular policy in a particular setting – the applied econometricians of the ‘credibility revolution’ in labour economics in the 1980s and 1990s provided an answer. Either take advantage of natural variation with useful statistical properties, like the famed regression discontinuity, or randomise treatment as in a medical trial. The idea is that the assumptions needed to interpret a ‘treatment effect’ are often less demanding than those needed to interpret the estimated parameter of an economic model, and hence the result is more likely to be ‘real’. The problem in development is that much of what we care about cannot be randomised. How are we, for example, to randomise whether a country adopts import substitution industrialisation or not, or randomise farm size under land reform – and at a scale large enough for statistical inference?
What Banerjee, Duflo and Kremer noticed is that much of what development agencies actually do in practice has nothing to do with those large-scale interventions. The day-to-day work of development is making sure teachers show up to work, vaccines are distributed and adopted by children, corruption does not deter the creation of new businesses, and so on.
By breaking down the problem of development at the macro scale into evaluations of development at the micro scale, we can at least say something credible about what works in these bite-size pieces. No longer should the World Bank Sourcebook provide a list of recommended programmes based on handwaving. Rather, if we are to spend 100 million dollars sending computers to schools in a developing country, we ought at least to be able to say “when we spent five million on a pilot, we designed the pilot so as to learn that computers in that particular setting led to a 12% reduction in dropout rates, and hence a 34-62% return on investment according to standard estimates of the link between human capital and productivity”.
The experimental approach
How should we run those experiments? How should we set them up? Whom can we get to pay for them? How do we deal with ‘piloting bias’, where the initial NGO we pilot with is more capable than the government we expect to act on the evidence learned in the early study? How do we deal, econometrically, with spillovers from randomised experiments?
Banerjee, Duflo, and Kremer not only ran some of the famous early experiments, they also established the premier academic institution for running these experiments – J-PAL at MIT – and wrote some of the best-known practical guides to experiments in development (e.g. Duflo et al. 2007). It is not a stretch to say that the Nobel was given not only for the laureates’ direct work, but also for the collective contributions of the field they built.
Nonetheless, many of the experiments conducted directly by the three winners are now canonical. Let’s begin with Michael Kremer’s paper on deworming with Ted Miguel (Miguel and Kremer 2004). Everyone agreed that treating kids infected with parasites like hookworm has large health benefits for the children themselves. But since worms are spread by outdoor bathroom use and other poor hygiene practices, one infected child can also harm nearby children by spreading the disease.
Kremer and Miguel suspected that one reason school attendance is so poor in some developing countries is the disease burden, and hence that reducing infections for one child benefits the whole community, and neighbouring ones as well, by reducing overall infection. By randomising mass school-based deworming, and measuring school attendance both at the focal and at neighbouring schools, they found that villages as far as 4 km away saw higher school attendance (4 km rather than the 6 km in the original paper, following a correction of the analysis – Clemens and Sandefur 2015).
Note the nice economics here: a change from individual to school-based deworming helps identify spillovers across schools, and some care goes into handling the spatial econometric issue whereby the density of nearby schools equals the density of nearby population equals differential baseline infection rates at those schools. An extra year of school attendance could therefore be ‘bought’ by a donor for $3.50, much cheaper than other interventions such as textbook programmes or additional teachers. Organisations like GiveWell (2018) still rate deworming among the most cost-effective educational interventions in the world. In terms of short-run impact, this is surely one of the single most important pieces of applied economics of the 21st century.
The laureates have also used experimental design to show that some previously well-regarded programmes are not as important to development as one might suspect. Banerjee et al. (2015) studied a microfinance rollout in Hyderabad, randomising the neighbourhoods that received access to a major first-generation microlender. These programmes are typically women-focused, joint-liability, high-interest loans in the style of the Nobel Peace Prize-winning Grameen Bank.
Around 2,800 households across the city were initially surveyed about their family characteristics, borrowing behaviour, consumption and entrepreneurship; follow-ups were then performed one year after the microfinance rollout, and again three years later. While women in treated areas were 8.8 percentage points more likely to take a microloan, and existing entrepreneurs did increase spending on their businesses, there was no long-run effect on education, health, or the likelihood that women make important family decisions, nor did microfinance make businesses more profitable. That is, credit constraints, at least in poor neighbourhoods in Hyderabad, do not appear to be the main barrier to development.
This is perhaps not very surprising, since higher-productivity firms in India in the 2000s already had access to reasonably well-developed credit markets, and surely these firms are the main driver of national income (follow-up work – Banerjee et al. 2019 – does find some benefits for high-talent, poor entrepreneurs, but the key long-run result stands).
Consider how wild this is: a literal Nobel Peace Prize was awarded for a type of lending that had never really been rigorously analysed. This type of lending effectively did not exist in rich countries at the time they developed, so it cannot be a necessary condition for growth. Yet enormous sums of money flowed into a somewhat odd financial structure because donors were nonetheless convinced, on the basis of very flimsy evidence, that microlending was critical.
Critiques of randomised controlled trials
By replacing conjecture with evidence, and by showing that randomised controlled trials (RCTs) can in fact be run in many important development settings, the laureates’ reformation of economic development has been unquestionably positive. Or has it? Before returning to the (genuine!) strengths of Banerjee, Duflo, and Kremer’s research programme, we must grapple with the critiques of that programme and its influence. For although Banerjee, Duflo, and Kremer are unquestionably the leaders of the field of development, and the most influential scholars for young economists working in that field, the pre-eminence of the RCT method has led to some virulent debates within economics.
Donors love RCTs, because they help pick the best projects. Journalists love RCTs, because they are easy to explain (Wired 2013, in an example of this hyperbole: “But in the realm of human behavior, just as in the realm of medicine, there’s no better way to gain insight than to compare the effect of an intervention with the effect of doing nothing at all. That is: You need a randomized controlled trial.”) But though RCTs are useful, as we’ve seen, they are by no means a ‘gold standard’ compared with other ways of understanding economic development. The critiques are three-fold.
First, while the method of random trials is great for impact or programme evaluation, it is not great for understanding how similar but not exact replications will perform in different settings. That is, random trials have no special claim to external validity, and indeed may be worse than other methods on this count.
Second, development is much more than programme evaluation, and the reasons real countries grow rich have essentially nothing to do with the kinds of policies studied in the papers discussed above. The ‘economist as plumber’, famously popularised by Duflo (2017), who rigorously diagnoses small problems and proposes solutions, does an important job, but not as important as the engineer who invents and installs the plumbing in the first place.
Third, even if we only care about internal validity, and only about the internal validity of some effect that can in principle be studied experimentally, the optimal experimental design is usually not an RCT. Let us tackle these issues in turn.
The external validity problem is often framed as one of scale: well-run partner NGOs are simply better at implementing any given policy than, say, a government, so the benefit of scaled-up interventions may be lower than that identified by an experiment.
We call this ‘piloting bias’, but it is not actually the core problem. The core problem is that the mapping from one environment or one time to the next depends on many factors, and by definition the experiment cannot replicate those factors. A labour market intervention in a high-unemployment country cannot inform, in an internally valid way, about a low-unemployment country, or a country with different outside options for urban labourers, or a country with a different social safety net or different cultural traditions about income sharing within families.
Worse, the mapping from a partial equilibrium to a general equilibrium world is far from obvious, and experiments do not inform us about that mapping. Giving cash transfers to some villagers may make them better off, but giving cash transfers to all villagers may cause land prices to rise, or cause more rent extraction by corrupt governments, or cause any number of other changes in relative prices.
You can see this problem in the scientific summary of this year’s Nobel (Royal Swedish Academy of Sciences 2019). Literally, the introductory justification for RCTs is that, “[t]o give just a few examples, theory cannot tell us whether temporarily employing additional contract teachers with a possibility of re-employment is a more cost-effective way to raise the quality of education than reducing class sizes. Neither can it tell us whether microfinance programs effectively boost entrepreneurship among the poor. Nor can it reveal the extent to which subsidized health-care products will raise poor people’s investment in their own health.”
Theory cannot tell us the answers to these questions, but an internally valid RCT can? Surely the wage of the contract teacher relative to more regular teachers, and hence smaller class sizes, matters? Surely it matters how well trained these contract teachers are? Surely it matters what incentives students in the given location have to invest in human capital?
To put this another way: run literally whatever experiment you like on this question in, say, rural Zambia in grade 4 in 2019. Then predict the cost-benefit ratio of hiring additional contract teachers versus more regular teachers in Bihar in high school in 2039. Who thinks there is a link? Actually, let’s be more precise: who thinks there is any link between what you learned in Zambia and what will happen in Bihar that isn’t primarily theoretical?
Having done no RCT, I can tell you that if contract teachers are much cheaper per unit of human capital, we should use more of them. I can tell you that if the students speak two different languages, there is a greater benefit to having a teaching assistant who can translate. I can tell you that if the government or other principal is able to undo outside incentives with a side contract, and hence is not committed to the mechanism, dynamic mechanisms will not perform as well as you might expect. These kinds of statements are theoretical: good old-fashioned substitution effects driven by relative prices, or a priori production function issues, or basic mechanism design.
Now, the problem of external validity binds on any type of study. Randomised trials, observational studies, theory and structural models all must cope with the mapping from setting A to setting B. The difference with RCTs is that while randomisation is a powerful statistical tool for understanding a treatment effect in setting A, it has no particular advantage in understanding the ‘deep parameters’ or mechanisms that map from A to B.
Duhem-Quine effects mean that models with more structure are generally less likely to be internally valid – if the auxiliary assumptions are badly misleading, we may have learned very little. However, they are more likely to be externally valid, since the implicit logic mapping A to B, and the relevant empirical data needed to make the mapping, has been laid out and gathered.
Simply performing many experiments in many settings does not solve this problem: how do you know that the settings you chose have themselves been randomised, or that you are stratifying on the heterogeneity that matters for external validity? For instance, to answer the industrial organisation question, “Would firms, in general, improve profits by lowering or raising their prices?”, we would not think it worthwhile to randomise individual price changes and measure profit the week after! And if our partner firms in the RCT happened to be ones that price on the inelastic part of the demand curve, we certainly would not want to write a paper suggesting that firms in general will improve profits by raising prices!
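The elasticity point is mechanical enough to sketch in a few lines of code. The demand curve and all numbers below are invented for illustration: under isoelastic demand, a price rise increases profit exactly when demand is inelastic, so an experiment run only on inelastic-demand firms would “confirm” that raising prices raises profit, and mislead us about firms in general.

```python
# Toy illustration (assumed demand curve, not from the column): whether a
# price rise helps depends on the elasticity of the firms we sampled.
# With isoelastic demand q = a * p**(-e), profit rises with price for
# inelastic demand (e < 1) and falls for elastic demand (e > 1).

def profit(p, elasticity, a=100.0, unit_cost=1.0):
    """Profit at price p under isoelastic demand q = a * p**(-elasticity)."""
    q = a * p ** (-elasticity)
    return (p - unit_cost) * q

for e in (0.5, 2.0):  # inelastic vs elastic partner firms
    base, raised = profit(2.0, e), profit(2.2, e)
    direction = "up" if raised > base else "down"
    print(f"elasticity {e}: profit goes {direction} after a 10% price rise")
```

An RCT whose partner firms all sit on one part of the demand curve estimates an internally valid effect that generalises in exactly the wrong direction for the other part.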
Even if external validity were not a concern, we might worry about distortions in which questions researchers focus on. Some of the most important questions in development cannot be answered with RCTs. Everyone working in development has heard this critique. But that a critique is oft-repeated does not mean it is wrong. As Lant Pritchett argues (Manik 2018), national development is a social process involving markets, institutions, politics and organisations. RCTs have focused, in his reckoning, on “topics that account for roughly zero of the observed variation in human development outcomes”.
This is not to say that RCTs do not study useful questions! Improving the functioning of developing-world schools, determining why malaria nets are not used, investigating how to reintegrate civil war fighters: these are not minor issues, and it is good that people like this year’s Nobelists and their followers provide solid evidence on these topics. The question is one of balance. Are we, as economists are famously wont to do, simply searching for keys under the spotlight when we focus our attention on questions that are amenable to a randomised study? Has the focus on internal validity diverted effort from topics that are far more fundamental to the wealth of nations?
But fine. Let us suppose our question of interest can be studied in a randomised fashion. And let us assume that we do not expect piloting bias or other external validity concerns to be first-order. We still have a problem: even on internal validity, RCTs are not perfect. They are not a ‘gold standard’, and the econometricians who push back against this framing have good reason to do so.
Two primary issues arise. First, in predicting what will happen if I impose a policy, I am concerned that what I learned in the past is biased (for example, the people observed to use schooling subsidies are more diligent than those who would go to school if we made these subsidies universal).
But I am also worried about statistical inference: with small sample sizes, even an unbiased estimate will not predict well. Banerjee himself, alongside several theorists, has studied the optimal experimental design for a researcher hoping to persuade an audience with diverse priors about what works. When sample size is low, the optimal study is deterministic, not randomised (Banerjee et al. 2017b).
Econometricians like Max Kasy (2016) have shown that since randomisation always generates worse covariate balance than deterministic assignment of treatments, you do not want to precisely randomise treatment even in a classic RCT setting. These two papers do not speak to observational versus randomised versus structural studies, but they nonetheless capture the broader idea: we care about expected loss when we generalise, and that loss depends on more than merely having an unbiased initial study.
To reiterate, randomised trials tend to have very small sample sizes compared with observational studies. When this is combined with the high ‘leverage’ of outlier observations when multiple treatment arms are evaluated, particularly for heterogeneous effects, randomised trials often predict poorly out of sample even when unbiased (see Alwyn Young 2018 on this point). Observational studies allow larger sample sizes, and hence often predict better even when they are biased. The theoretical assumptions of a structural model permit parameters to be estimated much more tightly, since we use a priori theory to restrict the nature of economic effects.
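The bias-variance point can be made concrete with a toy Monte Carlo simulation. Every number here (effect size, bias, noise, sample sizes) is an assumption for illustration, not drawn from any cited study: the point is only that a small unbiased trial can have a larger expected squared error than a much larger, mildly biased observational study, because variance shrinks with sample size while bias does not.

```python
import random

# Toy simulation (all numbers assumed): compare the mean-squared error of
# a small unbiased estimator with that of a large, mildly biased one.
random.seed(0)
TRUE_EFFECT = 1.0

def estimate(n, bias, noise_sd=5.0):
    """Mean of n noisy unit-level outcomes, shifted by a systematic bias."""
    draws = [TRUE_EFFECT + bias + random.gauss(0, noise_sd) for _ in range(n)]
    return sum(draws) / n

def mse(n, bias, reps=1000):
    """Monte Carlo mean-squared error of the estimator around the truth."""
    return sum((estimate(n, bias) - TRUE_EFFECT) ** 2 for _ in range(reps)) / reps

rct_mse = mse(n=30, bias=0.0)    # small but unbiased trial
obs_mse = mse(n=3000, bias=0.3)  # large, mildly biased observational study
print(f"small unbiased trial MSE ~ {rct_mse:.2f}")
print(f"large biased study MSE  ~ {obs_mse:.2f}")
```

Under these assumed numbers the biased observational estimator wins on expected loss, which is exactly the sense in which unbiasedness alone is not a ‘gold standard’.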
We have so far assumed the randomised trial is unbiased, but that is often suspect as well. Even if I randomly assign treatment, I have not necessarily randomly assigned spillovers in a balanced way, nor have I prevented untreated agents from reallocating their effort or resources.
A PhD student at the University of Toronto, Carlos Inoue (2019), examined the effect of the random allocation of a new coronary intervention in Brazilian hospitals. Following the arrival of the technology, good doctors moved to the hospitals with the ‘randomised’ technology. The estimated effect is therefore nothing like what would have been found had all hospitals adopted the intervention.
This problem can be stated simply: randomising treatment does not in practice hold all relevant covariates constant, and if your response is merely ‘control for the covariates you care about’, then we are back to the old setting of observational studies, where we need a priori arguments about what those covariates are if we are to talk about the effects of a policy.
Theory and the value of experiments
The irony is that Banerjee, Duflo, and Kremer are often quite careful in how they motivate their work with traditional microeconomic theory. They rarely make grandiose claims of external validity when nothing of the kind can be shown by their experiment, as Oriana Bandiera (2019) has discussed.
Kremer is an ace theorist in his own right, and Banerjee often relies on complex decision and game theory (Banerjee et al. 2016), particularly in his early work. No one can read the care with which Duflo handles issues of theory and external validity and think she is merely punting (Banerjee and Duflo 2005, Duflo 2006). Many of the complaints about their ‘randomista’ followers do not fully apply to the work of the laureates themselves.
And none of the critiques above should be taken to mean that experiments cannot be incredibly useful to development. Indeed, the proof of the pudding is in the tasting: some of the small-scale interventions by Banerjee, Duflo, and Kremer have been successfully scaled up! (Banerjee et al. 2017a)
To make an analogy with a firm, consider a plant manager interested in improving productivity. She could read books on operations research and try to implement their ideas, but it is surely also useful to experiment within her own plant. Perhaps she will learn that it is not incentives but rather a lack of information that is the biggest reason workers are, say, installing car door hinges incorrectly. She can then redo training, and find fewer errors in cars produced at the plant over the next year. This evidence – not merely the treatment effect, but also the rationale – can then be taken to other plants at the same company.
All perfectly reasonable. Indeed, would we not think it insane for a manager not to try things out, making minor changes at the margin, before implementing an enormous change to incentives or training? And of course the same goes, or should go, when the World Bank or DFID or USAID spend tonnes of money trying to solve some development problem.
On that note, what would even a sceptic agree a development experiment can do?
First, it is generally better than other methods at identifying internally valid treatment effects, though still subject to the caveats above.
Second, it can fine-tune interventions along margins where theory gives little guidance. For example, do people not take AIDS drugs because they do not believe the drugs work, because they do not have the money, or because they want to continue having sex and no one will sleep with them if they are seen picking up antiretrovirals?
My colleague Laura Derksen suspected that people are often unaware that antiretrovirals prevent transmission, and hence that in places with high rates of HIV it may be safer to sleep with someone taking antiretrovirals than with the population at large (Derksen and van Oosterhout 2019). She shows that informational interventions telling villagers about this property of antiretrovirals meaningfully increase the take-up of medication. We learn from her study that correcting this particular set of beliefs may be important for AIDS prevention. Theory, of course, tells us little about how widespread these incorrect beliefs are, and hence about the magnitude of the effect of an informational intervention on drug take-up.
Third, experiments allow us to study policies that no one has yet implemented. Ignoring the problem of statistical identification in observational studies, there may be many policies we wish to implement that are wholly different in kind from those observed in the past. The negative income tax experiments of the 1970s are a classic example (Hausman and Wise 1976).
Experiments give researchers more control. This additional control is of course balanced against the fact that we should expect truly meaningful interventions to have already occurred, and that we may need to run experiments at relatively low scale due to cost.
We should not be too small-minded here. There are now experimental development papers on topics once thought to be beyond the bounds of experiment. Kevin Donovan at Yale has randomised the placement of roads and bridges connecting remote villages to urban centres (Brooks and Donovan 2018). What could be ‘less amenable’ to randomisation than the literal construction of a road and bridge network?
Where do we stand?
It is unquestionable that a great deal of development work in practice was based on the flimsiest of evidence. It is unquestionable that the armies Banerjee, Duflo, and Kremer have sent into the world via J-PAL and similar institutions have brought far more rigour to programme evaluation. Many of these interventions are now literally improving people’s lives with clear, well-identified, non-obvious policy. That is an incredible achievement!
And there is something likeable about the desire of the ivory tower to get into the weeds of day-to-day policy. Michael Kremer on this point: “The modern movement for RCTs in development economics… is about innovation, as well as evaluation. It’s a dynamic process of learning about a context through painstaking on-the-ground work, trying out different approaches, collecting good data with good causal identification, finding out that results do not fit pre-conceived theoretical ideas, working on a better theoretical understanding that fits the facts on the ground, and developing new ideas and approaches based on theory and then testing the new approaches.” (Evans 2017). No objection here.
That said, we cannot ignore that there are serious people who seriously object to the J-PAL style of development. Angus Deaton, who won the Nobel Prize only four years ago, writes the following (Bryan 2015), in line with our discussion above: “Randomized controlled trials cannot automatically trump other evidence, they do not occupy any special place in some hierarchy of evidence, nor does it make sense to refer to them as ‘hard’ while other methods are ‘soft’… [T]he analysis of projects needs to be refocused towards the investigation of potentially generalizable mechanisms that explain why and in what contexts projects can be expected to work.”
Lant Pritchett (2014) argues that despite success in persuading donors and policymakers, the evidence that RCTs lead to better policies at the governmental level, and hence better outcomes for people, is far from established. The barrier to the adoption of better policy is bad incentives, not a lack of knowledge about how a given policy will perform (Gueron and Rolston 2013). These critiques are quite valid, and the randomisation movement in development often wildly overstates what it has learned, and what it can in principle learn.
But let’s give the last word to Chris Blattman (2014) on the sceptic’s case for randomised trials in development: “if a little populist evangelism will get more evidence-based thinking in the world, and tip us marginally further from Great Leaps Forward, I have one thing to say: Hallelujah.” Indeed. No one, randomista or not, longs to return to the days of unjustified advice on development, particularly ‘Great Leap Forward’-style programmes without any real theoretical or empirical backing!
A few remaining bagatelles
1) It is surprising how early this award was given. Though incredibly influential, the earliest published papers by the laureates mentioned in the Nobel scientific summary are from 2003 and 2004 (Miguel-Kremer on deworming, Duflo-Saez on retirement plans, Chattopadhyay-Duflo on female policymakers in India, Banerjee-Duflo on health in Rajasthan). This seems shockingly recent for a Nobel – are there any other Nobel winners in economics who won entirely for work published so close to the prize announcement?
2) In the field of innovation, Kremer is most famous for his paper on patent buyouts (Kremer 1998). How do we both incentivise new drug production and also get those drugs sold at marginal cost once invented? We think drug-makers have better knowledge of how to produce and test a new drug than some bureaucrat, so we cannot finance drugs directly. If we grant a patent, then high-value drugs return more to the inventor, but at the cost of massive deadweight loss. What we want to do is offer inventors some large fraction of the social return to their invention ex post, in exchange for making production perfectly competitive. Kremer proposes patent auctions in which the government, with some probability, pays a multiple of the winning bid and places the drug in the public domain. The auction reveals the market value, and the multiple allows the government to account for consumer surplus and deadweight loss as well. There are plenty of practical issues, of course. But patent buyouts are nonetheless an elegant, information-based attempt to solve the problem of innovation production, and the idea has been quite influential on those grounds.
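A minimal numerical sketch of the buyout mechanism as described above (the bids, the markup, and the helper function are invented for illustration; the real proposal has many more moving parts, including randomly selling the patent to the high bidder to keep bids honest):

```python
# Hypothetical numbers sketching the patent-buyout idea (Kremer 1998):
# an auction reveals the private (monopoly) value of a patent; the
# government then offers a multiple of the winning bid, the markup
# standing in for consumer surplus and deadweight loss, and places the
# invention in the public domain. With some small probability the patent
# is instead sold to the high bidder, so that bids stay truthful.

def buyout_offer(bids, markup=2.0):
    """Government's buyout price: a multiple of the revealed private value."""
    return markup * max(bids)

bids_millions = [10.0, 14.0, 15.0]  # assumed private-value bids, in $m
offer = buyout_offer(bids_millions)
print(f"auction reveals ~${max(bids_millions):.0f}m private value; "
      f"buyout offer ${offer:.0f}m")
```

The design choice worth noticing is informational: the government never needs to know the drug’s value itself, because competing bidders reveal it.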
3) Somewhat ironically, Kremer also has a great 1990s growth paper with RCT-sceptics Easterly, Pritchett and Summers (Easterly et al. 1993). The point is simple: growth rates by country vacillate wildly decade to decade. Knowing the 2000s, you likely would not have predicted countries like Ethiopia and Myanmar as growth miracles of the 2010s. Yet things like education, political systems and so on are fairly constant within a country across any two-decade period. This necessarily implies that shocks of some kind, whether from international demand, the political system, nonlinear cumulative effects, and so forth, must be first-order for growth.
4) There is some irony in the fact that two of Duflo’s most famous papers are not experiments at all. Her most cited paper by far is a piece of econometric theory on standard errors in difference-in-differences models, written with Marianne Bertrand (Bertrand et al. 2004). Her next most cited paper (Duflo 2001) is a lovely study of a quasi-random school expansion policy in Indonesia, used to estimate the return on school construction and on education more generally. Nary a randomised experiment to be seen in either paper.
5) Kremer’s 1990s research, before his shift to development, has been incredibly influential in its own right. The O-ring theory (Kremer 1993a) is an elegant model of complementary inputs and labour market sorting, in which slightly better ‘secretaries’ earn higher wages. The “One Million B.C.” paper (Kremer 1993b) notes that growth must have been low for most of human history, and that it was limited because low human density limited the spread of non-rivalrous ideas. It is the classic Malthus-plus-endogenous-growth paper.
6) OK, one more for Kremer, since “Elephants” is the greatest paper title in economics (Kremer and Morcom 2000). In theory, expected future scarcity increases prices today. When people think elephants will go extinct, the price of ivory therefore rises, making extinction more likely as poaching incentives rise. What to do? Hold a government stockpile of ivory and commit to selling it if the stock of living elephants falls below a certain point. Elegant. And one might wonder: how could we study this kind of general equilibrium effect experimentally?
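A toy price rule (all numbers hypothetical, not from the paper) shows the logic of the commitment: scarcity raises the ivory price, but a credible promise to flood the market below a population threshold caps the price and removes the speculative payoff to poaching the herd toward extinction.

```python
def ivory_price(elephants, base_price=100.0, scarcity_slope=2.0,
                carrying_capacity=1000.0,
                stockpile_threshold=None, capped_price=100.0):
    """Toy model: the ivory price rises linearly as the herd shrinks
    below carrying capacity. With a credible stockpile commitment, the
    government sells its ivory once the herd falls below
    `stockpile_threshold`, capping the price at `capped_price`."""
    scarcity = max(0.0, carrying_capacity - elephants)
    price = base_price + scarcity_slope * scarcity
    if stockpile_threshold is not None and elephants < stockpile_threshold:
        return min(price, capped_price)
    return price
```

Without the commitment, near-extinction is precisely when ivory is most valuable, so anticipated future prices reward poaching today; with it, driving the herd below the threshold earns only the capped price.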
Banerjee, AV (2005), ““New Development Economics” and the Challenge to Theory”, Economic and Political Weekly 40(40): 4340-44.
Banerjee, AV, and E Duflo (2005), “Growth Theory through the Lens of Development Economics”, chapter 7 in Handbook of Economic Growth 1(A): 473-552.
Banerjee, AV, E Duflo, R Glennerster and Cynthia Kinnan (2015), “The miracle of microfinance? Evidence from a randomized evaluation”, American Economic Journal: Applied Economics 7(1): 22-53
Banerjee, AV, S Chassang and E Snowberg (2016), “Decision Theoretic Methods to Experiment Design and External Validity”, NBER Working Paper No. 22167.
Banerjee, AV, R Banerji, J Berry, E Duflo, H Kannan, S Mukerji, M Shotland and M Walton (2017a), “From Proof of Concept to Scalable Policies: Challenges and Solutions, with an Application”, Journal of Economic Perspectives 31(4).
Banerjee, AV, S Chassang, S Montero and E Snowberg (2017b), “A Theory of Experimenters”, NBER Working Paper No. 23867.
Banerjee, AV, E Breza, E Duflo and C Kinnan (2019), “Can Microfinance Unlock a Poverty Trap for a few Entrepreneurs?”, NBER Working Paper No. 26346.
Bertrand, M, E Duflo and S Mullainathan (2004), “How much should we trust differences-in-differences estimates?”, Quarterly Journal of Economics 119(1): 249-75.
Brooks, W, and K Donovan (2018), “Eliminating Uncertainty in Market Access: The Impact of New Bridges in Rural Nicaragua” – Kevin Bryan’s discussion here.
Derksen, L, and J van Oosterhout (2019), “Love in the Time of HIV: Testing as a Signal of Risk”.
Duflo, E (2001), “Schooling and labor market consequences of school construction in Indonesia: Evidence from a unique policy experiment”, American Economic Review, 91(4): 795-813.
Duflo, E (2006), “Field Experiments in Development Economics”, in Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress edited by R Blundell, WK Newey and T Persson, Cambridge University Press.
Duflo, E (2017), “The Economist as Plumber”, NBER Working Paper No. 23213.
Duflo, E, R Glennerster and M Kremer (2007), “Using randomization in development economics research: A toolkit”, Handbook of Development Economics 4: 3895-3962.
Easterly, W, M Kremer, L Pritchett and LH Summers (1993), “Good policy or all the best?: Country growth performance and temporary shocks”, Journal of Monetary Economics 32(3): 459-83.
GiveWell (2018), “Evidence Action’s Deworm the World Initiative”.
Gueron, JM, and H Rolston (2013), Fighting for Reliable Evidence, Russell Sage Foundation.
Hausman, JA, and DA Wise (1976), “The Evaluation of Results from Truncated Samples: The New Jersey Income Maintenance Experiment”, in SV Berg (ed.), Annals of Economic and Social Measurement 5(4), NBER: 421-45.
Kasy, M (2016), “Why Experimenters Might Not Always Want to Randomize, and What They Could Do Instead”, Political Analysis: 1-15.
Kremer, M (1993a), “The O-ring theory of economic development”, Quarterly Journal of Economics 108(3): 551-75 – Kevin Bryan’s discussion here.
Kremer, M (1993b), “Population growth and technological change: One million BC to 1990”, Quarterly Journal of Economics 108(3): 681-716.
Kremer, M (1998), “Patent Buyouts: A Mechanism for Encouraging Innovation”, Quarterly Journal of Economics 113(4): 1137-67 – Kevin Bryan’s discussion here.
Kremer, M, and C Morcom (2000), “Elephants”, American Economic Review 90(1): 212-34.
Miguel, E, and M Kremer (2004), “Worms: Identifying impacts on education and health in the presence of treatment externalities”, Econometrica 72(1): 159-217.
Young, A (2018), “Channeling Fisher: Randomization tests and the statistical insignificance of seemingly significant experimental results”, Quarterly Journal of Economics 134(2): 557-98.