Jenny Aker on Rigor for the Rest of Us
Tuesday, June 28, 2011 at 9:08AM
Jenny Aker in RCT, baseline, evaluation, lessons learned, transparency

In her post of June 21st, Kim highlighted the (sometimes) complex world of impact evaluations and the debate over using randomized controlled trials (RCTs) as a way to conduct such evaluations.  She concluded by giving us three options:  to abandon RCTs, to use them (if we have the time and money) or to incorporate their principles into “less expensive” forms of evaluation.

Yet the focus on RCTs is somewhat of a red herring.  Those who advocate RCTs aren’t advocating for randomization per se.  They are (usually) advocating for impact evaluations of development programs, that is, evaluations that measure the change in a development outcome that can be attributed to a specific intervention or program.  So why do we spend so much time talking about RCTs?

RCTs are often at the center of the debate on impact evaluations for a simple reason:  they are a potentially powerful tool for measuring program impact.  Why?  Quite simply, they minimize bias.  By using chance to select participants and non-participants, they increase the likelihood that program participants are as similar as possible to non-participants.  This means that if we observe differences in outcomes between the two groups, those differences are (probably) due to the program and not to something else, which is the whole point of an impact evaluation.  Yet RCTs are only one of many tools for measuring impact, and they aren’t always feasible or appropriate.
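To make the idea concrete, here is a small simulation sketch in Python.  Everything in it is hypothetical: the 10,000 farmers, their baseline corn yields (drawn from a normal distribution with mean 100 kg/ha and standard deviation 20) and the random seed are invented for illustration.  It compares the baseline gap between groups formed by chance with the gap that appears when the better-off half of farmers selects itself into the program.

```python
import random
import statistics

# Hypothetical population: 10,000 farmers with baseline corn yields (kg/ha)
# drawn from a normal distribution; the fixed seed makes the run reproducible.
rng = random.Random(42)
farmers = [rng.gauss(100, 20) for _ in range(10_000)]


def random_assignment(units, rng):
    """Shuffle a copy of `units` and split it into two equal-sized groups."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Groups formed by chance.
treatment, comparison = random_assignment(farmers, rng)
gap = statistics.mean(treatment) - statistics.mean(comparison)

# Groups formed by self-selection: the better-off half opts in.
ranked = sorted(farmers, reverse=True)
half = len(ranked) // 2
self_selected, others = ranked[:half], ranked[half:]
sel_gap = statistics.mean(self_selected) - statistics.mean(others)

print(f"Baseline gap, random assignment: {gap:.2f} kg/ha")
print(f"Baseline gap, self-selection:    {sel_gap:.2f} kg/ha")
```

With random assignment the baseline gap is typically within about 1 kg/ha at this sample size, while self-selection produces a gap of roughly 30 kg/ha before the program has done anything at all.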

What do you do if you want to conduct an impact evaluation, but you can’t or don’t want to randomize?  There are plenty of options.  Here are a few key principles for those interested in impact evaluations – many of which NGOs are probably doing already.

Suppose your organization collects data on program participants’ corn yields before and after an agricultural program that sought to increase yields by 20 percent.  Corn yields were 100 kg/ha before the program, but dropped to 75 kg/ha after the program.  Did the program fail?  Maybe, maybe not.  Maybe there was a drought during this period, and participants would have been even worse off without the program.  The point is, we don’t know, because we didn’t observe what happened to non-participants. 

Now suppose you collect data on corn yields for participants and non-participants after the program, and find that yields are higher for participants.  Did the program succeed?  Maybe, maybe not.  It’s possible that the participant farmers were the most motivated or the richest, so the higher yields among participants are due to those factors and not to the program.  Because we don’t know where each group started, we can’t tell whether the participant farmers were better off to begin with.

By collecting data on participants and non-participants both before and after the program, we can control for two important issues in impact evaluations:  1) different starting points (levels) for each group; and 2) general trends over time, captured by the comparison group, which tell us what might have happened to participants without the program.
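This before-and-after, with-and-without comparison is commonly called difference-in-differences.  The Python sketch below works through it using the participant yields from the corn example (100 kg/ha before, 75 kg/ha after); the comparison-group yields (95 and 60 kg/ha) are hypothetical, chosen to mimic a drought that hit everyone.  The estimate is only as good as the assumption that, without the program, both groups would have followed the same trend over time.

```python
# Mean corn yields (kg/ha). Participant values come from the corn example;
# the comparison-group values are hypothetical (a drought year for everyone).
yields = {
    ("participants", "before"): 100.0,
    ("participants", "after"): 75.0,
    ("comparison", "before"): 95.0,   # hypothetical
    ("comparison", "after"): 60.0,    # hypothetical
}


def before_after(y, group):
    """Change over time for one group (ignores what happened to others)."""
    return y[(group, "after")] - y[(group, "before")]


def post_only_gap(y):
    """Participants minus comparison after the program (ignores starting points)."""
    return y[("participants", "after")] - y[("comparison", "after")]


def diff_in_diff(y):
    """Participants' change minus the comparison group's change: nets out both
    the different starting levels and the common trend over time."""
    return before_after(y, "participants") - before_after(y, "comparison")


print(before_after(yields, "participants"))  # -25.0
print(post_only_gap(yields))                 # 15.0
print(diff_in_diff(yields))                  # 10.0
```

Here the simple before-and-after comparison suggests the program cut yields by 25 kg/ha, and the after-only comparison credits it with 15 kg/ha, 5 of which simply reflect participants’ higher starting point.  Netting out both the starting gap and the drought-driven decline gives an estimated effect of +10 kg/ha.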

Seems simple, right?  If we want to follow program participants and non-participants over time, we need to know who the participants are.  In practice, though, it isn’t so simple.  Sometimes NGOs want to do a baseline survey first to decide whom to target.  Or perhaps the NGO will offer the program to beneficiaries but can’t be sure who will accept the offer (a common issue in microfinance or savings programs).  In these cases, try to identify the treatment group at a “higher” geographic level first, such as the village or neighborhood, and collect data from individuals or households within both participating and non-participating villages.
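A minimal Python sketch of this approach, assuming the program villages are chosen before the baseline survey.  The village names, household records, `Household` fields and the `evaluation_group` and `group_means` helpers are all hypothetical.  The design choice in this sketch is that a household’s evaluation group comes from its village, so a household that later declines the offer is still counted with the participating villages rather than being quietly reclassified.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical: program villages are fixed before anyone is offered anything.
PROGRAM_VILLAGES = {"Village A", "Village B"}


@dataclass
class Household:
    household_id: str
    village: str
    baseline_savings: float  # measured in the baseline survey (hypothetical units)
    accepted_offer: bool     # known only after the offer; False where none was made


def evaluation_group(hh: Household) -> str:
    """Group is set by the household's village, not by its own take-up."""
    return "treatment" if hh.village in PROGRAM_VILLAGES else "comparison"


# Invented baseline records from participating and non-participating villages.
households = [
    Household("hh1", "Village A", 20.0, True),
    Household("hh2", "Village A", 15.0, False),  # declined, still in treatment group
    Household("hh3", "Village B", 18.0, True),
    Household("hh4", "Village C", 17.0, False),  # comparison village: no offer made
    Household("hh5", "Village D", 19.0, False),  # comparison village: no offer made
]


def group_means(records):
    """Mean baseline savings for each evaluation group."""
    groups = {}
    for hh in records:
        groups.setdefault(evaluation_group(hh), []).append(hh.baseline_savings)
    return {group: mean(values) for group, values in groups.items()}


print(group_means(households))
```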

At first glance, selecting participants according to a program’s own targeting criteria, rather than by chance, seems to contradict the whole point of RCTs, where we randomly assign villages, households or individuals to treatment and comparison groups, increasing the likelihood that the two groups will be as similar as possible before the program.

In the absence of an RCT, how can a program’s targeting criteria help us?  Suppose that your organization decides to offer savings accounts to individuals with a per capita income of US$50 or less.  This means that anyone earning US$50 or less is a program participant, while anyone earning US$51 or more is not.  But how different is someone with $51 (a non-participant) from someone with $49 (a participant)?  Not very.  From an evaluation perspective, we could therefore compare individuals just below the threshold (the treatment group) with those just above it (the comparison group), assuming that they are otherwise similar.
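This is the idea behind what is usually called a regression discontinuity design.  The Python sketch below is a deliberately simplified version: it compares average outcomes for people within a hypothetical bandwidth of US$5 on either side of the US$50 cutoff.  The income and outcome values are invented, and the outcome is a stand-in (for example, a household asset index).  Published analyses typically fit a regression on each side of the cutoff rather than comparing raw means, and the approach relies on people near the cutoff being otherwise similar.

```python
from statistics import mean

THRESHOLD = 50.0  # eligibility cutoff from the example: income at or below is eligible
BANDWIDTH = 5.0   # hypothetical: how close to the cutoff counts as "right around" it

# Invented (per capita income in US$, outcome) pairs; not real data.
people = [
    (20.0, 30.0),                                            # far below: excluded
    (46.0, 42.0), (48.5, 44.0), (49.0, 45.0), (50.0, 43.0),  # just below: participants
    (51.0, 31.0), (52.0, 33.0), (54.0, 32.0),                # just above: non-participants
    (80.0, 60.0),                                            # far above: excluded
]


def near_threshold_gap(records, threshold=THRESHOLD, bandwidth=BANDWIDTH):
    """Mean outcome just below the cutoff minus mean outcome just above it."""
    below = [y for x, y in records if threshold - bandwidth <= x <= threshold]
    above = [y for x, y in records if threshold < x <= threshold + bandwidth]
    if not below or not above:
        raise ValueError("need observations on both sides of the threshold")
    return mean(below) - mean(above)


print(near_threshold_gap(people))  # 11.5
```

Choosing the bandwidth is a trade-off: a narrower window makes the two groups more alike but leaves fewer observations to compare.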

One of the main criticisms leveled against impact evaluations – and RCTs in particular – is that they give us the “what” (did the program have an impact?) but not the “why” (if it did have an impact, through what channels?).  Yet nothing inherent in impact evaluations prevents us from learning about the channels of impact or from using qualitative techniques.  At the end of the day, impact evaluations should tell us not only whether the program worked, but also why it worked (or didn’t).

Suppose you want to pilot a new savings group model in Mali in which group members receive SMS reminders to save, and compare these groups with groups that don’t receive reminders.  You expect that groups receiving reminders will remember to save and will save more, ideally allowing members to invest or build their assets.  So you would collect data on household (or individual) investments and assets (the “what”), as well as on savings and whether members used the SMS reminders (the “why”).  You could also ask individuals whether they liked the reminders, or why they were unable to save.  Combining outcome data from multiple levels with both qualitative and quantitative techniques can help us better understand the impact of the pilot program and why it occurred.
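A Python sketch of how such data might be organized and summarized.  The `Member` fields, the `sms_gap` and `share_using_reminders` helpers and all numbers are hypothetical.  It computes the gap in assets between SMS and non-SMS groups (the “what”), the gap in savings as one channel (the “why”), and the share of SMS-group members who report using the reminders.  Open-ended answers, such as why someone could not save, would be collected alongside these fields and analyzed qualitatively.  As with any comparison, the gaps can only be attributed to the reminders if the SMS and non-SMS groups were comparable to begin with.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Member:
    sms_group: bool                 # True if the member's group got SMS reminders
    savings: float                  # channel ("why"): amount saved during the pilot
    assets: float                   # outcome ("what"): value of household assets
    used_reminders: Optional[bool]  # self-reported; None for groups without SMS


# Invented records for illustration only.
members = [
    Member(True, 12.0, 110.0, True),
    Member(True, 9.0, 104.0, False),
    Member(False, 7.0, 100.0, None),
    Member(False, 8.0, 102.0, None),
]


def sms_gap(records, field):
    """Mean of `field` in SMS groups minus its mean in groups without SMS."""
    with_sms = [getattr(m, field) for m in records if m.sms_group]
    without_sms = [getattr(m, field) for m in records if not m.sms_group]
    return mean(with_sms) - mean(without_sms)


def share_using_reminders(records):
    """Among members of SMS groups, the share who report using the reminders."""
    answers = [m.used_reminders for m in records if m.sms_group]
    return sum(1 for a in answers if a) / len(answers)


print(sms_gap(members, "assets"))      # 6.0
print(sms_gap(members, "savings"))     # 3.0
print(share_using_reminders(members))  # 0.5
```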

It’s human nature: we want to share our successes and perhaps hide our failures.  But by sharing only our success stories (programs that worked) and hiding our failures, we lose an opportunity to learn.  At best, this means that another NGO repeats the same program somewhere else, wasting time and resources.  At worst, this “waste” prevents scarce resources from being used in another context or program that deserved them more, or encourages clients or poor households to spend their scarce time or resources on something that doesn’t work.

Bottom line:  If we’re going to do impact evaluations, we all need to do a better job of sharing our results, successes and failures alike, with clients, communities, NGOs, donors and governments.  Of course, this might be easier said than done, but it should be a principle nonetheless.

Jenny Aker is an Assistant Professor of Development Economics at the Fletcher School, Tufts University. She was previously Deputy Regional Director, Programming, for CRS in West Africa where she oversaw CRS’s microfinance programming.

Article originally appeared on Savings Revolution.
See website for complete article licensing information.