Ken Banks' recent post "Social Mobile and the Missing Metrics" really started my wheels turning. Among many other things, it got me thinking about how we could be smarter about implementation so that true impact evaluation becomes possible.

The first thing that came to mind is Oportunidades, formerly known as Progresa, the famous social assistance program in Mexico that made cash transfers contingent on school attendance and visits to health clinics. When Oportunidades rolled out, it couldn't scale immediately due to limited resources. So what its designers brilliantly did was randomize who received the program first, which enabled rigorous evaluation of the program's impact. And so I thought, why can't we scale up mobile technologies (and other interventions for that matter) in the same way?

Surely we can, but as soon as I put more thought into it, I realized there are inherent limitations. Any well-run organization will put serious thought into how it scales up its efforts. The decision of where to go next will likely be based on criteria such as where the problem it addresses is most acute, where partner organizations are already operating and can ease implementation, where donors want it to go, where talent is available, and so on. The greater the resource constraints, the more important it becomes to be strategic in expansion decisions. That's smart management.

Unfortunately, it also creates a real problem for rigorous evaluation. Because the "treatment" areas are chosen based on systematic criteria, any comparison between them and everywhere else suffers from selection bias. If you wanted to understand how microfinance affects a typical rural village, you'd have to randomly assign microfinance access to a set of typical rural villages. But MFIs don't expand by dropping pins on the map; they choose villages based on criteria such as income level. So what's the way out? How can we think about evaluations in a world that, for very good reason, does not target the average of its expansion opportunities?
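To make the selection bias concrete, here is a small simulation sketch. Everything in it is made up for illustration: the income distribution, the "true effect" of 10, and the assumption that a targeting organization picks the higher-income half of villages (the direction of targeting could just as easily go the other way). It compares the naive treated-versus-untreated gap under targeted expansion with the same gap under random assignment.

```python
import random
from statistics import mean

# Illustrative assumption: the intervention raises the outcome by exactly 10.
TRUE_EFFECT = 10.0


def estimated_effect(assignment, n_villages=2000, seed=0):
    """Return the naive treated-minus-untreated gap in average outcomes.

    assignment: "targeted" treats the higher-income half of villages
                (a hypothetical targeting rule); "random" treats each
                village with probability 0.5.
    """
    rng = random.Random(seed)
    # Hypothetical baseline incomes; outcomes track income closely.
    incomes = [rng.gauss(100, 20) for _ in range(n_villages)]

    if assignment == "targeted":
        cutoff = sorted(incomes)[n_villages // 2]
        treated = [income >= cutoff for income in incomes]
    elif assignment == "random":
        treated = [rng.random() < 0.5 for _ in incomes]
    else:
        raise ValueError("assignment must be 'targeted' or 'random'")

    outcomes = [
        income + rng.gauss(0, 5) + (TRUE_EFFECT if is_treated else 0.0)
        for income, is_treated in zip(incomes, treated)
    ]
    treated_mean = mean(o for o, t in zip(outcomes, treated) if t)
    control_mean = mean(o for o, t in zip(outcomes, treated) if not t)
    return treated_mean - control_mean


if __name__ == "__main__":
    print("targeted expansion:", round(estimated_effect("targeted"), 1))
    print("random assignment: ", round(estimated_effect("random"), 1))
```

Under these assumptions the targeted comparison lands far above the true effect, because it mixes the program's impact with the income gap that drove the targeting in the first place, while the randomized comparison lands close to it.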

It doesn't make sense for organizations to randomize expansion across the board just to facilitate evaluation, especially where resources are limited. What does make sense, though, is that they could randomize expansion in some way among the subset that does meet their criteria. For example, when FrontlineSMS:Medic expands to a new partner, they could roll out the technology to a randomly chosen subset of clinics and wait 6-12 months before rolling it out fully, evaluating its impact in the interim. This wouldn't tell us how the technology affects health in the developing world at large, but it would tell us the technology's impact in places where partners showed interest in using SMS to improve rural healthcare.
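A phased rollout like this is simple to set up. The sketch below is only an illustration, not anything FrontlineSMS:Medic actually uses: the clinic names, the 50/50 split, and the function names are all hypothetical. It shuffles the clinics that already meet the partner's criteria, assigns half to receive the technology now and half later, and compares average outcomes between the two groups during the waiting period.

```python
import random


def assign_phased_rollout(eligible_clinics, early_fraction=0.5, seed=2024):
    """Randomly split already-eligible clinics into early and late groups.

    The seed is fixed so the assignment can be reproduced and audited.
    """
    rng = random.Random(seed)
    shuffled = list(eligible_clinics)
    rng.shuffle(shuffled)
    n_early = int(len(shuffled) * early_fraction)
    return {
        "early": sorted(shuffled[:n_early]),  # receive the technology now
        "late": sorted(shuffled[n_early:]),   # comparison group until full rollout
    }


def difference_in_means(outcomes, groups):
    """Average outcome in early clinics minus average outcome in late clinics.

    outcomes: dict mapping clinic name to an outcome measured during the
    interim period (which outcome to measure is up to the evaluator).
    """
    early = [outcomes[clinic] for clinic in groups["early"]]
    late = [outcomes[clinic] for clinic in groups["late"]]
    return sum(early) / len(early) - sum(late) / len(late)


if __name__ == "__main__":
    clinics = [f"clinic_{i}" for i in range(10)]  # hypothetical clinic names
    groups = assign_phased_rollout(clinics)
    print(groups)
```

Because only clinics that met the selection criteria enter the lottery, the comparison speaks to impact among clinics like these, not among clinics in general.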

The bottom line is that if we're going to get smarter about randomized evaluation in the real world, it's going to be constrained by the way the real world actually operates. We might not find answers to how our interventions impact the world at large, but at least we'll start to have answers to how our interventions impact the world we focus on.