So What do I take Away from The Great Evidence Debate? Final thoughts (for now)

The trouble with hosting a massive argument, as this blog recently did on the results agenda (the most-read debate ever on this blog), is that I then have to make sense of it all, if only for my own peace of mind. So I’ve spent a happy few hours digesting 10 pages of original posts and 20 pages of top quality comments (I couldn’t face adding the Twitter traffic).

(For those of you that missed the wonk-war, we had an initial critique of the results agenda from Chris Roche and Rosalind Eyben, a take-no-prisoners response from Chris Whitty and Stefan Dercon, then a final salvo from Roche and Eyben + lots of comments and an online poll. Epic.)

On the debate itself, I had a strong sense that it was unhelpfully entrenched throughout – the two sides were largely talking past each other, accusing each other of ‘straw manism’ (with some justification) and lobbing in the odd cheap shot (my favourite, from Chris and Stefan: ‘Please complete the sentence ‘More biased research is better because…’ – debaters take note). Commenter Marcus Jenal summed it up perfectly:

‘The points of critique focus on the partly absurd effects of the current way the results agenda is implemented, while the proponents run a basic argument to whether we want to see if our interventions are effective or not. I really think the discussion should be much less around whether we want to see results (of course we do) and much more around how we can obtain these results without the adverse effects.’

There were some interesting convergences though, particularly Whitty and Dercon’s striking acknowledgement of the importance of power and politics, which are often assumed to be excluded from the results agenda. But what they actually said was:

‘Understanding power and politics and how to assist in social change also require careful and rigorous evidence.’

True, but what about reversing the equation? Does understanding the role of evidence in development also require a careful and rigorous understanding of power and politics? They never fully address that crucial point, which is at the heart of Roche and Eyben’s critique.

Both sides (rather oddly, as acknowledged experts in their fields) decried the role of experts. Whitty and Dercon called for ‘moving from expert (i.e. opinion-based, seniority-based and anecdote-based) to evidence-based policy’. Ah, turns out that what is actually being suggested is a move from one kind of expert (practitioners) to another (evidence/evaluation).

As a non number-cruncher I also took exception to their apparent belief that only those who understand the methodological intricacies of different evaluation techniques are eligible to pass judgement. On that basis politicians would be out of a job, and only rocket scientists would get to pronounce on Trident.

There was also a really confusing exchange on the hierarchy of evidence. Whitty and Dercon show a surprising (to me at least) commitment to multi-disciplinarity: ‘Methods from all disciplines, qualitative and quantitative, are needed, with the mix depending on the context….. it is not a matter of just RCTs, but of rigour, and of combining appropriate methods, including more qualitative and political economy analysis.’

Music to the ears of the critics, but is it actually, you know, true? Everything I hear from evaluation bods is that DFID does actually see RCTs as the gold standard, and other forms of evidence as inferior. Roche and Eyben returned to the attack on this in their response, arguing that what Whitty and Dercon call the ‘evidence-barren areas in development’ are only barren if you discount sociology and anthropology, among others, as credible sources of evidence. By the way, Ed Carr has a brilliant new post on the (closely linked) clash between quants and quals, arguing that while quants can establish causation, only quals can explain how that causation occurs.

But the exchange did provide me with one important (I think) lightbulb moment. It was about failure. Whitty and Dercon were particularly convincing on this: the evidence agenda ‘involves stopping doing things which the expert consensus agreed should work, but which when tested do not’. This is a nice Popperian twist – the role of evidence is not to prove that things work, but to prove they don’t, forcing us to challenge received wisdom and standard approaches. This is indeed what I noticed about Oxfam’s recent ‘effectiveness reviews’ – if you find no or negative impact, then you (rightly) start to re-examine all your assumptions. But if this is the proper role for the evidence agenda, is it politically possible? By coincidence I have just read Ed Carr’s forceful critique of Bill Gates’ approach to evaluation, arguing that failure is often airbrushed out in order to safeguard funding and credibility. That seems a pretty fundamental contradiction.

The comments were just as thought-provoking. One of the key messages that emerged is the gulf between these debates and what those in charge of gathering results in aid agencies actually face – highly constrained resources, crazy time pressure, and the need to deliver some (any!) results to feed the MEL machine. Oxfam’s Jennie Richmond reflected on the gap between theory and practice yesterday.

Commenter Enrique Mendizabal asked whether we are demanding a different role for evidence in poor countries than in our own.

‘In the UK, health policy is decided by a great many number of factors or appeals (evidence, sure, but also values, tradition, biases, political calculations, etc). We may complain about it but we accept that it is a system that works. But health policy for Malawi (or other heavily Aid dependent countries) is decided mainly by evidence (or what often passes as evidence at the time) and usually by foreign experts…. would we be happy with USAID funding a large evidence-based campaign to reform the NHS or our education policy?’

But he took his argument a step further – if the final decision should be left to the interplay of evidence (of different sorts), politics and negotiation, then DFID and other donors would be better advised to boost the ‘enabling environment’ for such debates and decisions by investing in tertiary education in developing countries:

‘strengthening economic policy debate is a more adequate objective than achieving policy change (even if it is evidence based).’

Commenter David highlighted a fundamental point that rather went missing in the initial exchange – how the results agenda does or doesn’t work in complex systems:

‘The results agenda approach tends, by presenting development as objectively knowable if broken down into discrete and small bits, to drive attention toward small, more easily measurable interventions to test, particular those that are suited to situations that are simple or complicated rather than complex. Current processes around evidence-based results fail to grapple with complex systems, interaction effects, and emergent properties that dominate most aid project landscapes.

A fundamental critique of the evidence-based revolution is that it actually diminishes efforts to get rigorous evidence about addressing complex challenges. We all want evidence, it’s a question of whether the current framing of “evidence-based” is distorting what types of evidence we gather and value. For those who think that the current emphases on methods to test what works are distorting how we value the evidence coming in (RCT=gold, qualitative methods=junk), this offers little other than platitudes about lots of other methods existing.

Personally, I would be a bigger proponent of the evidence-based revolution if it was coming to folks interested in power, politics, and development, and asking them what their questions are and what evidence might contribute to their work. Absent a learning agenda set to fit complex space and concern itself with power, it will continue to seem to me to be an instance of methods leading research – or searching for keys under the light rather than inventing a flashlight.’

To be fair, Roche and Eyben explicitly chose to focus on the politics of evidence, rather than the implications of complex systems (for example, the question of external validity in complex systems – or lack of it – raised by Lant Pritchett in our recent conversation.)

Final thoughts? After about 500 votes, the poll went narrowly to Whitty and Dercon (34% v 31% for Roche and Eyben, with a pleasing late rally for the ‘totally confused’ camp – my natural habitat). I think Chris Roche and Rosalind Eyben need to work on their communication style (more punchy, less abstract, more propositional). Chris Whitty and Stefan Dercon should give some examples of gold standard anthropological or sociological evidence to allay the doubts over their true commitment to multi-disciplinarity, and take the complex systems question more seriously.

A massive thank you to all who took part, and please can you come back for another go in a year or so? This one isn’t going away.

February 7th, 2013 | 12 Comments

Theory’s fine, but what about practice? Oxfam’s MEL chief on the evidence agenda

Two Oxfam responses to the evidence debate. First Jennie Richmond (right), our results czarina (aka Head of Programme Performance and Accountability), wonders what it all means for the daily grind of NGO MEL (monitoring, evaluation and learning). Tomorrow I attempt to wrap up.

The results wonkwar of last week was compelling intellectual ping-pong. The bloggers were heavy-hitters and the quality of the comments provided lots of food for thought. However, I was left wondering what it all meant for those of us who work in NGOs, trying to generate and learn from ‘evidence’ on a daily basis. I found myself unable to simply vote, so instead I blog….

The results and evidence agendas have brought some real benefits to NGOs in my view. First and foremost, it is important and right that those of us who claim to work in the interests of the poorest people in the world and are stewards of other people’s money should set ourselves high standards for our own impact. In its simplest form the results agenda asks us to justify the trust others have placed in us, by demonstrating whether we are actually bringing about positive change. In Oxfam GB, accountability has long been held as a core organisational value. It is not the results agenda that has got us thinking about how to capture and communicate our effectiveness, but it has provided a helpful additional push.

A further positive is that space has been created both within our own organisations and in the wider sector, to stop, listen and learn. MEL-istas (as Duncan calls us) 5 years ago struggled to get the ear of senior managers (let alone Ministers). But the results agenda has increased the stakes around MEL – encouraging organisations not only to increase investment, but also to listen to the findings coming from our own data gathering and analysis.

However, it has also increased the demand and the expectation, which are not easily met by all NGOs. In Oxfam GB the investment in MEL has increased over the last couple of years, undoubtedly, but still it is a real stretch to deliver the ever-more ambitious demands from donors, to develop tools to tell the story of our broader organisational impact, and to ensure that we are developing innovative ways of measuring cutting-edge programming areas, such as resilience, enterprise development and influencing.

And we are one of the largest international development NGOs in the UK. How much more difficult for the smaller and niche NGOs, or those who lack the flexible financing that permits investment in MEL and innovation? We are conscious in Oxfam that we and other large NGOs need to guard against distorting the NGO market place by pushing the boundaries on MEL and impact too far, and thereby creating expectations that cannot be met by everyone. Somehow we all need to keep our sights on a proportionate approach.

It is not just important to generate evidence, but also to use it properly. There is increased demand for serious, evidence-based conversations about what works. None of us can get away with decisions made purely on gut instinct, force of habit or ideological leaning. We are challenged by the ‘evidence’ question to collate and distil from the broad knowledge base we have at our disposal. And this has in some cases led to surprises. Rigorous studies, whether based on qualitative or quantitative methods, can challenge our preconceptions – showing us impact where we were not optimistic, or the opposite. The test, of course, comes when new programmes are designed. Will the body of evidence be applied – will we be able to find it for starters (in our often not-so-state-of-the-art knowledge management systems), and will it be politically acceptable in our own organisations to apply it to practice?

So, how can we use the results and evidence agendas and make them useful to us as NGOs?  We need to do this in a way that a) is true to the actual work we do (which in the case of Oxfam includes a great deal of work that drives for political change and influencing) and b) does not distort decision-making away from the right decisions (i.e. what most suits the specific needs and opportunities of each context) in our efforts to be able to measure and communicate what we are doing.

One of the concerns raised in last week’s blog was that in some institutions, evidence becomes synonymous with impact evaluations, and even specifically with Randomised Control Trials. As all the bloggers agreed, the default use of one research method for interventions of all types is simply nonsensical. You only have to look at the enormous variety of the things we do in international development (from campaigning for policy change to delivery of bed-nets, from building of bridges to raising awareness of the rights of citizens) to realise that one approach is just not going to cut it.

Another challenge is that so much of what we do in international development is extremely hard to measure. How can we trace the input through to impact chain and clearly demonstrate the ‘on the ground’ changes we have brought about in people’s lives when the investment is in budget support or core funding?  How can we reduce the process of a community standing up against acts of violence against women to a Value for Money calculation? The ethical dilemmas and practical difficulties wrapped up in measuring and ‘evidencing’ many of the processes we are involved in are huge. And, as Eyben and Roche point out, much of what we engage with in international development is messy and political. We need to make sure that the tools we have at our disposal for evidence generation are sophisticated and nuanced enough to acknowledge this messy political reality, and that we are sharing ideas on how to do this in a practical and affordable way.

The push for evidence should go hand in hand with a more entrepreneurial approach to development, opening up space for honest reflection on both success and failure. That is the theory. But, of course, there are obstacles to this becoming a reality. Our systems in large institutions, including NGOs, are designed to demonstrate success. We all have our logframes and our KPIs, and we want to be able to put a tick in the box. No-one wants their project to be the one famous for not achieving what it set out to do, even if the real story is that it helped enormously to generate learning for future projects. Complexity thinking is having some influence right now, which helps to raise the right questions about process and incentives. However, we have a long way to go before even the most reflexive learners in NGOs and other development institutions want their project to be hailed as the great failure.

MEL that - US military mindmap of Afghanistan

So, we proceed with caution – welcoming the increased space the Results Agenda provides to consider ‘what seems to work’, and the profile it gives to the need to take a thorough and transparent look at the information coming out of our programmes. But, wary of the dangers of distorting what we do in order to make it measurable; of placing the MEL ‘bar’ for NGOs too high to reach; of the over-emphasis of certain methodologies; and of the danger of ignoring political realities in the work that we do. It is certainly helpful to keep reflecting and questioning, however, from all sides of the debate – so the wonkwar of last week was welcome.

February 6th, 2013 | 4 Comments

The evidence debate continues: Chris Whitty and Stefan Dercon respond from DFID

Yesterday Chris Roche and Rosalind Eyben set out their concerns over the results agenda. Today Chris Whitty (left), DFID’s Director of Research and Evidence and Chief Scientific Adviser, and Stefan Dercon (right), its Chief Economist, respond.

It is common ground that “No-one really believes that it is feasible for external development assistance to consist purely of ‘technical’ interventions.” Neither would anyone argue that power, politics and ideology are not central to policy and indeed day-to-day decisions. Much of the rest of yesterday’s passionate blog by Rosalind Eyben and Chris Roche sets up a series of straw men, presenting a supposed case for evidence-based approaches that is far removed from reality and in places borders on the sinister, with its implication that this is some coming together of scientists in laboratories experimenting on Africans, 1930s colonialism, and money-pinching government truth-junkies. Whilst this may work as polemic, the logical and factual base of the blog is less strong.

Rosalind and Chris start with evidence-based medicine, so let’s start in the same place. One of us (CW) started training as the last senior doctors to oppose evidence-based medicine were nearing retirement. ‘My boy’ they would say, generally with a slightly patronising pat on the arm, ‘this evidence-based medicine fad won’t last. Every patient is different, every family situation is unique; how can you generalise from a mass of data to the complexity of the human situation.’ Fortunately they lost that argument. As evidence-informed approaches supplanted expert opinion the likelihood of dying from a heart attack dropped by 40% over 10 years, and the research tools which achieved this (of which randomised trials are only one) are now being used to address the problems of health and poverty in Africa and Asia.

The consequences of moving from expert (ie opinion-based, seniority-based and anecdote-based) to evidence-based healthcare policy, far from being some sinister neocolonial experiment, have been spectacular. To quote a recent Economist headline, ‘Africa is currently experiencing some of the fastest falls in childhood mortality ever seen, anywhere’. It is a great example of the positive side to modern Africa the current excellent Oxfam publicity campaign (right) is all about. This success is based on many small bits of evidence, from many disciplines, leading to multiple incrementally better interventions. Critically, it also involves stopping doing things which the expert consensus agreed should work, but which when tested do not. It is no accident that one of the most evidence-based parts of development is also one where development efforts have had some of their greatest successes.

Proper evidence empowers the decision-maker to be able to make better choices. This is a good thing. In every discipline, in every country, where rigorous testing of the solutions of experts has started, many ways of doing things promoted by serious and intelligent people with years of experience have been shown not to work. International development is no different, except that the communities we seek to assist are more vulnerable, including to our bad choices.

Much of what we all do in international development has very limited evidence that it does any good  (in this it is no different from many other policy areas) – which is not the same as saying it is pointless. Rather we don’t know what is pointless. Some of our actions will work better than we think, much of it will work much less well than we hope, and some of it will be damaging the poorest without us realising it. In the evidence-light areas we just don’t know which are which.

We must have the humility to accept that we are all often wrong, however reflexive the practitioner, however deep their reading and experience and passion to do good. Evidence-based approaches are not about imposing a particular theory or view of the world. It is simply about taking any opportunity to test our own solutions in the best way available, using evidence honestly when it is available to inform (note the word) decisions, and when the facts change, changing our minds.

This honesty includes saying to decision-makers when evidence is methodologically weak, mixed or missing so they know they are on their own, unable to rely on (or make a claim on) the evidence. The worst possible solution, which we know Chris and Ros would also deplore, is using the social power of the ‘expert’ to imply we know the answer when we actually have no solid evidential basis for our opinion or prejudice.

A few false assumptions about evidence-based decision making

Some of those who express unease about evidence-based policy and practice seem to assume that it is always based on randomised trials and quantitative methodologies: not so. Methods from all disciplines, qualitative and quantitative, are needed, with the mix depending on the context. Randomised trials are one tool amongst very many, although a good one in the right setting. The argument that evidence-based approaches can “only apply in cases of individual treatment and not the wider community level” ignores over 30 years of methodology which has done exactly that, with very convincing results.

A sterile argument between people who on the one side believe that a randomised trial can answer any question (it can’t), and people who do not appear to be aware of any methodological advances since the 1970s except in their own narrow field is a depressingly familiar experience. We know this does not apply to Rosalind and Chris, but listening to people passionately critiquing methodologies they have not taken the trouble to understand does no good to anyone. This applies both to a randomista who seems to believe that all there is to social research is a few focus groups and in-depth interviews, and to people from a more qualitative social science background who would have trouble explaining the difference between cluster randomised and step-wedge design but assume both are irrelevant to social research anyway (both can be used to measure societal rather than individual effects).

It is tempting to take every point the authors make where we have concerns about their factual basis and logical framework but we will take just three.

“Evidence-based approaches are pre-occupied with avoiding bias and increasing the precision of estimates of effect”. On less bias – generally true. Please complete the sentence ‘More biased research is better because…’. On precision – no, incorrect, the range of situations where a more precise answer is a better answer is small.

One statement we would like to address head-on starts “Evidence-based approaches became linked to value for money concerns to deliver ‘results’…”. We agree- and this is a good thing. Doing a pointless thing, professionally delivered and passionately believed in, is always going to be poor value for money. Testing what works and what does not therefore is essential to value for money. More importantly, doing pointless things diverts very limited human and financial resources, in an ocean of need, away from those who could best use them- not what any of us are in international development to do.

Is it “technical approaches” on the one hand, and “power, political economy” analysis on the other?

Rosalind and Chris’ key criticism is that evidence-based approaches “deflect attention from the centrality of power [and] politics […] in shaping society”, and they offer “power analyses” as an apparent alternative to assessing rigorously what works. This creates a false dichotomy, as if a choice has to be made between a “technical, rational and scientific approach to development” and an approach that recognises politics and the role of power. It is easy rhetoric, but troubling and, if taken much further, even dangerous. Understanding power and politics and how to assist in social change also require careful and rigorous evidence, and again, results are not simply what experts would have expected a priori. Recent studies on the positive impacts of female leadership quotas in rural India are for many of us rather surprisingly good news, even if one can fairly worry about its applicability in other settings, while the struggle to find systematically a positive impact of decentralisation and community-driven development programmes is important to internalise in our actions for change, and highlights the importance of understanding contexts and politics. In these cases, it is not a matter of just RCTs, but of rigour, and of combining appropriate methods, including more qualitative and political economy analysis.

Strong analysis of politics and power without offering much in terms of what can be acted upon is similarly unhelpful. They criticise an evidence-focused agenda by stating that “to act ‘technically’ in a politically complex context can make external actors pawns of more powerful vested interests and therefore by default makes them, albeit unintentionally, political actors.” But all actions by external actors will interact with political forces and vested interests. In many of the settings where development actors want to make a difference, power and political institutions are biased against the poor. Being able to act on strong evidence of what works in constrained political settings is crucial.

A reductionist and misinformed view of evidence as purely ‘technical’ or as being only about “what works” is unhelpful – it is also about generating evidence and understanding (and learning) on why interventions and approaches may work, including understanding the social, political, and economic factors that may enable or constrain success of different approaches. Far from the search for evidence pushing us in a ‘technical’, apolitical direction it has reinforced the importance of understanding and trying to tackle the underlying causes of poverty and conflict. There is agreement on the importance of politics and institutions in shaping growth, security and human development. However, the ability of external actors to influence institutions is much less clear and this is where DFID research is now focussed. Ros and Chris have misread the context – the commitment to evidence has opened up the space fundamentally to challenge conventional, technical approaches to aid.

Why it matters for international development

There are large areas of international development where decision-makers are largely flying blind – forced to make decisions purely on gut feeling and ideology not because they wish to but because they have no option. Try making difficult decisions in education policy compared to health policy and the difference in usable evidence is dramatic – yet both are complex, social and context-dependent parts of human life. It is always puzzling when people say airily ‘health is easy’- it is not, and is an intensely political and social subject requiring interventions at societal level.

Today we can eradicate rinderpest in cattle and build bridges over the Zambezi based on rock-solid evidence from many disciplines, but do not have anywhere near as clear an idea how to reduce violence against women or tackle police corruption. All are great challenges with social dimensions but in two of them people have set about finding and testing solutions in a systematic way over many decades.

Having robustly tested evidence-based solutions certainly does not eliminate politics: the decision whether to build a bridge, what sort and where, is an intensely political choice – but at least those making the choice now have a fair assumption it will stand up- based on hundreds of years of incremental evidence. The evidence-barren areas in development are a collective, and in our view shameful, failure by us all in the academic and practitioner community. We should never excuse them with the feeble assertion that it is too difficult or complicated. Development is difficult and complicated – but the bases for making decisions will gradually improve if we are serious about improving it.

In conclusion, we collectively have the capacity to be able to give to our successors in every continent a far better basis on which to make their decisions for their lives than our generation have. To imply it is not worth trying to provide the best and most rigorous evidence to those who need to make difficult decisions because they will have other influences as well is like saying to someone going for a walk in dangerous mountains that they do not need a map because there will be many other factors that will determine where they go. That is true – but they are still less likely to fall off the cliff if they have one.

Where evidence is clear-cut we should be making that plain to decision makers – and where it is not we should say that as well, be honest about what is there and try to get better evidence for the future. That, in essence, is what evidence-based decision making is about – and all it is about. If the academic community is serious about trying to assist those working in the field (including in Oxfam), and above all empowering the most vulnerable communities to make the most informed possible decisions available for their own development, we should be putting our greatest efforts into supporting decision-makers to use the best evidence, and finding better methodologies in areas where we currently have very weak evidence. There are many, and this should be tackled as a matter of urgency.

Tomorrow, Chris Roche and Rosalind Eyben respond

January 23rd, 2013 | 22 Comments

Lant Pritchett v the Randomistas on the nature of evidence – is a wonkwar brewing?

Last week I had a lot of conversations about evidence. First, one of the periodic retreats of Oxfam senior managers reviewed our work on livelihoods, humanitarian partnership and gender rights. The talk combined some quantitative work (for example the findings of our new ‘effectiveness reviews’), case studies, and the accumulated wisdom of our big cheeses. But the tacit hierarchy of these different kinds of knowledge worried me – anything with a number attached had a privileged position, however partial the number or questionable the process for arriving at it. In contrast, decades of experience were not even credited as ‘evidence’, but often written off as ‘opinion’. It felt like we were in danger of discounting our richest source of insight – gut feeling.

In this state of discomfort, I went off for lunch with Lant Pritchett (right – he seems to have forgiven me for my screw-up of a couple of years ago). He’s a brilliant and original thinker and speaker on any number of development issues, but I was most struck by the vehemence of his critique of the RCT randomistas and the quest for experimental certainty. Don’t get me (or him) wrong, he thinks the results agenda is crucial in ‘moving from an input orientation to a performance orientation’ and set out his views as long ago as 2002 in a paper called ‘It pays to be ignorant’, but he sees the current emphasis on RCTs as an example of the failings of ‘thin accountability’ compared to the thick version.

In a forthcoming paper (which I will definitely link to when it’s published), Lant defines thick accountability as ‘an “account” in the sense of a justificatory narrative of my actions, the story of my actions I tell to those whose opinion of me is important (including myself, but including family and kinsmen, friends, co-workers, co-religionists, people I respect and desire admiration from) that explains why my actions are in accord with, and deserving of, a positive view of myself. In contrast, thin accountability is “accounting”, which is that small part of the account about which objective facts can be established.’ He sketched out the inevitable 2×2 matrix for me:

Top left: thin accountability, low performance (e.g. fragile states)

Top right: thin accountability, high performance (e.g. post office and road-building)

Bottom left: thick accountability, low performance (e.g. families and other non-performance oriented institutions)

Bottom right: thick accountability, high performance (e.g. just about any complex institutional ecosystem)

The challenge in most development work is to move from top left to bottom right. There are occasions when thin accountability/high performance works – typically routine functions like delivering mail or building roads. But anything involving the messiness of people and institutions requires thick accountability, involving deep bonds of trust and reciprocal relationships that are likely to be defined by a setting’s unique history and geography – what he calls ‘folk practices, from which formal organizations can (re)emerge’.

He argues that the randomistas just don’t get this. His critique of RCT culture ranged pretty wide:

  • The politics of RCTs: ‘RCTs are a tool to cut funding, not to increase learning.’  ‘Randomization is a weapon of the weak’ – a sign of how politically vulnerable the argument for aid has become since the end of the Cold War. ‘Henry Kissinger wouldn’t have demanded an RCT before approving aid to some country.’ And I can’t see the military running RCTs to assess the value for money of new weaponry before asking for more cash (mind you, if they did, that might at least save some money on Trident….).
  • The lack of interest in theory: ‘the randomistas are going back to alchemy – atheoretic experimentation’.
  • RCTs test at most a few project variants using ‘project vs non-project’, whereas interventions are typically multiple, overlapping and synergistic (i.e. the whole cannot be reduced to a sum of parts).
  • No-one evaluates the evaluators. At the very least, given how much RCTs cost, you need to know that the findings are useful elsewhere (so-called ‘external validity’). But once you have multiple RCTs on the same issue (and their spread is starting to produce such comparable studies), you find very little external validity – the results of an RCT in one country and time are not replicated elsewhere (with the possible exception of deworming in schools, but even that iconic RCT story is contested). This is the big contrast with real science, where replicability is a key condition of validity.
Patronising? Overpromising? Nah....


In another recent paper, he argues instead for ‘structured experiential learning’, which involves rigorous and intelligent conversation, rather than the illusory certainty of numbers. Get people in a room, agree what the problem is, agree to try out some experiments to solve the problem, and set up rapid feedback to identify failure and/or build on success. In a further recent paper, he calls this ‘Problem Driven Iterative Adaptation (PDIA)’. It sounds very similar to the conclusions of the Africa Power and Politics Programme, which I reviewed recently. In yet another paper (he’s horribly prolific), he also draws a neat distinction between experiments and experimentation:

‘Perhaps surprisingly, the experimentation and experiments approaches are not at all the same. I argue that experiments, while a terrific method for generating PhD dissertations and published papers, will have impact on development and development practice only insofar as they are embedded in an experimentation approach (which they are often not).’

The feeling I got from these conversations was of two tribes encamped and preparing for battle. That line from Henry V comes to mind: ‘from camp to camp, through the foul womb of Boston night, the hum of either army stilly sounds.’ On one side are the ‘best fit’ institutionalists and complexity people, with their focus on path dependence, evolution and trial and error. On the other are the ‘universal law’ experimentalists, offering the illusory certainty of numbers, and (crucially) comfort to the political paymasters seeking to prove to sceptical publics that aid works. It’s hard to see how they can both be right, or happily coexist for long. Time for a wonkwar on this blog, I think…..

November 21st, 2012 | 18 Comments

Getting evaluation right: a five point plan

Final (for now) evaluationtastic installment on Oxfam’s attempts to do public warts-and-all evaluations of randomly selected projects. This commentary comes from Dr Jyotsna Puri, Deputy Executive Director and Head of Evaluation of the International Initiative for Impact Evaluation (3ie).

Oxfam’s emphasis on quality evaluations is a step in the right direction. Implementing agencies rarely make an impassioned plea for evidence and rigor in their evidence collection, and worse, they hardly ever publish negative evaluations.  The internal wrangling and pressure to not publish these must have been so high:

  • ‘What will our donors say? How will we justify poor results to our funders and contributors?’
  • ‘It’s suicidal. Our competitors will flaunt these results and donors will flee.’
  • ‘Why must we put these online and why ‘traffic light’ them? Why not just publish the reports, let people wade through them and take away their own messages?’
  • ‘Our field managers will get upset, angry and discouraged when they read these.’
  • ‘These field managers on the ground are our colleagues. We can’t criticize them publicly… where’s the team spirit?’
  • ‘There are so many nuances on the ground. Detractors will mis-use these scores and ignore these ground realities.’

The zeitgeist may indeed be transparency, but few organizations are actually doing it.

So while Oxfam’s results are interesting, more importantly the transparent process must be applauded. But as I read these documents, it was deja vu… In the initiatives that used quasi-experimental methods I was struck by Oxfam’s acknowledgement that they didn’t know the ‘why’ of some of the results. For the ones that used qualitative methods (the humanitarian portfolio, citizen’s voice and policy influencing), I kept asking myself, how much did they do better by? It seemed like a zero-sum game: One method meant the absence of the other.

This was one source of familiar dissatisfaction…

As they say, once a ship has sunk, all the mice know how it could have been saved.

So here’s the mouse in me. What can an organization do to answer questions I (and it) have and not wring its (collective) hands regretfully later? Here’s my five point list for what all NGOs should think about before setting up an M&E system (or even after setting it up). It’s operational (I have put one into place), it’s not easy, but it has the potential to quieten most detractors (and people like me):

Point 1: Have a good theory of change/causal pathway/impact pathway or whatever you want to call it. The name doesn’t matter (it’s a rose!)

Theories of change are good for understanding the program and for schematics, and they are great communication tools too. Additionally, an evidence-based theory of change can help you decide where you need most investigation, where a process evaluation is sufficient, where a counterfactual analysis of outcomes is required and where a simple tracking of indicators is useful.

Do: Set one up and ensure everyone who needs to, knows the theory of change along with risks and assumptions.

Point 2: Put in place monitoring and information systems. Track process and process/output and some outcome indicators across program areas. There should be a list of performance monitoring indicators that speak to different sectors (four in the case of Oxfam).

Do: Put together a set of standard operating procedures for collecting information on process indicators. This should contain information on frequency of collection, identify data sources (clinics, households, schools), specify respondents (teachers, nurses, women, children…) and clearly elucidate methods for calculating indicators (even for simple indicators such as enrollment rates).

Do: Write and revise and revise a standard operating procedure manual till you have it pat.

Do: Have a management information system that also includes algorithms for quality checks and have a full time person doing data review.

Do: Train your data collectors and your database managers.

Point 3: Think about measuring attributable change. Can you, for instance:

  • Assign the intervention randomly from the beginning without losing sight of your final goal?

  • Identify counterfactual sites and start collecting data there? Pros: great reporting to donors; rigorous information. Cons: more expensive than just monitoring data; requires a high level of scrutiny in comparison sites, especially if you use ex post techniques.

  • Use other methods to establish causality? (Which ones?)

For all methods:

Do: Use protocols and register them (3ie will soon start to register them.)

Do: Use rigorous surveys in implementation sites and in control sites (and get someone who knows how to do them. Don’t do them yourself).

Do: Have standard operating procedures for site level data entry and cleaning.

Do: Use anthropometric measures and bio-physical indicators to the extent possible.

Do: Write and use a field operations manual, write standard operating procedure manuals for data managers that contain range and logic checks for data, and encourage double data entry.
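
To make the range checks, logic checks and double data entry above a little more concrete, here is a minimal sketch in Python. It is only an illustration: the field names (age_years, enrolled, grade), the permitted ranges and the consistency rule are all hypothetical, invented for this example rather than taken from any Oxfam or 3ie manual.

```python
# Hypothetical survey record checks. Every field name, range and rule below
# is an assumption made for illustration only.

def range_errors(record):
    """Range checks: flag values that fall outside plausible limits."""
    errors = []
    if not 0 <= record["age_years"] <= 120:
        errors.append("age_years out of range")
    if not 0 <= record["grade"] <= 12:
        errors.append("grade out of range")
    return errors


def logic_errors(record):
    """Logic checks: flag answers that contradict each other."""
    errors = []
    # A child recorded as not enrolled should not also have a school grade.
    if not record["enrolled"] and record["grade"] > 0:
        errors.append("grade recorded for non-enrolled child")
    return errors


def double_entry_mismatches(first_entry, second_entry):
    """Double data entry: list the fields where two independent keyings differ."""
    return sorted(
        field for field in first_entry
        if first_entry[field] != second_entry.get(field)
    )


if __name__ == "__main__":
    keyed_once = {"age_years": 9, "enrolled": False, "grade": 3}
    keyed_twice = {"age_years": 9, "enrolled": True, "grade": 3}
    print(range_errors(keyed_once))
    print(logic_errors(keyed_once))
    print(double_entry_mismatches(keyed_once, keyed_twice))
```

In a real system the rules would come from the standard operating procedure manual, and any mismatch between the two keyings would be sent back to the paper form rather than resolved by guesswork.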

Point 4: Undertake cost and cost-effectiveness studies. What are the priced and non-priced inputs in the project? Think about whether you want to use these projects in other places, or scale them up. (And no, it’s not going to be calculated from your budget statements alone.)

Do: Put together a standardized template with cost categories and measurement methods. (E.g. how will you measure the cost of using good seeds for the farmer? It’s not just the cost of procurement or transportation but also the cost of additional manure, the cost of storage for seed and post-harvest produce.)

Do: Ensure that everyone in the delivery chain understands and sees this template the same way. (Train, train, train…train!).

Point 5: Focus on implementation research. Systematically document implementation factors, and put together a protocol containing questions that are relevant to informing all stages of the evaluation. This is where participatory methods, focus groups, observational scrutiny and process research should come in, and also inform your theory of change.

Do: set out a protocol at the beginning that lays out i) the questions you want answered ii) what you’ll ask in your interviews to answer them; iii) a plan for analyzing your qualitative information.

There are many more things one can do. But I believe if you have these covered, you are on your way.

A few more things to bridge that elusive evidence-policy gap:

  • Evidence is required for policy making but most policy makers are looking to affirm (and not inform) their opinions (as a recent article in Time says. See here for an excellent QJPS article also cited there).
  • Be circumspect about what evidence you advocate for. Not everything is worth fighting for (and fighting for everything often leads to evidence-fatigue). When I have taught policy analysis, I have often used a rule of thumb long known to academic political scientists: if a policy change leads to less than a 10% change in outcome, it’s a flashing red (stop and think before translating that evidence into policy); if it’s a 10-25% change, it’s a lime (go for it, but think about transition costs); if it’s more than a 25% change, it’s a deep, loud green: adopt the policy. The costs of transition will be surpassed by the benefits of policy change.
  • Change the institutional incentives: Oxfam is on its way, but will program managers on the ground really adopt this culture change or will it continue to be top down? (See here for an excellent blog by Mead Over and Martin Ravallion.)
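
The 10% / 25% rule of thumb in the second bullet above can be written down as a small function. This is a minimal sketch in Python: the function name traffic_light and the choice to look at the size of the change regardless of its direction are assumptions made for this illustration, not part of the original rule.

```python
def traffic_light(percent_change):
    """Classify a policy change by the size of its effect on the outcome.

    Follows the rule of thumb in the text: under 10% is red, 10-25% is lime,
    over 25% is green. Taking the absolute value is an assumption; the text
    does not say how to treat the direction of the change.
    """
    size = abs(percent_change)
    if size < 10:
        return "red"    # stop and think before translating evidence into policy
    if size <= 25:
        return "lime"   # go for it, but think about transition costs
    return "green"      # adopt the policy


if __name__ == "__main__":
    for change in (4, 10, 18, 25, 40):
        print(change, traffic_light(change))
```

The boundaries follow the wording of the rule: exactly 10% and exactly 25% both fall in the lime band.
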
October 25th, 2012 | 2 Comments

What do DFID wonks think of Oxfam’s attempt to measure its effectiveness?

More DFIDistas on the blog: this time Nick York, DFID’s top evaluator and Caroline Hoy, who covers NGO evaluation, comment on Oxfam’s publication of a set of 26 warts-and-all programme effectiveness reviews.

Having seen Karl Hughes’s 3ie working paper on process tracing and talked to the team in Oxfam about evaluation approaches, Caroline Hoy (our lead on evaluation for NGOs) and I have been reading with considerable interest the set of papers that Jennie Richmond has shared with us on ‘Tackling the evaluation challenge – how do we know we are effective?’.

From DFID’s perspective, and now 2 years into the challenges of ‘embedding evaluation’ in a serious way into our own work, we know how difficult it often is to find reliable methods to identify what works and measure impact for complex development interventions.  Although it is relatively well understood how to apply standard techniques in some areas – such as health, social protection, water and sanitation and microfinance – there are whole swathes of development where we need to be quite innovative and creative in finding approaches to evaluation that can deal with the complexity of the issues and the nature of the programmes.  Many of these areas are where NGOs such as Oxfam do their best work.

So we would really like to welcome and applaud Oxfam’s new Effectiveness Reviews, which adopt a clear and practical framework for assessing what difference it is making, through its partners, in the development process. It is a big step forward for them – and it would be great if it also inspires other organisations to develop new and interesting approaches to measuring results and undertake rigorous analysis of what works.  Clearly this needs to be done in a way which each organisation can afford and resource – things need to be done in a proportionate way – but the Oxfam initiative shows some of what is possible.

They have chosen quite a practical strategy – picking out a random sample and then probing more deeply and using different techniques to measure impact or use well-tried monitoring of performance indicators.

Of course there is one potential drawback – random sampling may mean there are gaps in what you can say, if key areas don’t happen to have been sampled this time.  Oxfam also notes that the reviews do not necessarily enable full understanding of why a programme is successful (e.g. in Pakistan) and that they now need to go back and undertake some more work.   One way round this is more purposive sampling – we don’t know if this was considered –  or identifying priority themes up front based on what the organisational objectives are, and focusing on them in some depth.  The key challenge is finding a strategy for using the limited resources for evaluation and data collection in a targeted way that gives a nice balance between extensive coverage and intensive analysis.

Another challenge is maintaining the independence and integrity of those carrying out the evaluations.   Finding impartial observers – given that many people and experts have worked for years in these areas and know each other well – can be difficult.

The very interesting study of policy influencing by Oxfam’s partner in Bolivia, Fundacion Jubileo, is worth looking at in some detail. It made a good case that the grantee was really having an impact on some key aspects of social change in Bolivia. The evaluators clearly applied the process tracing technique skilfully and identified the most significant changes – but it must have been difficult to stay objective when doing the interviews, working with the grantee and identifying who was really influencing whom. Howard White and Daniel Phillips’s paper on ‘small n’ techniques talks a lot about the biases that one needs to avoid in using these techniques. The appendix to the study provides an excellent and useful set of reflections on the use of the process tracing methodology and what the evaluators learned.

One key assumption is that by doing more work and collecting more data (e.g. from comparison sites in Zambia and the Philippines), they will be able to understand and demonstrate impact. Actually, based on discussions we have had with Michael Woolcock recently in DFID, we have started to ask a different sort of question. In some types of programmes, more data and more work may not be the solution – more innovative methods and approaches to understanding impact can be required, and if the programme itself develops as you implement it then the goal posts are continually shifting too.

Looking ahead, and thinking about the next stages of this agenda… first, we would encourage others to share their approaches and experiences in the way that Oxfam has done. Second, it would be great to see Oxfam and other NGOs sharing resources to develop better methods across the sector, given their common challenge of demonstrating results and the limited resources. The results agenda is particularly challenging for smaller organisations, whose inputs are increasingly recognised – so can we ask if Oxfam sees itself as in a position to demonstrate leadership in linking with such organisations to jointly share and explore results?

Nick York is DFID’s Chief Professional Officer – Evaluation and Caroline Hoy, its Results and Evaluation Specialist, Civil Society Department.

October 24th, 2012 | 2 Comments

When we (rigorously) measure effectiveness, what do we find? Initial results from an Oxfam experiment.

Guest post from ace evaluator Dr Karl Hughes (right, in the field. Literally.)

Just over a year ago now, I wrote a blog featured on FP2P – Can we demonstrate effectiveness without bankrupting our NGO and/or becoming a randomista? – about Oxfam’s attempt to up its game in understanding and demonstrating its effectiveness.  Here, I outlined our ambitious plan of ‘randomly selecting and then evaluating, using relatively rigorous methods by NGO standards, 40-ish mature interventions in various thematic areas’.  We have dubbed these ‘effectiveness reviews’.  Given that most NGOs are currently grappling with how to credibly demonstrate their effectiveness, our ‘global experiment’ has grabbed the attention of some eminent bloggers (see William Savedoff’s post for a recent example).  Now I’m back with an update.

The first thing to say is that the effectiveness reviews are now up on the web. Here you will find introductory material, a summary of the results for 2011/12, and some glossy (and hopefully easy to read) two-page summaries of each effectiveness review, as well as the full reports. (You may not want to download and print off the full technical reports for the quantitative effectiveness reviews unless you know what a p-value is. With the statistically challenged in mind, we have kindly created summary reports for these reviews, complete with traffic lights….) Eventually, all the effectiveness reviews we carry out/commission will be available from this site, unless there are good reasons why they cannot be publicly shared, e.g. security issues.

Plug over, I can now give you the inside scoop.  In the first year (2011/12) we aimed to do 30 effectiveness reviews, and we managed to pull off 26. Not bad, but our experience in the first year made us realise that our post-first-year target of 40-ish reviews per year was perhaps a bit overly ambitious.  We have now scaled down our ambitions to 30-ish, to both avoid overburdening the organisation and enable better quality control.

The issue of quality control, in particular, is critical because there are certainly opportunities to strengthen the effectiveness reviews, particularly in terms of rigour.  Currently, there is considerable interest in how to evaluate the impact of interventions that don’t lend themselves to statistical approaches, such as those that are seeking to bring about policy change (aka “small n” interventions).  See a recent paper by Howard White and Daniel Phillips.  We have attempted to address this by developing an evaluation protocol based on a methodology called process tracing used by some case study researchers.  However, we are struggling to ensure consistent application of this protocol.  Time and budgetary constraints, as well as inaccessibility of certain data sources, are – no doubt – key militating factors.  Nevertheless, we aim to improve things this year by more tightly overseeing the researchers’ work, coupled with the provision of more detailed guidelines and templates so they better understand what is expected.

While in no way perfect, we have perhaps had more success with the reviews of our “large-n” interventions, i.e. those targeting large numbers of people.  This is, at least in part, because we are directly involved in setting up the data collection exercises, and we carry out the data analysis in-house.  The key to their success is capturing quality data on plausible comparison populations and key factors that influence programme participation, and this has worked out better in some cases than in others.  We are also attempting to measure things that just aren’t easy to measure, e.g. women’s empowerment and ‘resilience’.  We are modifying our approaches and seeking to collaborate with academia to get better at this.  Despite their shortfalls, at £10,000-ish a pop (excluding staff time), we believe these exercises deliver pretty good value for money.

Humanitarian programming is not my thing, but I am particularly pleased with the humanitarian effectiveness reviews that critically look at adherence to recognised quality standards.  While there are some methodological tweaks needed here and there, the cohort of reviews presents an impartial and critical assessment of Oxfam’s performance and identifies key areas that need to be strengthened, e.g. gender mainstreaming.

So what do the effectiveness reviews reveal about Oxfam’s effectiveness?  While the sample of projects is too small to draw any firm conclusions, the results for this particular cohort of projects are – as one might expect – mixed. For most projects, there is evidence of impact for some measures but none for others.

There are, no question, some clear success stories, such as a disaster risk reduction (DRR) project in Pakistan’s Punjab Province. Here, the intervention group reported receiving, on average, about 48 hours of advance warning of the devastating floods that hit Pakistan in the late summer of 2010, as compared with only 24 hours for the comparison group. Having had more time to prepare is one possible explanation why the intervention households reported losing significantly less livestock and other productive assets. Oxfam’s research team is in the process of commissioning some qualitative research to drill down on this project to better understand what made it work.

Given Oxfam’s size and capacity to mobilise and make noise, it is no surprise that there is reasonably reliable evidence that many of the campaign projects have brought about at least some positive and meaningful changes, despite falling short of fully realising their lofty aims. However, the results for several of the sampled livelihoods and adaptation and risk reduction projects are, quite frankly, disappointing. Figuring out why these particular projects have not worked is just as critical for learning as is figuring out why the Pakistan one did.

Whether their findings are positive or negative, I have to admit that I am impressed with how seriously the effectiveness reviews are being taken by senior management.  A management response system has been set up and embedded into the management line, where country teams formally commit themselves to taking action on the results.

That being said, the effectiveness reviews are in no way immune from internal controversy.  The random nature of project selection is perhaps the biggest sticking point.  While we do this to avoid ‘cherry picking’, inevitably some of the projects that are selected are small-scale and have little strategic relevance to the countries and regions.  Some are also concerned about how much time and resources the effectiveness reviews are sucking up.

We know that what we are attempting to pull off can be improved on a number of fronts, in terms of rigour, learning, and engagement and ownership of country teams.  And the good thing is that we are able to modify and improve things as we go along.  So any constructive criticism, advice, etc. is most welcome.

October 10th, 2012 | 19 Comments

Can we demonstrate effectiveness without bankrupting our NGO and/or becoming a randomista?

Back in March there was a fascinating exchange on this blog between Ros Eyben and Claire Melamed on the role of measurement in development work (my commentary on that debate here). Now one of Oxfam’s brightest bean counters (aka ‘Programme Effectiveness Adviser’), Karl Hughes, explains where Oxfam has got to on this:

Eric Roetman, in a recent 3ie working paper, A can of worms? Implications of rigorous impact evaluations for development agencies, tells a provocative tale of the experiences of International Child Support (ICS) in Kenya carrying out randomised control trials (RCTs) in partnership with several world-renowned quantitative impact evaluation specialists. ICS saw itself evolve into a “development lab”, where the bulk of its staff became devoted to supporting the organisation’s research, as opposed to development, operations. Given ICS’s desire to revert back to its roots, it eventually opted to get out of the RCT business.

ICS’ story relates directly to issues further explored in another 3ie working paper I recently co-authored with Claire Hutchings, another one of Oxfam GB’s global MEL advisers, entitled Can we obtain the required rigour without randomisation? Oxfam GB’s non-experimental Global Performance Framework. The central issue is this: We in the international NGO community are all too aware of our need to up our game in both understanding and demonstrating the impact – or lack thereof – of the various things we do. But what really baffles us is just how to do so without going down the “development lab” route. (This is not to imply that “development labs” are bad; in fact, the more their findings inform our programming, the better.)

The bottom line, as outlined in our paper, is that evaluation is research, and, like all credible research, it takes time, resources, and expertise to do well. This is equally true no matter what our epistemological perspective – positivist, realist, constructionist, etc. This is perhaps why, rather than using the methods offered by mainstream academia, we as a sector are so quick to experiment with seemingly more doable alternatives such as Most Significant Change, social return on investment (SROI), outcome mapping, and participatory M&E. They’re all very well, but those of us who feel a need to go further find ourselves at a loss.

One popular way of attempting to demonstrate effectiveness, being pursued by several international NGOs, which we comprehensively bash in the paper, is dubbed “global outcome indicator tracking.” Here, the organisation in question gets all its programmes/partners to collect common data on particular outcome measures, e.g. household income. All these data are then aggregated (only the gods know how) to track the welfare of global cohorts of programme “beneficiaries” over time. If there is positive change in relation to the indicator from time 1 to time 2, the organisation can boast about how much impact it is generating. Aggregation complexities aside, the underlying foundations of this approach are inherently precarious. In general, outcome level change is influenced by numerous extraneous factors, e.g. rainfall patterns in rain-fed agricultural communities. Consequently, even if we are able to capture reliable data on a decent outcome indicator, its status will go up and down and all around no matter what our interventions are and/or how well they are implemented. Any consideration of attribution is entirely absent.
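
A small worked example may help show why a time 1 to time 2 change says so little about attribution. The sketch below, in Python, uses entirely made-up income figures and hypothetical function names. It compares the raw before/after change for a programme group with the same change net of a comparison group that enjoyed the same good rains. The netting-out step is a simple difference-in-differences style calculation, chosen here only for illustration; it is not the method described in the paper.

```python
# Illustrative only: the income figures are invented for this example,
# and the function names are hypothetical.

def before_after_change(before, after):
    """Raw change in the indicator from time 1 to time 2."""
    return after - before


def change_net_of_comparison(prog_before, prog_after, comp_before, comp_after):
    """Programme group's change minus the comparison group's change."""
    return (before_after_change(prog_before, prog_after)
            - before_after_change(comp_before, comp_after))


# Average household income in a year of good rains (made-up units).
programme = {"time_1": 100, "time_2": 130}
comparison = {"time_1": 100, "time_2": 128}

naive = before_after_change(programme["time_1"], programme["time_2"])
net = change_net_of_comparison(
    programme["time_1"], programme["time_2"],
    comparison["time_1"], comparison["time_2"],
)

if __name__ == "__main__":
    print("Change reported by indicator tracking:", naive)
    print("Change net of the comparison group:", net)
```

With these invented figures the tracked indicator rises by 30, but households outside the programme gained 28 over the same period, so only 2 of the 30 is even a candidate for attribution to the intervention.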

But what of the fact that donors have been encouraging us to pursue outcome indicator tracking for decades now through instruments such as the logframe, as part of ‘good practice’? In a paper entitled, The Road to Nowhere, Howard White argues that the United States Agency for International Development (USAID) identified the futility of the outcome indicator tracking strategy some years ago and, consequently, abandoned it. I worked on a USAID funded orphan and vulnerable children (OVC) programme from 2005 to 2010, and yes we were only required to report on outputs, so perhaps this was the consequence of this realisation. (Incidentally, USAID also came to the realisation that there was no evidence-base established on what works and what does not in OVC programming after all the billions that it spent and seems to regret not having supported the rigorous evaluation of key OVC care and support interventions.) To what extent have the other donor agencies recognised the fallibility of outcome indicator tracking? Sadly, there is plenty of evidence to suggest that many are still operating in this outdated paradigm.

So where does this leave us as NGOs?  While Oxfam GB has not come up with a panacea, it is attempting to pursue a strategy that is reasonably credible.  Each year, we are randomly selecting and then evaluating, using relatively rigorous methods by NGO standards, 40-ish mature interventions in various thematic areas.  The causal inference strategy differs depending on the nature of the intervention.  For community-based interventions, for instance, where we are targeting many people (aka large n interventions), we are attempting to mimic what RCTs do by statistically controlling for measured differences between intervention and comparison populations.  Evaluating our policy influencing and “citizen voice” work (aka small n interventions), on the other hand, requires a different approach.  Here, a qualitative research method known as process-tracing is being used to explore the extent to which there is evidence that can link the intervention in question to any observed outcome-level change.

It is not that the above approaches are free of  limitations.  In the case of large n interventions, for instance, given that programme participants have not been randomly assigned to intervention groups, coupled with the conspicuous absence of proper baseline data, we cannot absolutely guarantee that any observed outcome differences are the result of the workings of the intervention in question.  The  process tracing approach is also retrospective in nature, when ideally the research should take place throughout the life of the advocacy or popular mobilisation initiative.  But, hey, what we are doing is not too shabby, especially considering that we are no “development lab.”  Moreover, every evaluation design –  even the golden RCT – has  inherent limitations.  Nonetheless, if anyone has any suggestions on how NGOs in general and Oxfam in particular can do a better job at both understanding and demonstrating impact, I’d love to hear them.

September 9th, 2011 | 17 Comments

So where have we got to on Value for Money, Results etc?

Great posts, great comments. My head is now spinning as I try and disentangle some of the different threads that have emerged over the last two days.

First: horses for courses. Some aid work is akin to Ros’ bathroom problem – linear, measurable, and suitable for a logframe + results approach. Other areas are emergent and unpredictable and a results approach would struggle. Say you had a programme in Egypt right now, and were wondering what to spend your money on. You could reassure your donors and supporters by opting for a measurable bathroom problem, say building schools, but that would be to ignore the historic opportunities for change presented by the social and political upheaval in that country. But how could you support that with any likelihood of proving impact or attribution? Tricky, but a clear risk that the results agenda will drive you in the wrong direction. Could senior management, as Jonathan suggests, create a situation where some programmes are assessed on results and others on relationships? In the current climate, it’s easy to imagine that the latter category would end up being starved of funds.

Second: upwards v downwards accountability. Can a results agenda strengthen both -  can countability improve accountability? (thanks for the soundbite, Sceptical Secondo). Claire, supported by Penny Lawrence and Alex Jacobs (with an excellent link to some practical examples) thinks it can.

Third: theory v practice. Claire’s right, I think, that in theory, a results agenda can be built on the perceptions of beneficiaries, improving quality and accountability. But Ros has spent a lot of time looking at what happens once all these ideas are implemented on the ground, and what looks good in the thinktank (and apparently in the NHS) may not survive the collision with reality, where staff are overstretched, working to tight deadlines and have little time for innovation or risk-taking. When I talked to Oxfam’s number crunchers about this exchange, they said they would love to take part, but were simply too busy generating the numbers needed to satisfy our donors!

Fourth: Trust. Ros rightly raises this, but trust between whom? The results agenda aims to build trust between northern publics and aid agencies, which is of course vital if aid spending is to continue to rise. And given that NGOs endlessly tell corporates and governments that we have moved from a ‘trust me to a show me agenda’, it would be pretty hypocritical to say the same shouldn’t apply to us. But what about the trust between aid workers on the ground and the partners they work with? Ros worries that that trust will be eroded by a crude results focus (and Alice Evans’ example from Zambia suggests she’s right), whereas Claire seems more concerned about using measurement to tap directly into the lived experience of poor people (hard, if that group is not as easily identifiable as NHS patients – back to my Egypt example).

Fifth: It all comes back to people, in particular the skills and motivations of the people who work for bilaterals, NGOs and all the other bits of the aid industry. If you have brilliant, motivated staff dripping with a sense of vocation, then they can probably make either a results agenda or a relationship-based approach work just fine. If you have demotivated nine-to-fivers who see this as just another job, then they will find a way to tick boxes and achieve little, whatever approach is adopted. I guess the interesting question is about those in the middle – what works best with normal staff doing the best they can, while coping with all the other pressures in their lives?

Which brings me to my final conclusion. I assume that the value for money people would never dream of asking us to take their ideas on trust. What are the results of the results-based agenda, compared to other approaches? What would be the best way to evaluate the evaluators? Is the balance of evidence different for, say, work on women’s empowerment, governance, livelihoods or health and education? Lots of work for researchers over the next few years. [update: Ben Ramalingam came to much the same conclusion a few months ago on the Aid on the Edge of Chaos blog]

So thanks everyone, I’m now better informed, but still on the fence. As are the rest of you, judging by the pretty even split on the poll. It stays up til Monday, so not too late to vote…..

Update: in a similar vein, someone just put this up on twitter [h/t Henry Northover and Ian Thorpe]:

“Can I pay for Nancy Birdsall’s new book on Cash on Delivery aid after I’ve tried it out to see if it really works?”

March 17th, 2011 | 3 Comments

‘Stuff happens’: the risks of a results agenda. Guest post from Rosalind Eyben

A few months ago, I blogged about the risks associated with the aid industry’s current overriding obsession with audit/value for money/results (pick your term). Since then, that debate has been swirling around both on this blog and (more importantly) in aid and development circles in many countries. So to help it along a bit I’ve asked two people who think about this a lot more than I do to set out some competing arguments. First up is Ros Eyben, who got a big and largely positive response to her recent challenge to the dumber/more extreme varieties of value-for-moneyism. Tomorrow the ODI’s Claire Melamed responds. Please join in the debate.

“The UK’s development ministry (DFID) has just completed a review of its bilateral aid programme. The Secretary of State for International Development, Andrew Mitchell, has ‘set out the results that UK aid will deliver for the world’s poorest people over the next four years’. DFID will be more ‘hard-headed about making every penny count’. Its press release highlights results such as 11 million more children at school and 50,000 fewer women dying from having babies. Digging into the review’s report, you will find numbers relating to DFID’s other aims, including wealth creation and tackling the root causes of conflict. Here, DFID is more modest: 50 million people with the means to help work their way out of poverty, rather than creating millions more jobs as some enthusiastic DFID country offices apparently offered to achieve. How can a government (let alone a foreign aid agency) deliver jobs? Likewise, DFID is not going to reduce the number of conflicts in the world but instead help citizens hold their governments more accountable.

When we look at the details, DFID’s plans seem pretty sensible. But the press release worries me. Explaining to the British public how UK aid delivers value for money – promising to educate more children than those we educate in the UK, but at 2.5% of the cost – must surely influence how DFID thinks and works. I am in charge of redecorating our bathroom while my partner is away. The paint is peeling and there is mildew on the ceiling above the shower. To demonstrate I got value for our money I will get two quotations for the redecoration. Many donor governments are treating the complex problems of poverty like my bathroom. They contract a Third Party Operator to deliver a result pre-determined by DFID. At the end of three or four years, there is an evaluation to check on results before paying the contractor.

Sometimes DFID’s bounded problem-approach to change (as typified by the logical framework) is going to work. But there are major concerns about the institutional and financial sustainability once the intervention ends, if these have not been addressed as an integral part of the design. By 2006 the global polio vaccination campaign had successfully eradicated polio from all but four countries, yet by 2008 it had reappeared in nineteen additional countries. In the drive for results, insufficient attention had been paid to the national health systems needed to keep polio at bay.

To be able to count exactly how each penny or Euro of aid money gets spent, donor governments are risking not making any difference at all. They can show how many kilometres of roads they have built or numbers of babies vaccinated as compared with before they started the projects. But such facts reveal little about how the change was achieved and what can be learnt for future policy and practice. End-of-project evaluations are no substitute for continuous learning and adaptation of approach. Donors are ignoring lessons long since learnt: without local people empowering themselves to change those less tangible factors that cannot be counted, once donor money stops the roads will crumble away and the next generation of babies will not be vaccinated. These inadequate measures of assessment – and the effect of such measures on the design of aid – risk donor governments wasting, instead of securing, ‘value for money’.


Eventual outcomes are often very different from what the logical framework required. Stuff happens. Power, history and culture shape the multiplicity of relationships and actors influencing any aid intervention. It makes more sense to design aid to recognize this. Experienced staff and consultants know it. But they are being forced to misrepresent reality in order to keep things simple for the taxpayer. They have to work with complex problems – such as why maternal mortality rates refuse to go down – as if they were bounded problems like my mildewed bathroom. In a largely unpredictable and dynamic environment, rather than choosing a single ‘best option’, more value for money might be achieved by financing two or more different approaches to solving a complex problem, facilitating variously-positioned actors to implement an intervention according to their different theories of change and diagnoses and consequent purposes.

Aid bureaucracies have never recognised that effective aid depends on people and the quality of their relationships with each other. Sheela Patel of SPARC, an Indian NGO that supports slum dwellers federations has written that when SPARC was founded in 1984

‘Donors gave money to us because there was a sense of trust. These funders did not set our priorities; communities of poor people did….. we were given all the space we needed. Consequently, SPARC and its partners now operate in nine states of India and help some 750,000 households….. I cannot imagine donors in today’s world granting an organization like SPARC the kind of latitude it required in its early years. Instead, [they] have become more focused on developing portfolios of projects, managing risks, and producing outcomes rather than on listening to communities, healing deep inequities, and supporting innovation’.

The origins of the results agenda lie in a mistrust that eats like a cancer into aid agencies’ capacity to make a difference. I am not convinced the emphasis on results will solve the problem of trust. On the contrary, it risks making things worse. The results rhetoric gets exaggerated by bureaucratic systems and by those middle level managers with little country level experience who are forcing grantees and development partners into straitjackets that constrain them from helping transform the lives of people in poverty.

We aid practitioners must start building trust. Steps in the right direction include paying attention to the inequitable power relations, including our own behaviour, which keep people in poverty; being modest about what any purposeful intervention can achieve; and communicating simply with taxpayers about complex realities.

Rosalind Eyben is a Fellow at the Institute of Development Studies and former Chief Social Development Adviser at DFID.

March 15th, 2011 | 8 Comments
