Aid and complex systems cont’d: timelines, incubation periods and results

I’m at one of those moments where all conversations seem to link to each other, I see complex systems everywhere, and I’m wondering whether I’m starting to lose my marbles. Happily, lots of other people seem to be suffering from the same condition, and a bunch of us met up earlier this week with Matt Andrews, who was in the UK to promote his fab new book Limits to Institutional Reform in Development (I rave reviewed it here). The conversation was held under Chatham House rules, so no names, no institutions etc.

Whether you work on complex systems or governance reform or fragile states, the emerging common ground seems to be around what not to do and to a lesser extent, the ‘so whats’. What can outsiders do to contribute to change in complex, unpredictable situations where, whether due to domestic opposition or sheer irrelevance to actual context, imported blueprints and ‘best practice guidelines’ are unlikely to get anywhere?

In his book Matt boils down his considerable experience at the World Bank and Harvard into a proposal for ‘PDIA’ – Problem Driven Iterative Adaptation, which I described pretty fully in my review. The conversation this week fleshed out that approach and added some interesting new angles.

PDIA needs funding, but not big million dollar cheques that come with all the paraphernalia of targets, milestones, logframes etc that are more likely to kill thought than promote experimentation and learning. Instead, it needs a trust fund approach – lots of small grants that allow incubation of local solutions to a given problem while ‘avoiding a premature results agenda’.

But does that mean that institutional reform should avoid the big aid dollars altogether? Matt thought not – he portrayed PDIA as a new and extended incubation phase, which can then take the homegrown solutions that emerge and move into the more traditional aid world of large scale, large budget programming. So the challenge for aid agencies is how to create, fund and protect a space within their institutions for small budget experimentation and incubation, sitting in parallel with the big stuff.

Timelines emerged as a useful but undervalued tool. But these are timelines of what has actually happened in the past, not the imaginary future timelines of funding applications. Matt reckons any project seeking funding should start by building a 20 year timeline of what has happened on that issue/in that locality. If done properly, the exercise of reconstructing the timeline using documents and interviews will reveal overlapping interpretations of what actually happened and recover the kinds of knowledge and experiences that all too often go missing in Aid World as staff leave and projects are wound up. We need a decent timeline methodology – Matt uses the work of Peter Hall at Harvard but it also sounds a lot like process tracing, something our MEL team uses.

The issue of narratives is central – it lies at the heart of the response to a reductionist results agenda that privileges pseudo medical trial data over real experience. Claire Melamed likes to say ‘the plural of anecdote is not data’. True, but I think that a well researched anecdote rapidly becomes a ‘narrative’, and the plural of narrative can definitely be evidence, if not data. Matt, ODI and Oxfam are all separately thinking about the need to build a collection of rigorous, nuanced narratives on stories of power and change – we’ll be swapping notes and hopefully coming up with some ideas for working together on this. What would people recommend in terms of references on rigorous narrative methodologies?

There was a good discussion on what constitutes ‘results’. Good PDIA-type work in developing countries requires a rapid feedback loop of results, but of a different kind to those typically demanded by the aid business. Developing country politicians want to know what’s happening with their money, what has been learned, what has worked and what hasn’t, and how the project has responded. They don’t need the (often bogus) certainty and data demanded by aid planners.

I do find this all slightly baffling – politicians intrinsically know how to navigate in complex environments, respond to shocks and opportunities, using trial and error, instinct and rules of thumb. They make decisions on partial information and change direction if things don’t work. That’s what politics is about. But then they become aid ministers in donor countries, and suddenly buy into a paraphernalia of logframes and a particular understanding of results that in some other part of their brains they must know has huge limitations in the real world. How to get ministers to think more like pols and less like aid bureaucrats?

All fascinating and thanks to Matt for kicking off and CGD Europe for organizing the discussion (am I allowed to say that under Chatham House rules? If not, please ignore). I’m thinking of writing a paper on the ‘so whats’ of complex systems, but will first wade through the draft of Ben Ramalingam’s forthcoming book before deciding whether it’s necessary.

Update: more thoughts from Matt Andrews on his blog

May 22nd, 2013 | 13 Comments

So What do I take Away from The Great Evidence Debate? Final thoughts (for now)

The trouble with hosting a massive argument, as this blog recently did on the results agenda (the most-read debate ever on this blog), is that I then have to make sense of it all, if only for my own peace of mind. So I’ve spent a happy few hours digesting 10 pages of original posts and 20 pages of top quality comments (I couldn’t face adding the twitter traffic).

(For those of you that missed the wonk-war, we had an initial critique of the results agenda from Chris Roche and Rosalind Eyben, a take-no-prisoners response from Chris Whitty and Stefan Dercon, then a final salvo from Roche and Eyben + lots of comments and an online poll. Epic.)

On the debate itself, I had a strong sense that it was unhelpfully entrenched throughout – the two sides were largely talking past each other, accusing each other of ‘straw manism’ (with some justification) and lobbing in the odd cheap shot (my favourite, from Chris and Stefan: ‘Please complete the sentence “More biased research is better because…”’ – debaters take note). Commenter Marcus Jenal summed it up perfectly:

‘The points of critique focus on the partly absurd effects of the current way the results agenda is implemented, while the proponents run a basic argument to whether we want to see if our interventions are effective or not. I really think the discussion should be much less around whether we want to see results (of course we do) and much more around how we can obtain these results without the adverse effects.’

There were some interesting convergences though, particularly Whitty and Dercon’s  striking acknowledgement of the importance of power and politics, which are often assumed to be excluded from the results agenda. But what they actually said was

‘Understanding power and politics and how to assist in social change also require careful and rigorous evidence.’

True, but what about reversing the equation? Does understanding the role of evidence in development also require a careful and rigorous understanding of power and politics? They never fully address that crucial point, which is at the heart of Roche and Eyben’s critique.

Both sides (rather oddly, as acknowledged experts in their fields) decried the role of experts. Whitty and Dercon called for ‘moving from expert (i.e. opinion-based, seniority-based and anecdote-based) to evidence-based policy’. Ah, turns out that what is actually being suggested is a move from one kind of expert (practitioners) to another (evidence/evaluation).

As a non number-cruncher I also took exception to their apparent belief that only those who understand the methodological intricacies of different evaluation techniques are eligible to pass judgement. On that basis politicians would be out of a job, and only rocket scientists would get to pronounce on Trident.

There was also a really confusing exchange on the hierarchy of evidence. Whitty and Dercon show a surprising (to me at least) commitment to multi-disciplinarity: ‘Methods from all disciplines, qualitative and quantitative, are needed, with the mix depending on the context….. it is not a matter of just RCTs, but of rigour, and of combining appropriate methods, including more qualitative and political economy analysis.’

Music to the ears of the critics, but is it actually, you know, true? Everything I hear from evaluation bods is that DFID does actually see RCTs as the gold standard, and other forms of evidence as inferior. Roche and Eyben returned to the attack on this in their response, arguing that what Whitty and Dercon call the ‘evidence-barren areas in development’ are only barren if you discount sociology and anthropology, among others, as credible sources of evidence. By the way, Ed Carr has a brilliant new post on the (closely linked) clash between quants and quals, arguing that while quants can establish causation, only quals can explain how that causation occurs.

But the exchange did provide me with one important (I think) lightbulb moment. It was about failure. Whitty and Dercon were particularly convincing on this: the evidence agenda ‘involves stopping doing things which the expert consensus agreed should work, but which when tested do not’. This is a nice Popperian twist – the role of evidence is not to prove that things work, but to prove they don’t, forcing us to challenge received wisdom and standard approaches. This is indeed what I noticed about Oxfam’s recent ‘effectiveness reviews’ – if you find no or negative impact, then you (rightly) start to re-examine all your assumptions. But if this is the proper role for the evidence agenda, is it politically possible? By coincidence I have just read Ed Carr’s forceful critique of Bill Gates’ approach to evaluation, arguing that failure is often airbrushed out in order to safeguard funding and credibility. That seems a pretty fundamental contradiction.

The comments were just as thought-provoking. One of the key messages that emerged is the gulf between these debates and what those in charge of gathering results in aid agencies actually face – highly constrained resources, crazy time pressure, and the need to deliver some (any!) results to feed the MEL machine. Oxfam’s Jennie Richmond reflected on the gap between theory and practice yesterday.

Commenter Enrique Mendizabal asked whether we are demanding a different role for evidence in poor countries than in our own.

‘In the UK, health policy is decided by a great many number of factors or appeals (evidence, sure, but also values, tradition, biases, political calculations, etc). We may complain about it but we accept that it is a system that works. But health policy for Malawi (or other heavily Aid dependent countries) is decided mainly by evidence (or what often passes as evidence at the time) and usually by foreign experts…. would we be happy with USAID funding a large evidence-based campaign to reform the NHS or our education policy?’

But he took his argument a step further – if the final decision should be left to the interplay of evidence (of different sorts), politics and negotiation, then DFID and other donors would be better advised to boost the ‘enabling environment’ for such debates and decisions by investing in tertiary education in developing countries:

‘strengthening economic policy debate is a more adequate objective than achieving policy change (even if it is evidence based).’

Commenter David highlighted a fundamental point that rather went missing in the initial exchange – how the results agenda does or doesn’t work in complex systems:

‘The results agenda approach tends, by presenting development as objectively knowable if broken down into discrete and small bits, to drive attention toward small, more easily measurable interventions to test, particularly those that are suited to situations that are simple or complicated rather than complex. Current processes around evidence-based results fail to grapple with complex systems, interaction effects, and emergent properties that dominate most aid project landscapes.

A fundamental critique of the evidence-based revolution is that it actually diminishes efforts to get rigorous evidence about addressing complex challenges. We all want evidence, it’s a question of whether the current framing of “evidence-based” is distorting what types of evidence we gather and value. For those who think that the current emphases on methods to test what works are distorting how we value the evidence coming in (RCT=gold, qualitative methods=junk), this offers little other than platitudes about lots of other methods existing.

Personally, I would be a bigger proponent of the evidence-based revolution if it was coming to folks interested in power, politics, and development, and asking them what their questions are and what evidence might contribute to their work. Absent a learning agenda set to fit complex space and concern itself with power, it will continue to seem to me to be an instance of methods leading research – or searching for keys under the light rather than inventing a flashlight.’

To be fair, Roche and Eyben explicitly chose to focus on the politics of evidence, rather than the implications of complex systems (for example, the question of external validity in complex systems – or lack of it – raised by Lant Pritchett in our recent conversation.)

Final thoughts? After about 500 votes, the poll went narrowly to Whitty and Dercon (34% v 31% for Roche and Eyben, with a pleasing late rally for the ‘totally confused’ camp – my natural habitat). I think Chris Roche and Rosalind Eyben need to work on their communication style (more punchy, less abstract, more propositional). Chris Whitty and Stefan Dercon should give some examples of gold standard anthropological or sociological evidence to allay the doubts over their true commitment to multi-disciplinarity, and take the complex systems question more seriously.

A massive thank you to all who took part, and please can you come back for another go in a year or so? This one isn’t going away.

February 7th, 2013 | 12 Comments

Theory’s fine, but what about practice? Oxfam’s MEL chief on the evidence agenda

Two Oxfam responses to the evidence debate. First Jennie Richmond (right), our results czarina (aka Head of Programme Performance and Accountability), wonders what it all means for the daily grind of NGO MEL (monitoring, evaluation and learning). Tomorrow I attempt to wrap up.

The results wonkwar of last week was compelling intellectual ping-pong. The bloggers were heavy-hitters and the quality of the comments provided lots of food for thought. However, I was left wondering what it all meant for those of us who work in NGOs, trying to generate and learn from ‘evidence’ on a daily basis. I found myself unable to simply vote, so instead I blog….

The results and evidence agendas have brought some real benefits to NGOs in my view. First and foremost, it is important and right that those of us who claim to work in the interests of the poorest people in the world and are stewards of other people’s money, should set ourselves high standards for our own impact. In its simplest form the results agenda asks us to justify the trust others have placed in us, by demonstrating whether we are actually bringing about positive change. In Oxfam GB, accountability has long been held as a core organisational value. It is not the results agenda that has got us thinking about how to capture and communicate our effectiveness, but it has provided a helpful additional push.

A further positive is that space has been created both within our own organisations and in the wider sector, to stop, listen and learn. MEL-istas (as Duncan calls us) 5 years ago struggled to get the ear of senior managers (let alone Ministers). But the results agenda has increased the stakes around MEL – encouraging organisations not only to increase investment, but also to listen to the findings coming from our own data gathering and analysis.

However, it has also increased the demand and the expectation, which are not easily met by all NGOs. In Oxfam GB the investment in MEL has increased over the last couple of years, undoubtedly, but still it is a real stretch to deliver the ever-more ambitious demands from donors, to develop tools to tell the story of our broader organisational impact, and to ensure that we are developing innovative ways of measuring cutting-edge programming areas, such as resilience, enterprise development and influencing.

And we are one of the largest international development NGOs in the UK. How much more difficult for the smaller and niche NGOs, or those who lack the flexible financing that permits investment in MEL and innovation? We are conscious in Oxfam that we and other large NGOs need to guard against distorting the NGO market place by pushing the boundaries on MEL and impact too far, and thereby creating expectations that cannot be met by everyone. Somehow we all need to keep our sights on a proportionate approach.

It is not just important to generate evidence, but also to use it properly. There is increased demand for serious, evidence-based conversations about what works. None of us can get away with decisions made purely on gut instinct, force of habit or ideological leaning. We are challenged by the ‘evidence’ question to collate and distil from the broad knowledge base we have at our disposal. And this has in some cases led to surprises. Rigorous studies, whether based on qualitative or quantitative methods, can challenge our preconceptions – showing us impact where we were not optimistic, or the opposite. The test, of course, comes when new programmes are designed. Will the body of evidence be applied – will we be able to find it for starters (in our often not-so-state-of-the-art knowledge management systems), and will it be politically acceptable in our own organisations to apply it to practice?

So, how can we use the results and evidence agendas and make them useful to us as NGOs?  We need to do this in a way that a) is true to the actual work we do (which in the case of Oxfam includes a great deal of work that drives for political change and influencing) and b) does not distort decision-making away from the right decisions (i.e. what most suits the specific needs and opportunities of each context) in our efforts to be able to measure and communicate what we are doing.

One of the concerns raised in last week’s blog was that in some institutions, evidence becomes synonymous with impact evaluations, and even specifically with Randomised Control Trials. As all the bloggers agreed, the default use of one research method for interventions of all types is simply nonsensical. You only have to look at the enormous variety of the things we do in international development (from campaigning for policy change to delivery of bed-nets, from building of bridges to raising awareness of the rights of citizens) to realise that one approach is just not going to cut it.

Another challenge is that so much of what we do in international development is extremely hard to measure. How can we trace the input through to impact chain and clearly demonstrate the ‘on the ground’ changes we have brought about in people’s lives when the investment is in budget support or core funding?  How can we reduce the process of a community standing up against acts of violence against women to a Value for Money calculation? The ethical dilemmas and practical difficulties wrapped up in measuring and ‘evidencing’ many of the processes we are involved in are huge. And, as Eyben and Roche point out, much of what we engage with in international development is messy and political. We need to make sure that the tools we have at our disposal for evidence generation are sophisticated and nuanced enough to acknowledge this messy political reality, and that we are sharing ideas on how to do this in a practical and affordable way.

The push for evidence should go hand in hand with a more entrepreneurial approach to development, opening up space for honest reflection on both success and failure. That is the theory. But, of course, there are obstacles to this becoming a reality. Our systems in large institutions, including NGOs, are designed to demonstrate success. We all have our logframes and our KPIs, and we want to be able to put a tick in the box. No-one wants their project to be the one famous for not achieving what it set out to do, even if the real story is that it helped enormously to generate learning for future projects. Complexity thinking is having some influence right now, which helps to raise the right questions about process and incentives. However, we have a long way to go before even the most reflexive learners in NGOs and other development institutions want their project to be hailed as the great failure.

So, we proceed with caution – welcoming the increased space the Results Agenda provides to consider ‘what seems to work’, and the profile it gives to the need to take a thorough and transparent look at the information coming out of our programmes. But we remain wary of the dangers of distorting what we do in order to make it measurable; of placing the MEL ‘bar’ for NGOs too high to reach; of the over-emphasis of certain methodologies; and of the danger of ignoring political realities in the work that we do. It is certainly helpful to keep reflecting and questioning, however, from all sides of the debate – so the wonkwar of last week was welcome.

February 6th, 2013 | 4 Comments

Evidence and results wonkwar final salvo (for now): Eyben and Roche respond to Whitty and Dercon + your chance to vote

In this final post (Chris Whitty and Stefan Dercon have opted not to write a second installment), Rosalind Eyben and Chris Roche reply to their critics. And now is your chance to vote (right) – but only if you’ve read all three posts, please. The comments on this have been brilliant, and I may well repost some next week, when I’ve had a chance to process.

Let’s start with what we seem to agree upon:

  • Unhappiness with ‘experts’ – or at least the kind that pat you patronizingly on the arm,
  • The importance of understanding context and politics,
  • Power and political institutions are generally biased against the poor,
  • We don’t know much about the ability of aid agencies to influence transformational change,
  • Mixed methods approaches to producing ‘evidence’ are important. And, importantly,
  • We are all often wrong!

We suggest the principal difference between us seems to concern our assumptions about: how different kinds of change happen; what we can know about change processes; if, how and when evidence from one intervention can practically be taken and sensibly used in another; and how institutional and political contexts determine how evidence is then used in practice. This set of assumptions has fundamental importance for international development practice.

Firstly, we understand social change to be emergent and messy. Organised efforts to direct change confront the impossibility of any of us ever having a total understanding of all the sets of societal relationships and contested meanings that generate change and are in constant flux. New inter-relational processes are constantly being generated that in turn affect and change those already in existence. Complexity theory privileges a concern for process as much as goals and supports an approach that seeks to make a difference by working through relationships rather than focusing on narrowly defined pre-set projects and outcomes. It encourages being explicit about values and a concern for how an organisation’s intervention is judged by others, in particular by those that are meant to ultimately benefit, and the creation of effective feedback mechanisms – including, but not limited to, those produced by high quality research.

At their best, development practitioners often have to surf the unpredictable realities of national politics, spotting opportunities, supporting interesting new initiatives, acting like entrepreneurs or searchers, rather than planners. They are keeping their eye on processes and looking to ride those waves that appear to be heading in the direction that matches their own agencies’ mission and values, and which can support local coalitions for change. On the contrary, assuming that development practitioners are in control and that change is predictable – as expressed through some of the demands of evidence-based planning approaches – prevents them from responding effectively to feedback in an often unpredictable and dynamic policy environment, and can, if badly managed, chain them to a desk. Ben Ramalingam’s blog site – Aid on the Edge of Chaos – offers current insights on complexity thinking in development.

That it is relatively easier to eradicate rinderpest in cattle and build bridges than tackle police corruption or reduce violence against women is because the first are examples of what Dave Snowden describes as complicated problems and the latter are complex – an effect of there being so many collaborators involved in non-routine interventions with absence of consensus among them.  Such issues can’t be ‘solved’ like a Sudoku puzzle. In that respect, we were puzzled by Chris and Stefan’s two examples of what we would describe as complex issues. We found the first – the effect of political quotas for women in rural India – to be somewhat superficial and wondered why so little reference was made to the considerable number of studies from political sociology on the same topic that ask more probing questions and arguably provide more insightful understanding of what has been learnt in different contexts.  The World Bank study on whether top-down large scale interventions can stimulate bottom-up participation was on the other hand  puzzling for exposing myths that perhaps only World Bank staff had previously believed in, while ignoring the very considerable body of sociological and anthropological knowledge on this topic. It led us to wondering whether you need economists to find something out for it to be accepted as evidence.  Perhaps that explains some of ‘the evidence-barren areas in development’………

Which brings us to the second set of assumptions about how we know and therefore what is judged as evidence.  This is about more than pluralism and mixed methods, though we recognise that recent advances, in this case funded by DFID, are important.  Let’s start by insisting that a criterion for rigorous research is that it should be explicit about its assumptions or world-view. We suggest that a weakness in many studies is that they usually focus solely on the methodological and procedural and render invisible their ‘philosophical plumbing’. The evidence-based approaches that Stefan and Chris advocate are imposing a certain view of the world, just as our approaches do. Their claims to the contrary foreclose any possible discussion about the different intellectual traditions in interpreting reality.  Theory invites argument and debate.

An interesting paper by Greenhalgh and Russell on evaluating health programmes notes how experimental approaches often ignore the tricky philosophical and political questions. Like the authors of that article, we take an approach that recognizes the partial (in both senses of the word) nature of our knowledge. How does this approach try to deal with unavoidable bias? Through seeking to use dialogic, democratic methods in which multiple perspectives and understandings of what is at stake are explored, and the use of multiple and hybrid approaches. The implications for practice are to be involved in mutual single and double-loop learning and adaptation as you go along. This does not preclude specific studies commissioned from ‘experts’, but it is not they alone who should define the problem nor should they assume that only their kind of knowledge has validity for collective efforts to try to secure greater equity and social justice. Knowledge and power are bed-mates. Our critique of ‘expertise’ – the laboratory references are an extreme example of the trend – is that expertise often uses its power to ignore other ways of knowing and doing, something Chris and Stefan would seem to agree with. Might it be that some of these ways might prove to be pretty good at tackling police corruption or reducing violence against women?

This is where reflexivity comes in.  Those of us working as practitioners, bureaucrats and scholar activists in international development cannot escape the contradiction that we are strategizing for social transformation from a position in a global institution – international development – that can and does sustain inequitable power relations, as much as it succeeds in changing them. Reflexive practice seeks to address these power inequities by recognizing that (a) many problems we seek to address are the products of human interaction – and some very important problems for people with less voice go ignored for that reason, and  (b) even if people are in agreement about there being a problem, they will often offer multiple diagnoses for its existence, and thus of course (c) multiple solutions, which need to be debated democratically with different kinds of evidence, based on alternative ways of knowing, and having the space to be heard.

We are heartened to note that Chris and Stefan believe “that all actions by external actors will interact with political forces and vested interests” and that “in many of the settings where development actors want to make a difference, power and political institutions are biased against the poor”. We would therefore assume that a reflexive donor would recognise that their power and agenda need examination as much as anyone else’s.

Chris and Stefan suggest ‘the commitment to evidence has opened up the space fundamentally to challenge conventional, technical approaches to aid.’ We would agree, but it would seem that the exception to this is when it comes to addressing the power of donors such as DFID, being honest about the domestic political pressures they are under, and assessing the possibility that their behaviour (including how evidence-based approaches are managerialised) may on occasions be undermining processes of development and social transformation. Is DFID drawing upon anthropologists or ethnographic researchers, as the Police in the UK have recently done, to understand how its policies on, for example, results or value for money change behaviour in the agency, and its relationships with others?

To imply that we are suggesting that ‘it is not worth trying to provide the best and most rigorous evidence to those who need to make difficult decisions’ is simply a wilful mis-stating of our position. On the contrary we are arguing there is more ‘evidence’ out there than some seem to admit because their world view precludes seeing this as such. Where we in particular see the need for more evidence is about how the evidence-based and results agenda plays out in practice. How it affects the behaviour of development agencies and their staff as well as their ability to support the promotion of the kinds of transformational change which are likely to make a significant difference to the lives of people living in poverty and injustice. It is odd that those that argue for more evidence seem rather reluctant to admit that this is needed!

This is a debate we are keen to pursue further in the upcoming Big Push Forward conference on the Politics of Evidence.

January 24th, 2013 | 15 Comments

The evidence debate continues: Chris Whitty and Stefan Dercon respond from DFID

Yesterday Chris Roche and Rosalind Eyben set out their concerns over the results agenda. Today Chris Whitty (left), DFID’s Director of Research and Evidence and Chief Scientific Adviser, and Stefan Dercon (right), its Chief Economist, respond.

It is common ground that “No-one really believes that it is feasible for external development assistance to consist purely of ‘technical’ interventions.” Neither would anyone argue that power, politics and ideology are not central to policy and indeed day-to-day decisions. Much of the rest of yesterday’s passionate blog by Rosalind Eyben and Chris Roche sets up a series of straw men, presenting a supposed case for evidence-based approaches that is far removed from reality and in places borders on the sinister, with its implication that this is some coming together of scientists in laboratories experimenting on Africans, 1930s colonialism, and money-pinching government truth-junkies. Whilst this may work as polemic, the logical and factual base of the blog is less strong.

Rosalind and Chris start with evidence-based medicine, so let’s start in the same place. One of us (CW) started training as the last senior doctors to oppose evidence-based medicine were nearing retirement. ‘My boy,’ they would say, generally with a slightly patronising pat on the arm, ‘this evidence-based medicine fad won’t last. Every patient is different, every family situation is unique; how can you generalise from a mass of data to the complexity of the human situation?’ Fortunately they lost that argument. As evidence-informed approaches supplanted expert opinion the likelihood of dying from a heart attack dropped by 40% over 10 years, and the research tools which achieved this (of which randomised trials are only one) are now being used to address the problems of health and poverty in Africa and Asia.

The consequences of moving from expert (ie opinion-based, seniority-based and anecdote-based) to evidence-based healthcare policy, far from being some sinister neocolonial experiment, have been spectacular. To quote a recent Economist headline, ‘Africa is currently experiencing some of the fastest falls in childhood mortality ever seen, anywhere’. It is a great example of the positive side to modern Africa that the current excellent Oxfam publicity campaign (right) is all about. This success is based on many small bits of evidence, from many disciplines, leading to multiple incrementally better interventions. Critically, it also involves stopping doing things which the expert consensus agreed should work, but which when tested do not. It is no accident that one of the most evidence-based parts of development is also one where development efforts have had some of their greatest successes.

Proper evidence empowers the decision-maker to be able to make better choices. This is a good thing. In every discipline, in every country, where rigorous testing of the solutions of experts has started, many ways of doing things promoted by serious and intelligent people with years of experience have been shown not to work. International development is no different, except that the communities we seek to assist are more vulnerable, including to our bad choices.

Much of what we all do in international development has very limited evidence that it does any good (in this it is no different from many other policy areas) – which is not the same as saying it is pointless. Rather, we don’t know what is pointless. Some of what we do will work better than we think, much of it will work much less well than we hope, and some of it will be damaging the poorest without us realising it. In the evidence-light areas we just don’t know which are which.

We must have the humility to accept that we are all often wrong, however reflexive the practitioner, however deep their reading and experience and passion to do good. Evidence-based approaches are not about imposing a particular theory or view of the world. They are simply about taking any opportunity to test our own solutions in the best way available, using evidence honestly when it is available to inform (note the word) decisions, and when the facts change, changing our minds.

This honesty includes saying to decision-makers when evidence is methodologically weak, mixed or missing so they know they are on their own, unable to rely on (or make a claim on) the evidence. The worst possible solution, which we know Chris and Ros would also deplore, is using the social power of the ‘expert’ to imply we know the answer when we actually have no solid evidential basis for our opinion or prejudice.

A few false assumptions about evidence-based decision making

Some of those who express unease about evidence-based policy and practice seem to assume that it is always based on randomised trials and quantitative methodologies: not so. Methods from all disciplines, qualitative and quantitative, are needed, with the mix depending on the context. Randomised trials are one tool amongst very many, although a good one in the right setting. The argument that evidence-based approaches can “only apply in cases of individual treatment and not the wider community level” ignores over 30 years of methodology which has done exactly that, with very convincing results.

A sterile argument between people who, on the one side, believe that a randomised trial can answer any question (it can’t), and people who, on the other, do not appear to be aware of any methodological advances since the 1970s except in their own narrow field is a depressingly familiar experience. We know this does not apply to Rosalind and Chris, but listening to people passionately critiquing methodologies they have not taken the trouble to understand does no good to anyone. This applies both to a randomista who seems to believe that all there is to social research is a few focus groups and in-depth interviews, and to people from a more qualitative social science background who would have trouble explaining the difference between cluster-randomised and stepped-wedge designs but assume both are irrelevant to social research anyway (both can be used to measure societal rather than individual effects).

It is tempting to take every point the authors make where we have concerns about their factual basis and logical framework but we will take just three.

“Evidence-based approaches are pre-occupied with avoiding bias and increasing the precision of estimates of effect”. On less bias – generally true. Please complete the sentence ‘More biased research is better because…’. On precision – no, incorrect, the range of situations where a more precise answer is a better answer is small.

One statement we would like to address head-on starts “Evidence-based approaches became linked to value for money concerns to deliver ‘results’…”. We agree – and this is a good thing. Doing a pointless thing, professionally delivered and passionately believed in, is always going to be poor value for money. Testing what works and what does not therefore is essential to value for money. More importantly, doing pointless things diverts very limited human and financial resources, in an ocean of need, away from those who could best use them – not what any of us are in international development to do.

Is it “technical approaches” on the one hand, and “power, political economy” analysis on the other?

Rosalind and Chris’ key criticism is that evidence-based approaches “deflect attention from the centrality of power [and] politics […] in shaping society”, and they offer “power analyses” as an apparent alternative to assessing rigorously what works. This creates a false dichotomy, as if a choice has to be made between a “technical, rational and scientific approach to development” and an approach that recognises politics and the role of power. It is easy rhetoric, but troubling and, if taken much further, even dangerous. Understanding power and politics and how to assist in social change also requires careful and rigorous evidence, and again, results are not simply what experts would have expected a priori. Recent studies on the positive impacts of female leadership quotas in rural India are for many of us rather surprisingly good news, even if one can fairly worry about their applicability in other settings, while the struggle to find systematically a positive impact of decentralisation and community-driven development programmes is important to internalise in our actions for change, and highlights the importance of understanding contexts and politics. In these cases, it is not a matter of just RCTs, but of rigour, and of combining appropriate methods, including more qualitative and political economy analysis.

Strong analysis of politics and power without offering much in terms of what can be acted upon is similarly unhelpful. They criticise an evidence-focused agenda by stating that “to act ‘technically’ in a politically complex context can make external actors pawns of more powerful vested interests and therefore by default makes them, albeit unintentionally, political actors.” But all actions by external actors will interact with political forces and vested interests. In many of the settings where development actors want to make a difference, power and political institutions are biased against the poor. Being able to act on strong evidence of what works in constrained political settings is crucial.

A reductionist and misinformed view of evidence as purely ‘technical’ or as being only about “what works” is unhelpful – it is also about generating evidence and understanding (and learning) on why interventions and approaches may work, including understanding the social, political, and economic factors that may enable or constrain success of different approaches. Far from the search for evidence pushing us in a ‘technical’, apolitical direction it has reinforced the importance of understanding and trying to tackle the underlying causes of poverty and conflict. There is agreement on the importance of politics and institutions in shaping growth, security and human development. However, the ability of external actors to influence institutions is much less clear and this is where DFID research is now focussed. Ros and Chris have misread the context – the commitment to evidence has opened up the space fundamentally to challenge conventional, technical approaches to aid.

Why it matters for international development

There are large areas of international development where decision-makers are largely flying blind – forced to make decisions purely on gut feeling and ideology, not because they wish to but because they have no option. Try making difficult decisions in education policy compared to health policy and the difference in usable evidence is dramatic – yet both are complex, social and context-dependent parts of human life. It is always puzzling when people say airily ‘health is easy’ – it is not, and is an intensely political and social subject requiring interventions at societal level.

Today we can eradicate rinderpest in cattle and build bridges over the Zambezi based on rock-solid evidence from many disciplines, but do not have anywhere near as clear an idea how to reduce violence against women or tackle police corruption. All are great challenges with social dimensions but in two of them people have set about finding and testing solutions in a systematic way over many decades.

Having robustly tested evidence-based solutions certainly does not eliminate politics: the decision whether to build a bridge, what sort and where, is an intensely political choice – but at least those making the choice now have a fair assumption it will stand up, based on hundreds of years of incremental evidence. The evidence-barren areas in development are a collective, and in our view shameful, failure by us all in the academic and practitioner community. We should never excuse them with the feeble assertion that it is too difficult or complicated. Development is difficult and complicated – but the bases for making decisions will gradually improve if we are serious about improving them.

In conclusion, we collectively have the capacity to be able to give to our successors in every continent a far better basis on which to make their decisions for their lives than our generation have. To imply it is not worth trying to provide the best and most rigorous evidence to those who need to make difficult decisions because they will have other influences as well is like saying to someone going for a walk in dangerous mountains that they do not need a map because there will be many other factors that will determine where they go. That is true – but they are still less likely to fall off the cliff if they have one.

Where evidence is clear-cut we should be making that plain to decision makers – and where it is not we should say that as well, be honest about what is there and try to get better evidence for the future. That, in essence, is what evidence-based decision making is about – and all it is about. If the academic community is serious about trying to assist those working in the field (including in Oxfam), and above all empowering the most vulnerable communities to make the most informed possible decisions available for their own development, we should be putting our greatest efforts into supporting decision-makers to use the best evidence, and finding better methodologies in areas where we currently have very weak evidence. There are many, and this should be tackled as a matter of urgency.

Tomorrow, Chris Roche and Rosalind Eyben respond

January 23rd, 2013 | 22 Comments

When we (rigorously) measure effectiveness, what do we find? Initial results from an Oxfam experiment.

Guest post from ace evaluator Dr Karl Hughes (right, in the field. Literally.)

Just over a year ago now, I wrote a blog featured on FP2P – Can we demonstrate effectiveness without bankrupting our NGO and/or becoming a randomista? – about Oxfam’s attempt to up its game in understanding and demonstrating its effectiveness.  Here, I outlined our ambitious plan of ‘randomly selecting and then evaluating, using relatively rigorous methods by NGO standards, 40-ish mature interventions in various thematic areas’.  We have dubbed these ‘effectiveness reviews’.  Given that most NGOs are currently grappling with how to credibly demonstrate their effectiveness, our ‘global experiment’ has grabbed the attention of some eminent bloggers (see William Savedoff’s post for a recent example).  Now I’m back with an update.

The first thing to say is that the effectiveness reviews are now up on the web. Here you will find introductory material, a summary of the results for 2011/12, and some glossy (and hopefully easy to read) two-page summaries of each effectiveness review, as well as the full reports. (You may not want to download and print off the full technical reports for the quantitative effectiveness reviews unless you know what a p-value is. With the statistically challenged in mind, we have kindly created summary reports for these reviews, complete with traffic lights….) Eventually, all the effectiveness reviews we carry out/commission will be available from this site, unless there are good reasons why they cannot be publicly shared, e.g. security issues.

Plug over, I can now give you the inside scoop.  In the first year (2011/12) we aimed to do 30 effectiveness reviews, and we managed to pull off 26. Not bad, but our experience in the first year made us realise that our post-first-year target of 40-ish reviews per year was perhaps a bit overly ambitious.  We have now scaled down our ambitions to 30-ish, to both avoid overburdening the organisation and enable better quality control.

The issue of quality control, in particular, is critical because there are certainly opportunities to strengthen the effectiveness reviews, particularly in terms of rigour.  Currently, there is considerable interest in how to evaluate the impact of interventions that don’t lend themselves to statistical approaches, such as those that are seeking to bring about policy change (aka “small n” interventions).  See a recent paper by Howard White and Daniel Phillips.  We have attempted to address this by developing an evaluation protocol based on a methodology called process tracing used by some case study researchers.  However, we are struggling to ensure consistent application of this protocol.  Time and budgetary constraints, as well as inaccessibility of certain data sources, are – no doubt – key militating factors.  Nevertheless, we aim to improve things this year by more tightly overseeing the researchers’ work, coupled with the provision of more detailed guidelines and templates so they better understand what is expected.

While in no way perfect, we have perhaps had more success with the reviews of our “large-n” interventions, i.e. those targeting large numbers of people.  This is, at least in part, because we are directly involved in setting up the data collection exercises, and we carry out the data analysis in-house.  The key to their success is capturing quality data on plausible comparison populations and key factors that influence programme participation, and this has worked out better in some cases than in others.  We are also attempting to measure things that just aren’t easy to measure, e.g. women’s empowerment and ‘resilience’.  We are modifying our approaches and seeking to collaborate with academia to get better at this.  Despite their shortfalls, at £10,000-ish a pop (excluding staff time), we believe these exercises deliver pretty good value for money.

Humanitarian programming is not my thing, but I am particularly pleased with the humanitarian effectiveness reviews that critically look at adherence to recognised quality standards.  While there are some methodological tweaks needed here and there, the cohort of reviews presents an impartial and critical assessment of Oxfam’s performance and identifies key areas that need to be strengthened, e.g. gender mainstreaming.

So what do the effectiveness reviews reveal about Oxfam’s effectiveness?  While the sample of projects is too small to draw any firm conclusions, the results for this particular cohort of projects are – as one might expect – mixed. For most projects, there is evidence of impact for some measures but none for others.

There are, no question, some clear success stories, such as a disaster risk reduction (DRR) project in Pakistan’s Punjab Province. Here, the intervention group reported receiving, on average, about 48 hours of advance warning of the devastating floods that hit Pakistan in the late summer of 2010, as compared with only 24 hours for the comparison group. Having had more time to prepare is one possible explanation why the intervention households reported losing significantly less livestock and other productive assets. Oxfam’s research team is in the process of commissioning some qualitative research to drill down on this project to better understand what made it work.

Given Oxfam’s size and capacity to mobilise and make noise, it is no surprise that there is reasonably reliable evidence that many of the campaign projects have brought about at least some positive and meaningful changes, despite falling short of fully realising their lofty aims. However, the results for several of the sampled livelihoods and adaptation and risk reduction projects are, quite frankly, disappointing. Figuring out why these particular projects have not worked is just as critical for learning as is figuring out why the Pakistan one did.

Whether their findings are positive or negative, I have to admit that I am impressed with how seriously the effectiveness reviews are being taken by senior management.  A management response system has been set up and embedded into the management line, where country teams formally commit themselves to taking action on the results.

That being said, the effectiveness reviews are in no way immune from internal controversy.  The random nature of project selection is perhaps the biggest sticking point.  While we do this to avoid ‘cherry picking’, inevitably some of the projects that are selected are small-scale and have little strategic relevance to the countries and regions.  Some are also concerned about how much time and resources the effectiveness reviews are sucking up.

We know that what we are attempting to pull off can be improved on a number of fronts, in terms of rigour, learning, and engagement and ownership of country teams.  And the good thing is that we are able to modify and improve things as we go along.  So any constructive criticism, advice, etc. is most welcome.

October 10th, 2012 | 19 Comments

What have the MDGs achieved? We don’t really know… Heretical thoughts from Matthew Lockwood

A second instalment in Matthew Lockwood’s series of valedictory boat-rocking blogs (his first was on fossil fuel subsidies) as he leaves the IDS Climate Change team for a new role in the UK energy sector. This time, he asks why the results agenda often stops short of being applied to the big picture stuff like the MDGs.

One of the interesting things about having come back to the international development field after some years away is the greatly increased emphasis on results, across all areas of activity, including not only projects and programmes, but also policy making, research, and advocacy.

Many people and organisations are interested in the results agenda, including the big foundations such as Gates, influential bloggers like Owen Barder, my boss Lawrence Haddad, and DFID’s Secretary of State, Andrew Mitchell. In his first big speech in office, in Washington in June 2010, Mitchell said “we’re also fundamentally redesigning our aid programmes so that they build in rigorous evaluation processes from day one.”

Like many others, I think aspects of the results agenda are important, reasonable and politically wise, although there are also some interesting critiques of the approach. But I also think that, if you really take it seriously, it throws up some challenges and dilemmas.

For me, this is clearest in the case of development’s big frameworks and policy directions. One prime example is the UN Millennium Development Goals (MDGs) and their proposed replacement with more development goals after 2015. As most readers will know, the MDGs are a set of human development goals, with subsidiary targets and indicators, formally adopted by the UN in 2000.

There is pretty broad agreement that progress towards meeting the MDGs is partial and uneven – some of the goals have been met or look very likely to be met, in some countries, while other goals (such as the target reduction in maternal mortality) may not be. Asia, especially East Asia, has done better than Sub-Saharan Africa.

However, applying the results agenda to the MDGs is not simply a matter of asking whether the goals will be met. Rather, it is about asking whether the goals have been met as the result of the MDGs having been adopted. The purpose of having high level goals, including any that come after the current MDGs, is to create political will, the mobilisation of resources, policy change and delivery, all of which should bring about a positive change relative to what would have happened in their absence.

Many in the aid world would say that, of course, the MDGs have had a major impact, and that it is absurd to even raise the question. However, a rigorous assessment of the evidence suggests that it is actually quite hard to make a strong case.

First, the evidence that the MDGs may have made a difference is, at best, mixed. The most comprehensive and rigorous independent assessment is by Andy Sumner and Charles Kenny for the Center for Global Development. They look for significant differences in outcomes and impacts before and after 2000, when the MDGs were adopted.

The clearest effects were on aid levels (which are not an ultimate impact but an intermediate outcome). Compared with the previous decade, official aid increased in the post-2000 period, but not as a proportion of rich country GDP. More aid went to the poorest countries, including to Africa. There was a small shift in the share of aid going to the social sectors, on which the MDGs tend to focus, and this happened soon after 2000.

There is plenty of evidence of the influence of the MDGs on policy discourse, if this is measured by mention of the goals or their presence in donor policy documents, PRSPs and developing country government goals. However, the effects on actual policy change are less clear. Sumner and Kenny find it “hard to detect a trend” in low income country government spending on health and education. They also find no trend in the quality of developing country policy making, as measured by the World Bank’s Country Policy and Institutional Assessment ratings.

On the actual impact indicators themselves – such as income poverty, malnutrition and mortality rates, educational enrolment etc – Sumner and Kenny’s most relevant assessment is whether progress was faster pre- or post-MDGs, and whether progress post-MDGs has been faster than what would have been expected based on past trends. Again, results are inconclusive. The data “suggest that in no case is there an obvious sign of a significant break towards faster progress since 2000. Nonetheless there has been somewhat faster global progress on income, primary completion rates, child and maternal mortality over the post-Declaration period”. A study by Fukuda-Parr and Greenstein of country level data gives a similarly mixed picture. The comparison with predicted rates of progress based on historical analysis implies slightly better than expected outcomes post-MDGs on primary education and gender equality in education, but worse on maternal mortality.
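For readers who like to see the arithmetic, here is a rough sketch of the kind of before/after comparison described above. It is not Sumner and Kenny's actual method, and the indicator values are invented purely for illustration: the idea is simply to compare the annual rate of change before and after 2000, and to check the post-2000 outcome against what the pre-2000 trend would have predicted.

```python
# Illustrative sketch only. The numbers below are hypothetical, NOT real MDG data.
# Think of the indicator as something like a mortality rate per 1,000.

def annual_rate_of_change(start_value, end_value, years):
    """Average annual proportional change between two observations."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value expected after `years` if the given annual rate had continued."""
    return start_value * (1 + rate) ** years


# Hypothetical observations for one indicator in one country.
observations = {1990: 100.0, 2000: 85.0, 2010: 68.0}

# Rate of progress in the decade before and the decade after 2000.
pre = annual_rate_of_change(observations[1990], observations[2000], 10)
post = annual_rate_of_change(observations[2000], observations[2010], 10)

# What 2010 would have looked like had the pre-2000 trend simply continued.
expected_2010 = project(observations[2000], pre, 10)

print(f"pre-2000 annual change:  {pre:.2%}")
print(f"post-2000 annual change: {post:.2%}")
print(f"expected 2010 on old trend: {expected_2010:.2f}, observed: {observations[2010]:.2f}")
```

Even where the observed value beats the projection, as it does in this made-up case, that only shows faster progress. It says nothing about what caused it.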

Second, there is the problem of attribution. As Sumner and Kenny put it, “even ignoring the very limited evidence of faster progress since 2000 in the average (unweighted) developing country, it is a considerable step from ‘more rapid progress’ to ‘the MDGs caused more rapid progress’”. In other words, bilateral aid may have increased somewhat, some indicators have improved, but how do we know that these changes are due to the MDGs, and not to some other factor?

It is not possible to know what would have happened in their absence. This is not a case of running randomised controlled trials across a number of interventions. And as Richard Manning points out, it is hard to separate out the potential effects of the MDGs from the environment that produced them.

In some areas, such as vaccination or primary education enrolment in sub-Saharan Africa, the links between the MDGs, the mobilisation and focusing of additional aid, and subsequent impacts seem convincingly close. But in others, the links seem less plausible, especially where there are also good alternative candidates that may explain changes in indicators better than the effect of MDGs. Poverty reduction in Asia, for example, is more likely to have been driven by the extraordinary period of sustained economic growth in China than by a set of UN targets. It is also plausible that China’s growth will have pulled along a number of countries in its wake, including commodity exporters in Africa. The rapid reduction of poverty in Brazil is due in part to the development of social safety nets such as the Bolsa Familia. When I recently asked Romulo Paes de Sousa, Brazil’s former Deputy Minister for Social Development, and closely involved in the design of the Bolsa, whether it was the result of the MDGs, he dismissed this immediately, saying it was the outcome of a domestic debate that emerged from the minimum wage.

Yet despite the lack of clear, strong evidence of the impact of the Goals, and the difficulties of attribution, the MDGs are routinely hailed as a success. Most importantly, this success is asserted in the context of discussion about a new set of post-2015 development goals. When it was announced that David Cameron would be co-chair of the UN High Level Panel on post-2015 goals, Andrew Mitchell hailed the “huge progress that has been made through the Millennium Development Goals” and “the successes of the current goals”.

When challenged with the point that attribution is often difficult in cases such as these, and that you can’t compare counterfactuals, many proponents of the results agenda recognise the problem. However, their argument is that, in such circumstances, it is the duty of those proposing any particular approach to be explicit about their “theory of change” – that is, be explicit about the full chain of causal linkages you think is going to run from your intervention (here adopting international goals) and the impacts you hope for. Identify your assumptions. Assess the evidence for and against those assumptions, and weigh up the risks.

If done properly, this wouldn’t be just about ticking a box. The point of such an analysis should be to help understand how to make such goals more effective. It should look at why some goals were easier to meet than others (gender equity in education as opposed to access to clean water or reductions in maternal mortality) and in some countries than in others. It should look in a systematic and rigorous way at how the goals were used (or not used) and, where there is evidence that they failed to lead to a result, explore alternative, potentially more effective “pathways to impact”.

The point here is not that the MDGs are somehow a bad thing, or that there should not be a new set of goals. In any case, it is not seriously in question that there will be further goals post-2015, of some form. Too much political capital has been invested in them for this to be the case, regardless of the ambiguity of the evidence base. The results revolution will not change the reality that some policies and initiatives are often inevitably driven by more than evidence, and that politics plays a major role.

Nor am I advocating a view that we should not try to measure impact or wrestle with the problem of attribution. What I am saying is that I think the example shows that really, really applying the agenda of results and evidence-based policy consistently and rigorously can be more difficult than the current discourse acknowledges.

Matthew Lockwood is a Research Fellow at the Institute of Development Studies at the University of Sussex. From October 2012 he starts work on a four year project on innovation and governance in the UK energy sector. 

August 31st, 2012 | 10 Comments

So where have we got to on Value for Money, Results etc?

Great posts, great comments. My head is now spinning as I try and disentangle some of the different threads that have emerged over the last two days.

First: horses for courses. Some aid work is akin to Ros’ bathroom problem – linear, measurable, and suitable for a logframe + results approach. Other areas are emergent and unpredictable and a results approach would struggle. Say you had a programme in Egypt right now, and were wondering what to spend your money on. You could reassure your donors and supporters by opting for a measurable bathroom problem, say building schools, but that would be to ignore the historic opportunities for change presented by the social and political upheaval in that country. But how could you support that with any likelihood of proving impact or attribution? Tricky, but a clear risk that the results agenda will drive you in the wrong direction. Could senior management, as Jonathan suggests, create a situation where some programmes are assessed on results and others on relationships? In the current climate, it’s easy to imagine that the latter category would end up being starved of funds.

Second: upwards v downwards accountability. Can a results agenda strengthen both - can countability improve accountability? (thanks for the soundbite, Sceptical Secondo). Claire, supported by Penny Lawrence and Alex Jacobs (with an excellent link to some practical examples), thinks it can.

Third: theory v practice. Claire’s right, I think, that in theory, a results agenda can be built on the perceptions of beneficiaries, improving quality and accountability. But Ros has spent a lot of time looking at what happens once all these ideas are implemented on the ground, and what looks good in the thinktank (and apparently in the NHS) may not survive the collision with reality, where staff are overstretched, working to tight deadlines and have little time for innovation or risk-taking. When I talked to Oxfam’s number crunchers about this exchange, they said they would love to take part, but were simply too busy generating the numbers needed to satisfy our donors!

Fourth: Trust. Ros rightly raises this, but trust between whom? The results agenda aims to build trust between northern publics and aid agencies, which is of course vital if aid spending is to continue to rise. And given that NGOs endlessly tell corporates and governments that we have moved from a ‘trust me to a show me agenda’, it would be pretty hypocritical to say the same shouldn’t apply to us. But what about the trust between aid workers on the ground and the partners they work with? Ros worries that that trust will be eroded by a crude results focus (and Alice Evans’ example from Zambia suggests she’s right), whereas Claire seems more concerned about using measurement to tap directly into the lived experience of poor people (hard, if that group is not as easily identifiable as NHS patients – back to my Egypt example).

Fifth: It all comes back to people, in particular the skills and motivations of the people who work for bilaterals, NGOs and all the other bits of the aid industry. If you have brilliant, motivated staff dripping with a sense of vocation, then they can probably make either a results agenda or a relationship-based approach work just fine. If you have demotivated nine-to-fivers who see this as just another job, then they will find a way to tick boxes and achieve little, whatever approach is adopted. I guess the interesting question is about those in the middle – what works best with normal staff doing the best they can, while coping with all the other pressures in their lives?

Which brings me to my final conclusion. I assume that the value for money people would never dream of asking us to take their ideas on trust. What are the results of the results-based agenda, compared to other approaches? What would be the best way to evaluate the evaluators? Is the balance of evidence different for say, work on women’s empowerment, governance, livelihoods or health and education? Lots of work for researchers over the next few years. [update: Ben Ramalingam came to much the same conclusion a few months ago on the Aid on the Edge of Chaos blog]

So thanks everyone, I’m now better informed, but still on the fence. As are the rest of you, judging by the pretty even split on the poll. It stays up til Monday, so not too late to vote…..

Update: in a similar vein, someone just put this up on twitter [h/t Henry Northover and Ian Thorpe]:

“Can I pay for Nancy Birdsall’s new book on Cash on Delivery aid after I’ve tried it out to see if it really works?”

March 17th, 2011 | 3 Comments

If not results, then what? The risks of not having a results agenda

The ODI’s Claire Melamed replies to yesterday’s guest post from Ros Eyben:

“Ros Eyben suggests that instead of a results agenda, we should rely on good relationships to deliver good aid.  And indeed, if all relationships were good, and all the people involved in making decisions about aid were thoroughly well-informed, open to new ideas, flexible in their approach, lacking in ego, adept at dealing with cultural and religious differences and aligned with the needs and priorities of poor people, that might just work. 

But just supposing, for a moment, that aid bureaucracies aren’t all like that, let’s think about the risks of not having a results agenda.

If you don’t define in advance what the objectives of an aid programme are, you leave it up to the managers who make the decisions and the politicians who guide them to impose their own values and prejudices onto the aid programme.  Of course if they could all be trusted to make the right decision, there’s no problem.  But evidence suggests that might be over-optimistic.  Exhibit A: attempts to fund the building of a dam in Pergau that had nothing to do with poverty and everything to do with arms sales.  Exhibit B: the ideological pursuit of structural adjustment programmes in the face of substantial evidence of the harm they were causing.

A focus on results can help to rebalance inequalities of power. When the Labour Government created the National Institute for Health and Clinical Excellence (NICE) in the UK in 1999, to ensure that evidence about value for money and effectiveness was used in deciding what drugs to prescribe in the National Health Service, pharmaceutical companies were among the most hostile to the idea. Naturally, from their point of view, they preferred their own marketing ‘evidence’ to help doctors make prescribing decisions. 

Actually, of course, leaving all decisions about prescribing up to doctors – informed by partial evidence – led to inequalities (the dreaded ‘postcode lottery’), and to millions of pounds wasted on ineffective treatments. NICE’s role in bringing together evidence from clinical trials (which included patients’ own assessments and valuations of changes in their health) with the costs of treatment, has started to improve value for money in the NHS and also to take more account of health benefits (or lack of them) from patients’ own point of view. 

A results agenda, as long as the right results are being pursued, can help to rebalance inequalities of power and make the actions and decisions of the powerful more transparent. It helps people to know what the objectives of decision makers are – and so to argue that they should be different, if that’s the case; and also to hold people to account for their success or failure to meet those objectives. Without measurement, there can be no accountability. 

The real question is what results we are looking for, and how to measure them. Of course if donors want to do the wrong things, and measure the wrong things, they won’t get good results. But pointing to examples of the wrong way of using results and saying, ‘so let’s not measure results’, seems to me as big a folly as the, sadly all too popular, pastime of pointing to the latest example of unsuccessful aid and saying ‘so let’s give up on aid altogether’. 

So if the number of polio vaccines isn’t the right result to ask for, then let’s look for something that is a better measure of the strength of health systems. And instead of counting the length of roads, let’s measure the strength of solidarity in communities – that’s doable.

The results agenda is actually a huge opportunity for people who care about relationships, trust, empowerment, rights and complexity to find ways of getting these things firmly integrated into how we measure development.  Then they’d be part of the mainstream.

These things can be counted. There are approaches developed in the UK’s National Health Service, for example, which allow patients to say how much they value different health outcomes, like the absence of pain or the ability to move about normally. Research shows that the values that ordinary people attach to different outcomes are different to those of even the most well-meaning professionals – which should be a warning to us all not to make assumptions about what people want. This information is turned into numbers and used to allocate funding and to measure results.  Imagine if we actually knew what poor people wanted and if they were getting it?  Everyone who works in development should surely admit that we don’t know as much as we should about whether we are actually delivering ‘value’ as the recipients of our efforts would define it.

We should be welcoming the focus on results, because a world where we don’t know the results of our actions is not one that any of us would want to live in. This agenda should also be used to encourage a focus on what results poor people themselves (or, more likely, poor women, poor men, poor people in cities, in rural areas and so on, who would all have different priorities) most want to see, and how they’d define ‘value’ or ‘effectiveness’.

Information is power. I say, don’t fear it. Use it. ”

Claire Melamed is the Head of the Growth and Equity Programme at the Overseas Development Institute

Update from Duncan: In a desperate attempt to stem the tide of consensus and mutual respect sweeping over the comments, I’ve put up a poll to the right of this post that allows only a yes/no answer to the question of whether the current focus on Value for Money is a Good Thing. ‘Sometimes’ ‘Maybe’ ‘It depends’ type answers all forbidden!

March 16th, 2011 | 20 Comments

‘Stuff happens’: the risks of a results agenda. Guest post from Rosalind Eyben

A few months ago, I blogged about the risks associated with the aid industry’s current overriding obsession with audit/value for money/results (pick your term). Since then, that debate has been swirling around both on this blog and (more importantly), in aid and development circles in many countries. So to help it along a bit I’ve asked two people who think about this a lot more than I do to set out some competing arguments. First up is Ros Eyben, who got a big and largely positive response to her recent challenge to the dumber/more extreme varieties of value-for-moneyism. Tomorrow the ODI’s Claire Melamed responds. Please join in the debate.

“The UK’s development ministry (DFID) has just completed a review of its bilateral aid programme. The Secretary of State for International Development, Andrew Mitchell, has ‘set out the results that UK aid will deliver for the world’s poorest people over the next four years’. DFID will be more ‘hard-headed about making every penny count’. Its press release highlights results such as 11 million more children at school and 50,000 fewer women dying from having babies. Digging into the review’s report, you will find numbers relating to DFID’s other aims, including wealth creation and tackling the root causes of conflict. Here, DFID is more modest: 50 million people with the means to help work their way out of poverty, rather than creating millions more jobs as some enthusiastic DFID country offices apparently offered to achieve. How can a government (let alone a foreign aid agency) deliver jobs? Likewise, DFID is not going to reduce the number of conflicts in the world but instead help citizens hold their governments more accountable.

When we look at the details, DFID’s plans seem pretty sensible. But the press release worries me. Explaining to the British public how UK aid delivers value for money – promising to educate more children than those we educate in the UK, but at 2.5% of the cost – must surely influence how DFID thinks and works. I am in charge of redecorating our bathroom while my partner is away. The paint is peeling and there is mildew on the ceiling above the shower. To demonstrate I got value for our money I will get two quotations for the redecoration. Many donor governments are treating the complex problems of poverty like my bathroom. They contract a Third Party Operator to deliver a result pre-determined by DFID. At the end of three or four years, there is an evaluation to check on results before paying the contractor.

Sometimes DFID’s bounded problem-approach to change (as typified by the logical framework) is going to work. But there are major concerns about the institutional and financial sustainability once the intervention ends, if these have not been addressed as an integral part of the design. By 2006 the global polio vaccination campaign had successfully eradicated polio from all but four countries, yet by 2008 it had reappeared in nineteen additional countries. In the drive for results, insufficient attention had been paid to the national health systems needed to keep polio at bay.

To be able to count exactly how each penny or Euro of aid money gets spent, donor governments are risking not making any difference at all. They can show how many kilometres of roads they have built or numbers of babies vaccinated as compared with before they started the projects. But such facts reveal little about how the change was achieved and what can be learnt for future policy and practice. End-of-project evaluations are no substitute for continuous learning and adaptation of approach. Donors are ignoring lessons long since learnt: without local people empowering themselves to change those less tangible factors that cannot be counted, once donor money stops the roads will crumble away and the next generation of babies will not be vaccinated. These inadequate measures of assessment – and the effect of such measures on the design of aid – risk donor governments wasting, instead of securing, ‘value for money’.

Eventual outcomes are often very different from what the logical framework required. Stuff happens. Power, history and culture shape the multiplicity of relationships and actors influencing any aid intervention. It makes more sense to design aid to recognize this. Experienced staff and consultants know it. But they are being forced to misrepresent reality in order to keep things simple for the taxpayer. They have to work with complex problems – such as why maternal mortality rates refuse to go down – as if they were bounded problems like my mildewed bathroom. In a largely unpredictable and dynamic environment, rather than choosing a single ‘best option’, more value for money might be achieved by financing two or more different approaches to solving a complex problem, facilitating variously-positioned actors to implement an intervention according to their different theories of change and diagnoses and consequent purposes.

Aid bureaucracies have never recognised that effective aid depends on people and the quality of their relationships with each other. Sheela Patel of SPARC, an Indian NGO that supports slum dwellers’ federations, has written that when SPARC was founded in 1984:

‘Donors gave money to us because there was a sense of trust. These funders did not set our priorities; communities of poor people did….. we were given all the space we needed. Consequently, SPARC and its partners now operate in nine states of India and help some 750,000 households….. I cannot imagine donors in today’s world granting an organization like SPARC the kind of latitude it required in its early years. Instead, [they] have become more focused on developing portfolios of projects, managing risks, and producing outcomes rather than on listening to communities, healing deep inequities, and supporting innovation’.

The origins of the results agenda lie in a mistrust that eats like a cancer into aid agencies’ capacity to make a difference. I am not convinced the emphasis on results will solve the problem of trust. On the contrary, it risks making things worse. The results rhetoric gets exaggerated by bureaucratic systems and by those middle level managers with little country level experience who are forcing grantees and development partners into straitjackets that constrain them from helping transform the lives of people in poverty.

We aid practitioners must start building trust. Steps in the right direction include paying attention to the inequitable power relations, including our own behaviour, which keep people in poverty; being modest about what any purposeful intervention can achieve; and communicating simply with taxpayers about complex realities.

Rosalind Eyben is a Fellow at the Institute of Development Studies and former Chief Social Development Adviser at DFID.

March 15th, 2011 | 8 Comments
