The WSJ On The Lancet Study
Hat tip to our good friend too many steves for alerting me to Steven Moore’s Op-Ed in the Wall Street Journal on the Lancet study:
With so few cluster points, it is highly unlikely the Johns Hopkins survey is representative of the population in Iraq. However, there is a definitive method of establishing if it is. Recording the gender, age, education and other demographic characteristics of the respondents allows a researcher to compare his survey results to a known demographic instrument, such as a census.
Dr. Roberts said that his team’s surveyors did not ask demographic questions. I was so surprised to hear this that I emailed him later in the day to ask a second time if his team asked demographic questions and compared the results to the 1997 Iraqi census. Dr. Roberts replied that he had not even looked at the Iraqi census.
And so, while the gender and the age of the deceased were recorded in the 2006 Johns Hopkins study, nobody, according to Dr. Roberts, recorded demographic information for the living survey respondents. This would be the first survey I have looked at in my 15 years of looking that did not ask demographic questions of its respondents. But don’t take my word for it–try using Google to find a survey that does not ask demographic questions.
I knew this study was bunk when I first saw the headline: it was beyond belief that such a level of carnage could go unnoted. I didn’t have the technical expertise to actually do any ‘debunking’, however, so I could only make appeals to common sense. Fortunately, those who do know about these matters are now coming forth and showing the survey to be what it clearly was all along: a severely flawed piece of work whose release was timed to influence the midterm elections…

It’s like Ebonics for white wingnuts. Innumerate and proud of it.
Moore further says:
…[T]he key to the validity of cluster sampling is to use enough cluster points. In their 2006 report, “Mortality after the 2003 invasion of Iraq: a cross-sectional sample survey,” the Johns Hopkins team says it used 47 cluster points for their sample of 1,849 interviews. This is astonishing: I wouldn’t survey a junior high school, no less an entire country, using only 47 cluster points. Neither would anyone else. For its 2004 survey of Iraq, the United Nations Development Program (UNDP) used 2,200 cluster points of 10 interviews each for a total sample of 21,688. True, interviews are expensive and not everyone has the U.N.’s bank account. However, even for a similarly sized sample, that is an extraordinarily small number of cluster points. A 2005 survey conducted by ABC News, Time magazine, the BBC, NHK and Der Spiegel used 135 cluster points with a sample size of 1,711–almost three times that of the Johns Hopkins team for 93% of the sample size.
Want to tell us where he’s wrong?
This reminds me of the wild claims of disenfranchised black voters in Florida after the 2000 election. The numbers claimed were in the tens of thousands, yet no single verifiable disenfranchised voter was ever produced. When my liberal friends would start ranting about this “travesty,” I would always say, “Did you see the interview with some of those voters on Chris Matthews the other day?” They would, of course, say, “No.” I would reply, “I know. It was a trick question. Not a single disenfranchised voter has been identified. Don’t you think if there were truly tens of thousands of such voters, the media would be presenting and interviewing some of them incessantly?” Their replies were always amusing.
Note to self: don’t let the facts get in the way of a good story, when confronted with such facts don’t disavow or argue, simply change the subject.
655,000 excess deaths started as not credible, has evolved to laughable, and soon will be unspoken.
What we do know is they published their “report” about 3 – 4 weeks early to have any impact on their real target: the midterm elections.
Katrina also saw such bilious fiction that even a cursory knowledge of the demographic facts showed to be, not erroneous, but simply lies. The first commenter is a gem. I don’t know where the retrograde, reactionary Left developed this juvenile hostility to racial minorities but it comes out so consistently and tediously it truly is a marvel. The ignorance, arrogance and stupidity demonstrated so consistently by this crowd is something to behold. Is this the mindset at the Lancet that allowed, no, demanded that this propaganda masquerading as a scientific study be forwarded under that centuries old and revered masthead? Yes. And as always, the Left subverts and cannibalizes in moments institutions built over lifetimes by better men. Better by far.
surely no one could be stupid enough to continue to defend the report…
except AV.
Usually people resort to links that support them, andy went straight to baby talk. Keeping it easily understood for his other progressives.
What, now you want substance? After a week of wallowing in ignorance? Can’t I be grandfathered in, relying on my “gut feeling” that the article is a “crock” because Stephen Moore is a known mathematical snake-oil salesman?
How about I cite an expert to disprove Moore:
Ooops. The expert who wrote that is… Stephen Moore.
Ok, snark off. Simply stated, the sample size and number of clusters must be large enough to be statistically valid, but are not dependent on the size of the population or geographic area being studied. Once the threshold of statistical validity is met, the selection of sample size and clusters is determined by the degree of precision you want the study to have.
In other words, the sample size in the Lancet study is large enough for the result to be accurate, in that the chance of the true figure being less or more than the low and high bounds is small – 2.5% on either end, or 5% total (as a rule of thumb, that margin of error can be determined by dividing 1 by the square root of the number of people in the sample, not the total population).
But the study is obviously not precise, as evidenced by the wide confidence interval. The authors determined 1) that the risk to the survey team was not worth narrowing that interval and 2) that the question being answered – are more people dying post-invasion – did not depend on a specific point estimate (unlike the UN study, for example, where the numbers are used to determine distribution of resources to areas of high health risk).
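For anyone who wants to check the rule of thumb in that comment, here is a minimal Python sketch. It assumes a simple random sample and a yes/no proportion, which is the textbook setting for the 1/sqrt(n) shortcut; the Lancet figure is a death rate from a cluster sample, so this illustrates the rule only and does not reproduce the study's interval. The function names are made up for the example.

```python
import math

def rule_of_thumb_margin(n: int) -> float:
    """Rough 95% margin of error for a proportion: 1 / sqrt(n)."""
    return 1.0 / math.sqrt(n)

def normal_approx_margin(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Normal-approximation margin for a proportion p under simple random sampling."""
    return z * math.sqrt(p * (1.0 - p) / n)

# 1,849 is the interview count quoted from Moore's op-ed.
n = 1849
print(f"1/sqrt(n):            {rule_of_thumb_margin(n):.4f}")
print(f"1.96*sqrt(p(1-p)/n):  {normal_approx_margin(n):.4f}")
```

Note that the population of Iraq appears nowhere in either formula; only the sample size n does.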
I horked the em tag; didn’t mean to italicize that whole sentence.
I’m sure there are many other reputable studies that rely on the same methodology. Name one.
Using the same methodology as the lancet I can disprove global warming, and show how the bush economic policy is leading to trillions in surplus.
I mean if tax revenue is up 11% over a year’s time (a longer span than the Lancet’s), and continues at 11% for the next five years.
September was below average for the 20th century.
and exit polls showed kerry winning 04 in a landslide…
You are pointing to one, of an infinite number of detractors. Find me someone with a rep who is backing it.
Andy, fine, you obviously know something about statistics, and I have been the first to admit that I know very little (though I took a class in college as a freshman in 1986-87!). I’m curious – and not snarking here – about whether you read Iraqi Body Count’s response to the Lancet study…I’m also interested in whether you do, in fact, believe in the Lancet’s numbers, despite their absurdly high estimates. In other words, be an ignoramus with me for a minute and tell me – does it pass the smell test?
What would you accept as “reputable,” MTL? The “30 by 30” sample design (30 clusters with 30 samples each, the same used in the 2004 Lancet study) is the standard used by organizations such as the World Health Organization and National Institutes of Health in epidemiological studies in dangerous or developing nations.
It’s also used in everything from tax collection and marketing to grassroots activism. See for yourself.
It’s not a bad thing to be skeptical, Mark. And the first – or second…I forget – rule of statistical analysis is to be wary of outliers.
But in this case the math and methodology are solid. The only significant problem I’m aware of is the total population estimate for Iraq used in the study. Again, this doesn’t affect the validity of the sample or cluster sizes, but it would affect the “excess death” calculation (after minus before the war). Using a competing number, some have noted the estimate range would shift by about 75,000 fewer casualties.
The IBC response was well-intentioned but…ridiculous. The IBC and Lancet data sets are apples and pepino melons. Media reports have always undercounted large-scale war deaths, on average by half but in some cases (Guatemala, for example) by factors of anywhere from 5 to 20 – and the worse the violence gets, the more is missed because reporting becomes exponentially more difficult. Critics have been citing the improbability of missing high daily death totals, but the flip side of that is it is impossible to report a large number of incidents involving a small number of victims.
Anyway, back to the IBC response. The obvious rejoinder to the notion that the Lancet study implies “incompetence and/or fraud” on the part of health officials (other than the fact that any bureaucracy worth its salt would rationally underreport to protect its legitimacy) is that the third and likeliest possibility is incapacity. That is, the system for reporting deaths to a central authority is 1) extremely difficult to operate and 2) not a high priority in a dangerous and chaotic environment. As others have noted, the numbers in the US and UK aren’t that accurate in the best of times.
Which leaves us with the core objection: “someone” would have noticed this number of casualties. My question (or answer) is, given the above, how would they? The scale of these things truly defies common sense. Certainly, a lot of someones must have “noticed” the decimation of their neighborhoods. Many may have compared notes with relatives across town. But there is simply no way of judging the true extent until someone conducts a systematic count… or makes a statistically educated guess.
So, yeah, my natural, “gut” reaction is to disbelieve large numbers – which are, after all, abstractions by definition. This is true in many areas of life, like the fact that I could theoretically become a millionaire through relatively small contributions to my 401K account. But in these matters my gut is wrong almost without fail.
Lancing The Lancet
In today's Opinion Journal, Steven Moore takes a hard look at the much discussed Lancet study that claimed astronomical numbers of civilian casualties in Iraq. As Moore says, it is not that anyone is defending any civilian deaths, just th…
Andy, fair enough…I’m not convinced that sound methodology trumps common sense every time, but hey, I don’t want to keep beating the same dead horse – I save that for the main page! Anyway, I appreciate your take and your explanation…
What struck me in reading the description was the method of finding the 40 households in each of the 50 clusters; it seemed to be a decidedly non-random process. The first household was found by what appears to have been random selection, but after that 39 households are chosen, one after another, because they are adjoining to the previously accepted household. How do the researchers know that those in household n were not placed there because it was expectable that that household would be interviewed?
In some ways, it feels to me like they’ve used 50 (well, 47) random points, not 2000 (1847).
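That intuition has a standard name, the design effect. Here is a rough Python sketch using the Kish approximation, deff = 1 + (m - 1) * rho, where m is the average cluster size and rho is the intra-cluster correlation. The cluster and interview counts are the ones quoted from Moore's op-ed; the value of rho is purely hypothetical and is not taken from the Lancet paper, so the printed numbers only show the direction of the effect.

```python
def design_effect(cluster_size: float, rho: float) -> float:
    """Kish approximation: deff = 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1.0) * rho

def effective_sample_size(n: int, clusters: int, rho: float) -> float:
    """Interviews divided by the design effect for equal-sized clusters."""
    avg_cluster_size = n / clusters
    return n / design_effect(avg_cluster_size, rho)

# Hypothetical intra-cluster correlation, chosen only for illustration.
RHO = 0.05

# (interviews, cluster points) as quoted in the op-ed.
surveys = {
    "Johns Hopkins 2006": (1849, 47),
    "ABC/Time/BBC/NHK/Spiegel 2005": (1711, 135),
    "UNDP 2004": (21688, 2200),
}

for name, (n, k) in surveys.items():
    n_eff = effective_sample_size(n, k, RHO)
    print(f"{name}: n = {n}, clusters = {k}, effective n ~ {n_eff:.0f}")
```

The two extremes frame the disagreement: if neighbors were perfectly alike (rho = 1), 1,849 interviews would be worth exactly 47 independent points; if neighbors were no more alike than strangers (rho = 0), they would be worth the full 1,849. Where the real value falls is an empirical question, and it is reflected in the width of the reported confidence interval.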
I think it would be interesting to see how the projected deaths from the first, first three, and first ten households at each cluster point compare to the total at that point.
My work in statistics was a required two quarters, a quarter of a century ago, and I have not used it much since. It does seem to be an extraordinary claim.
Those are good questions, htom, but they’re addressed in the study. There will be a test, you know.
I gotta run, but here are quick answers:
1) It wouldn’t be a “cluster sample” if the households weren’t clustered. The important thing is that each household interviewed be adjacent to the next (or, if no one is home, the house adjacent to that one). Otherwise, the surveyors introduce bias into the sample by choosing the households. (There’s a longer, boring answer to all this I’ll try to get to later)
2) there had to be proof that the subject had lived in the household for 3 months prior to the interview
3) there were 50 clusters; three were compromised because of a miscommunication in the field and thrown out (this widened the confidence interval slightly, but didn’t destroy the necessary representative sample)
4) It would indeed be interesting to see the raw data; they released it for the first study and most likely will for this one. We’ll see.
Just to add to Andy V’s comments, see John Quiggin’s discussion of casualties due to air strikes. In brief, 13% of reported deaths were ascribed to air strikes. As John argues, some of those were probably misidentified (and were actually due to mortar fire or whatever). Say the actual number is 10%. A year ago, the US was flying 120 air strikes/month. It’s probably higher now. At that rate, it’s easy to estimate 15,000 people/year killed by air strikes alone. Multiply that by 10, and you get 150,000 deaths/year, without even breaking a sweat.
[Just to be clear, the rate of air strikes has not been constant during the 3 1/2 years of war. The US flew thousands of sorties during the initial phase of the war. The rate dropped off sharply after that, but has been steadily rising for the past year and a bit.]
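The back-of-envelope arithmetic above can be laid out as a short Python sketch. The 120 strikes/month, the 15,000 deaths/year and the 10% share are the figures from the comment; the deaths-per-strike number is not stated there and is only what those figures imply when divided out, so treat it as a derived assumption rather than a sourced statistic.

```python
def implied_deaths_per_strike(strikes_per_month: float, deaths_per_year: float) -> float:
    """Deaths per strike implied by a yearly death estimate and a monthly strike rate."""
    return deaths_per_year / (strikes_per_month * 12)

def total_deaths_per_year(deaths_from_cause: float, share_of_all_deaths: float) -> float:
    """Scale deaths from one cause up to all causes, given that cause's share."""
    return deaths_from_cause / share_of_all_deaths

strikes_per_month = 120       # from the comment
air_strike_deaths = 15_000    # the comment's yearly estimate
air_strike_share = 0.10       # the comment's rounded-down share

per_strike = implied_deaths_per_strike(strikes_per_month, air_strike_deaths)
total = total_deaths_per_year(air_strike_deaths, air_strike_share)

print(f"implied deaths per strike: {per_strike:.1f}")
print(f"implied total deaths/year: {total:,.0f}")
```

The estimate is only as good as the per-strike figure and the share, both of which are soft, which is why it is offered as a plausibility check and not as a count.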
There could very well be something systematically wrong with the Lancet study. But it can’t simply be dismissed because the numbers are “wildly implausible.” They’re high, but not wildly out of the realm of plausibility.
The “critics” of the study will have to work a little harder.
None of which answers the question of how so many recorded deaths (80%+ produced death certificates) have gone unnoticed until this study. That, to me, is the implausible part. This war is widely criticized, lots of people want to discredit it as a means to stop it, why have they not noticed such extraordinary numbers of deaths? I understand the fog of war argument, but given the magnitude of death reported in the study that argument loses credibility.
It depends a lot on what you mean by “recorded”.
Death certificates are issued by doctors, not by some government office. They are supposed to be filed with the government but, in Iraq, the system for doing so has completely broken down.
If you ask Iraqi officials how many people have died (and they have been asked), they will tell you that they have no clue as to the number.
Since the Iraqi government is not collecting statistics, the only thing you have to rely on are press reports. If I asked Mark to estimate (based on press reports) how many people died in traffic accidents in Austin in the past 3 years, I think he would have a hard time coming up with an accurate number. And that’s despite the fact that conditions for reporters are infinitely better in Austin than in Ramadi.
Jacques, I actually don’t put too much credence in breakdowns of causes of death beyond something like “most people were killed by gunshot, with explosions being the second largest factor” where this study is concerned. There’s the misidentification issue, but an even bigger challenge is trying to parse ever smaller subsets of the original sample, which really are statistically insignificant. These issues beg for further examination.
too many steves, it’s an important concern, but I think you’re reading from the wrong end of the equation. The study raises big questions about the state of the Iraqi health system, not the other way around.
But more to your point, the fog of war isn’t the primary reason people haven’t noticed the magnitude of death (again, people have obviously noticed it in their own families and neighborhoods). It’s because it is not possible to accurately measure the magnitude on a national scale without active sampling and statistical analysis. If the system for centrally collecting and recording death certificates was working reasonably well, the numbers would be closer to the Lancet range, but they still wouldn’t be accurate. That’s just the nature of the beast with matters of this scale. The numbers you and I take for granted about death in the United States were generated by mortality studies, not by simply counting death certificates.
The real question to my mind is: Why aren’t mortality surveys being done in Iraq as a matter of course? “This American Life” did an amazing show on this question last year.
BTW, I’m not claiming it’s impossible that the Lancet study is fundamentally flawed. But the methodology is rock solid and is there for anyone to see. As others have noted, the only way the estimate range could be wildly off is systematic fraud on the part of everyone involved. And there’s zero evidence for that. In fact, the authors have been practically begging governments, media and NGOs to conduct their own studies for comparison.
No, that’s not true; the other ways that the study could be flawed are (a) a non-representative sample, and (b) a wrong estimate for Iraq’s population. I don’t have a link handy, but I’ve seen it discussed that given how poorly Iraq’s deaths are recorded, as you guys admit, how much weight should be put on its census capabilities? There is no doubt that many, many people with the means and the ability have fled. How many? I don’t know…but it’s pertinent to the question at hand, isn’t it?…
You’re absolutely right about the statistical significance issue. The percentage killed by air strikes could be 5%; it could be 20%. We’d be hard-pressed to tell from the data.
My only point was that, in the case of air strikes, we have an independent way to guestimate the number killed. And that number is substantial. If it’s true that air strikes are responsible for a relatively small percentage of total deaths (even if we don’t know precisely how small) then we get a guestimate of the total number of deaths that is, at least, in the right ballpark of the Lancet figure.
This isn’t an argument about the validity of the Lancet study; it’s a response to those who would say that the Lancet number is absurd on its face. The Lancet number could be right, it could be wrong, but it’s not absurd on its face.
Well, I still say if your data flies in the face of all known reality, then your data is wrong…but you know where I stand. I’ll be over here in the corner with the other innumerate rubes…
Mark, a) can be deduced from the report (unless the authors falsified data and/or lied about their methodology); that’s what peer review is all about – both in the pre-publication sense and the hashing the study is now undergoing. I don’t see any fundamental anomalies (not that anybody gives a damn what I think), and I’m not aware of any actual peers reporting any. Andrew Gelman, who literally wrote the textbook on these matters, has some minor quibbles but doesn’t challenge the sample.
I addressed b) briefly above. I’ll provide a big, complicated answer about just how accurate the census figures are if you want, but it probably wouldn’t be of much use to the discussion here. For now I’ll limit it to this: A census, unlike the IBC tally and counting death certificates, is a form of active sampling. It is systematic, seeking out people to count, and includes statistical methodologies to account for those it misses. It can also be compared to other regular statistical estimates by the UN, NGOs and other governments to judge accuracy. The IBC and “CDC” counts aren’t wildly off base just because there’s a war on (or, as the IBC people are claiming, implied fraud and incompetence). It’s because those are inherently poor methods of counting. What’s more, we can never really know how much inaccuracy should be attributed to the war, fraud, inertia or whatever, because of the passive, unsystematic nature of the count.
Anyway, from what I’ve read there are indeed some doubts about Iraqi census figures that could impact the Lancet range (by about 75,000, as noted above). This was and is part of the peer review process. From what I’ve seen, the number used in Lancet was statistically reasonable and defensible.
Let me also say that I don’t believe that accepting the Lancet range as accurate, if not precise, is necessarily “anti-war.” I think one could just as easily argue that the US effort should be redoubled to stop the carnage and prevent it from sliding into complete chaos. I don’t happen to agree that that is possible, but that’s a different argument. In any case, the report is a wake-up call, and must be grappled with.
I’ll be over here in the corner with the other innumerate rubes…
Alright, I apologize for my ad hominem rudeness. This is an emotional issue for me, as I’m sure it is for you. And snark is my nature. But I shouldn’t have gone down that path.
What really grates me is not innumeracy or the criticism of the report as much as blithely dismissing it with little more than a wave of the “common sense” hand. And if one doesn’t believe they have the expertise to evaluate the study’s claims, wouldn’t that also warn against accepting Moore’s response at face value? I’m not trying to claim the Lancet report or any other like it is sacrosanct; but, goddammit, it should be engaged on the merits.
One doesn’t have to be an “expert” in probability theory to see Moore’s core argument as bunkum (I’m not a professional, by the way, if that wasn’t already apparent). It goes against the most fundamental concept in statistics, if not math in general, and could be answered with a five-second Google search. It took Andrew Gelman in the comments to the post I cited one sentence to blow Moore’s entire editorial out of the water: “The total population of the country is essentially irrelevant for the sample size needed. This is a basic statistical principle (it’s the sample size (n) that matters, not the population size (N)).”
That such 2+2=5 nonsense is taken seriously for even a second – what Frankfurtians refer to as “bullsh**” – is what gets us “reality-based” nutrooters’ panties in a bundle.
Andy, nothing to apologize for…I was actually saying that with tongue-in-cheek…it was an amused comment, not a hurt one…
Hey, we’re all on the same side here, ultimately…if 650,000 have died or 150,000, it’s much, much worse than I hoped at the end of major combat, and though I think most of those deaths (at least post-2004) have been at the hands of terrorists/’insurgents’/fanatics, we should have never allowed the security situation to spiral so out of control…
As far as the numeracy goes…believe it or not, I have a passion for math…I love reading ‘general reader’ books that focus on the subject (right now, in fact, I’m reading Newman’s 4-volume World of Mathematics and another book from Oxford University Press called Mathematics: The Loss of Certainty – I tend to read about eight books at a time, because I have so many that I can’t stay focused on one).
My problem? Although fairly accomplished in high school, I didn’t pursue it in college, only took the easiest math I could take to get my credit hours (my degree is in Economics, not Econometrics), and as a result have only an easily-confused layman’s apprehension of many of the topics…
I know, the idea that the same sample size could be used to measure characteristics of populations of 10,000 and 10 million is extremely counterintuitive, but it is in fact really, really simple. The nutrooters are floating around the quite silly analogy of two pots of soup, one much larger than the other. You only need one taste to judge the taste of either pot, as long as it is well mixed. Which of course begs questions about mixing and dirty spoons…
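To put numbers on the soup analogy, here is a small Python sketch comparing the margin of error for one sample size drawn from a population of 10,000 and from one of 10 million. It assumes simple random sampling of a proportion and applies the standard finite population correction, sqrt((N - n) / (N - 1)); the sample size of 1,000 is hypothetical and chosen only to make the comparison easy to read.

```python
import math

def margin_of_error(n: int, population: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin for a proportion, with the finite population correction."""
    standard_error = math.sqrt(p * (1.0 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

n = 1000  # hypothetical sample size
for population in (10_000, 10_000_000):
    moe = margin_of_error(n, population)
    print(f"N = {population:>10,}: margin of error ~ {moe:.4f}")
```

The population grows a thousandfold while the margin moves by well under a fifth of a percentage point, and in the wrong direction for Moore's argument: the bigger pot is, if anything, very slightly harder to judge, not a thousand times harder.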
Daniel Davies, that fat Welsh stockbroker who predicted this mess, offers another, more pointed analogy and uses math to turn “common sense” on its head:
And another:
Mark, I’m not sure if you are aware of the debate going on over at Blogcritics. Comment 80 is very interesting. My boss, a stats major, has had a quick look at the report for me and says take it with a grain of salt. He makes some of the same points Ernest makes.