Know Your Source: Estimating Health Trends by Connecting Data Dots
As many of you know, I was in London last week to launch the Global Burden of Disease Study 2010. GBD 2010 provides global benchmarks for more than 300 diseases, injuries and risk factors from 1990 to 2010. It was the result of hundreds of researchers worldwide gathering data from vital statistics, censuses, surveys and other sources to create the most comprehensive analysis to date on global levels and trends in health.
Using advanced analytical methods, they corrected for misclassified deaths and conditions and filled in gaps in the data. A lot of people assume that in high-income countries a large study of health trends like GBD 2010 must rely solely on “hard data,” while in low-income countries the numbers must be mostly estimates. In fact, researchers are making estimates everywhere because, for the reasons I have explained in my previous posts, data from vital statistics, censuses and surveys all have their limitations. Estimates have limitations, too, of course, and I will talk about those later.
First, to understand what goes into a good estimate, here’s a window on what went into creating GBD 2010. There were 486 authors on the papers that were published in The Lancet last week. They came from 302 institutions in 50 different countries. Starting in 2007, those researchers set out to completely rethink the GBD process that was created in the early 1990s. They gathered more data than ever before. They used vital registration systems, surveys, censuses, and verbal autopsies. (Verbal autopsies deserve their own post, but you can read a little about them in this great piece by Tom Paulson.) They also performed a meta-analysis of available randomized controlled trials.
They created a database that covered the big health problems that people tend to think about (like ischemic heart disease, AIDS, and diabetes) as well as the smaller issues, like obscure diseases that are hard to pronounce or once-fearsome diseases like polio that are now on the verge of extinction. The researchers created a set of criteria for which data should be included in the final analysis and which should not. For example, if a study was not rigorously conducted or was too specific to one place and one time to be broadly applicable, it was excluded. Then the team developed new analytical tools to fill in gaps in the data for countries where health data are sparse. They tested those methods by using them to make estimates in places where health data are more readily available, such as the United States or Japan, and checking the results against the real data. This is called “out-of-sample predictive validity,” which I think should be the name of the next They Might Be Giants album.
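For readers who like to see the mechanics, here is a toy sketch of the out-of-sample idea in Python. It is not the GBD model: the "gap-filling" method is just a straight-line trend fit, and the death rates are invented numbers for a hypothetical data-rich country. The sketch hides one year of known data, predicts it from the remaining years, and measures how far off the prediction was.

```python
import math

def fit_linear_trend(years, rates):
    """Ordinary least-squares straight line through (year, rate) points."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def out_of_sample_rmse(years, rates, held_out):
    """Hide the years in `held_out`, fit the trend on the rest, then score
    the predictions for the hidden years against their true values
    using root-mean-square error (0 would mean a perfect prediction)."""
    train = [(x, y) for x, y in zip(years, rates) if x not in held_out]
    test = [(x, y) for x, y in zip(years, rates) if x in held_out]
    slope, intercept = fit_linear_trend([x for x, _ in train], [y for _, y in train])
    errors = [(slope * x + intercept) - y for x, y in test]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical death rates per 100,000 (invented, not GBD figures).
years = [1990, 1995, 2000, 2005, 2010]
rates = [200.0, 188.0, 181.0, 169.0, 160.0]

# Pretend 2000 is missing, predict it, and compare with the real value.
print(out_of_sample_rmse(years, rates, held_out={2000}))
```

A real validation exercise would hold out many data points across many countries and compare competing methods; the method with the smallest error on the hidden data earns more trust where data are genuinely missing.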
So what about limitations?
Estimates often are forced to bridge some pretty wide gaps. One of the chief architects of GBD 2010 is Abraham Flaxman, a computer scientist turned global health researcher. MIT’s Technology Review named him one of its 35 top innovators under 35 this year, and at the award ceremony he said, “We know that AIDS, TB, and malaria are big problems, for example, but we can’t tell you with precision how many malaria deaths are in adults. The range of credible answers stretches from 9 percent to 85 percent.”
That’s why one of the important things to look for when examining an estimate is the range of uncertainty. GBD 2010 shows uncertainty bounds for every estimate, a first for a study of this scope. For example, researchers know more about levels and trends in ischemic heart disease than they do about measles, so the range around the heart disease estimates is narrower. To see what some of this uncertainty looks like, take a look at this data visualization that shows some of the leading diseases worldwide and the range of possible estimates for their health impact.
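One way to picture an uncertainty range is to simulate it. In the rough Python sketch below (not GBD’s code, and with invented numbers), each of two hypothetical causes of death gets 1,000 simulated “draws” of its death count. The range is read off as the 2.5th and 97.5th percentiles of those draws, which is one common way to report a 95 percent interval. The poorly measured cause ends up with a much wider range relative to its size.

```python
import random

def percentile(sorted_values, q):
    """Linear-interpolation percentile (q from 0 to 100) on pre-sorted data."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac

def uncertainty_interval(draws, level=95.0):
    """Return (lower, mean, upper) bounding the central `level` percent of draws."""
    s = sorted(draws)
    tail = (100 - level) / 2
    return percentile(s, tail), sum(s) / len(s), percentile(s, 100 - tail)

rng = random.Random(2010)  # fixed seed so the sketch gives the same output each run

# Hypothetical causes with invented means and spreads (deaths, in thousands).
# Cause A is well measured (small spread); cause B is poorly measured (large spread).
cause_a = [rng.gauss(500, 10) for _ in range(1000)]
# Death counts can't be negative, so clip any negative draws to zero.
cause_b = [max(0.0, rng.gauss(100, 40)) for _ in range(1000)]

for name, draws in [("Cause A (well measured)", cause_a),
                    ("Cause B (poorly measured)", cause_b)]:
    low, mean, high = uncertainty_interval(draws)
    print(f"{name}: {mean:.0f} (95% range {low:.0f} to {high:.0f})")
```

When you see a single headline number, it is worth asking where it sits inside a range like this and how wide that range is.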
Let me know if you have questions or suggestions for this series. The more we understand about the data we use for health stories, the better positioned we will be to ask the right questions.
You can reach me via email at email@example.com or on Twitter @wheisel.