The Power of Small Data: Lessons learned from a number-crunching career

Published on December 4, 2015

I experienced a crushing failure as an investigative reporter that I hope to never repeat and that I hope none of you ever have to experience.

I was at the Los Angeles Times, and I was trying to investigate the family court system. That broadly stated goal was my first mistake. And I came to learn much later – too much later – that I had made many other mistakes along the way.

I was fortunate enough to be invited to deliver the keynote address at the 2015 California Health Data Fellowship at the University of Southern California’s Annenberg School for Journalism this week. I shared some of the lessons I’ve learned as I moved from a full-time health reporter into the health data research sphere, and I will write about those over a series of posts. Some of those lessons grew directly out of that big failure I mentioned.

When I was in the middle of my ultimately unsuccessful family court investigation, I thought I had a winner. I mean that in two ways. I thought that I had an investigation on my hands that would lead to big changes in the way family court cases are managed, resources allocated, and justice delivered. I thought, too, that I might finally move up from “two-time Pulitzer finalist” and actually join the ranks of some of my colleagues as a Pulitzer winner. (For those of you who say you never think about winning prizes, I commend you on your humility.)

I had done some things right. I had identified some key sources both inside the court system and in advocacy groups. I had spent time with families whose lives were devastated by overly protracted court cases, massive court-related expenses, and seemingly arbitrary decisions by overwhelmed judges, attorneys, and clerks.

But I really wanted data. I thought that what I wanted was big data. People were beginning to talk about big data at the time (and the conversation has only become louder since).

All my previous investigations had been data-driven in some way. Everything that I was most proud of went back to a spreadsheet or a database with key information: names, dates, costs, and the like.

With the family court project, I thought at the time I was doing the right thing by attempting to talk my way into getting access to a dataset that would open the doors to a byzantine structure that – even after months of research – I had only begun to understand. I wanted comprehensive data that would tell me when court cases started, when they finished, and all the steps in between. I wanted an unimpeachable dataset that would tell me whether the stories that I was hearing outside the courtrooms – from parents who had lost their children, from abuse victims who had been forced to return to their abusers, from families who had been financially ruined as a result of court costs – were rooted in a generalizable reality. I wanted a dataset that would convince my editors, too, that this was a story worthy of continued investment and newsroom resources.

Months and months later, I still had no master dataset and had only a slightly more sophisticated understanding of the system and what I was ultimately going to write about it. In short, the project never got off the ground, and a lot of time and effort was wasted. Initially, I just moved on to the next project. All reporters have stories that remain in their notebooks, and I treated this one the same way I would have treated a school board meeting that was just not exciting enough to make the next morning’s paper.

But I was dead wrong.

That project was different. And it wasn’t until several years after I had left the Los Angeles Times to take a job at the Institute for Health Metrics and Evaluation (IHME) at the University of Washington that I realized how wrong I had been and how I had failed as a reporter and failed those families.

In fact, I don’t think I had fully reconciled how differently that project could have gone until I sat down a few weeks ago to start preparing for my talk at USC. Here are some things I realized I was doing wrong, the same mistakes so many other reporters make in the natural course of reporting stories of all shapes and sizes:

1. Thinking too big. This runs counter to so many motivational speeches at reporting conferences. Think big! Be ambitious! I should have narrowed my scope considerably. I probably should have narrowed it to one family court in one city. I have seen what big data looks like working at IHME, and I now realize that I never came close to working with big data as a reporter.

2. Asking too many questions. I nearly always tell people, whether I’m talking about using health data for decision-making or for reporting, to come up with a central question and stick to it. I didn’t do that.

3. Asking questions that were too vague. Had I narrowed my focus, I also would have organized my entire line of questioning – both for my sources and for what little data I did have. Instead, I was casting about for anything that might be of interest, trying to solve the problems of every type of family court disaster I had encountered instead of zeroing in on one area.

4. Looking for data gold mines that didn’t exist. Having spent time at IHME working to identify and negotiate access to large datasets, I realize now that there is a wealth of data around us all the time that mostly goes unnoticed. But it’s rarely packaged neatly and sitting on some server in one government office for you to download.

5. Praying for perfect data instead of working with messy data. The best data-savvy reporters have known this for decades, but it merits repeating. If you want to use data to answer interesting questions for your audiences, you need to understand that data can disappoint, can deceive, and can undo a lot of hard work. You likely will want to bring in some expert guidance, whether it be someone from your own newsroom, a colleague from another outlet, or a professional data cruncher like the ones we have at IHME.

6. Taking too long to figure out I was failing.

In the hope of preventing others from stumbling down that same path, I will use future posts to expand on some of these lessons and offer more tips for avoiding big data pitfalls and making small data work for you.

[Photo by Michael Gray via Flickr.]