One of the most frustrating aspects of data journalism is that, even when data is publicly available and easy to download, it is often in a format that’s nearly impossible to work with. The congressional election results that the Federal Election Commission maintains are a perfect example. The vote totals for every candidate in every race are all tossed into one large, poorly documented Excel file. Some candidates appear multiple times under different parties, which are listed on a separate sheet.
It’s easy enough for a human to look at these files and, after much squinting, figure out how the data is structured. But for any large-scale analysis, we need data that is readable by both people and machines. Getting the data I needed for the feature on Libertarian “spoiler” candidates involved writing a Node.js script to extract the data and organize it in a coherent way.
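To give a sense of what that kind of cleanup looks like, here is a minimal sketch, not the script we actually used. It assumes the spreadsheet rows have already been read into plain objects, and every field name, party code, and vote total in it is invented for illustration. It does the two things described above: it looks up each party code in a separate table, and it folds a candidate who appears under several parties into a single record. Whether votes on different party lines should be summed this way is itself an assumption you would want to check against the source data.

```javascript
// Hypothetical lookup table, standing in for the separate sheet of party codes.
const partyLabels = {
  DEM: "Democratic Party",
  REP: "Republican Party",
  LIB: "Libertarian Party",
  WF: "Working Families Party",
};

// Hypothetical rows, one per candidate per party line. All values are made up.
const rows = [
  { state: "NY", district: "01", candidate: "Doe, Jane", party: "DEM", votes: 1000 },
  { state: "NY", district: "01", candidate: "Doe, Jane", party: "WF", votes: 50 },
  { state: "NY", district: "01", candidate: "Roe, Richard", party: "REP", votes: 900 },
  { state: "NY", district: "01", candidate: "Poe, Sam", party: "LIB", votes: 120 },
];

// Merge rows that describe the same candidate in the same race,
// collecting readable party names and adding up the votes.
function consolidate(rows, partyLabels) {
  const byCandidate = new Map();
  for (const row of rows) {
    const key = [row.state, row.district, row.candidate].join("|");
    if (!byCandidate.has(key)) {
      byCandidate.set(key, {
        state: row.state,
        district: row.district,
        candidate: row.candidate,
        parties: [],
        votes: 0,
      });
    }
    const entry = byCandidate.get(key);
    // Fall back to the raw code if the lookup table has no entry for it.
    entry.parties.push(partyLabels[row.party] || row.party);
    entry.votes += row.votes;
  }
  return Array.from(byCandidate.values());
}

const results = consolidate(rows, partyLabels);
console.log(JSON.stringify(results, null, 2));
```

The output is one record per candidate per race, which is a shape that both a person skimming a file and a program looping over it can handle.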
Our pain should not be your pain, so we’re releasing both the code to extract the election data and the data files themselves. You can read more about how this works on the data sheet dedicated to this code. And if you’re interested in data on what the legislators do once they get elected, there’s no better resource than the United States project on GitHub.