Anyone who has tried to get basic data from the Census Bureau understands the unique pain of navigating their website. Locating something as simple as “Median income by age, 2000-2010” requires crawling through a seemingly endless series of pre-generated tables, figuring out which geographies the data is available for, and downloading the file, only to discover that 2,100 of the counties you asked for are mysteriously missing.
This is one of the problems that IPUMS is here to solve. The project, which is run by the University of Minnesota, allows users to download Census data for any set of variables they choose. Rather than aggregate values, IPUMS returns microdata: anonymized records for each person who responded to the survey. (“IPUMS” stands for “Integrated Public Use Microdata Series,” but is universally known by the acronym, pronounced “IP-ums.”)
Microdata is great for flexibility, but it makes quick analysis harder. The standard American Community Survey is a 1 percent sample of the population, so an extract containing even a single variable runs to roughly three million records.
On top of this, the data is delivered in a condensed format that is difficult to make sense of without some heavy lifting in statistical software. If that’s your game, here’s a good tutorial on importing IPUMS data into R. Otherwise, use our script to convert your IPUMS extract into a straightforward, tab-separated file with plain-English values for each field.
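To give a feel for what that conversion involves, here is a minimal Python sketch. It is not the script mentioned above. It assumes the extract is a fixed-width text file in which each variable occupies a known range of columns and coded values need to be looked up in a codebook. The column positions in LAYOUT, the value labels in LABELS, and the function names are all made up for illustration; a real extract's codebook supplies the actual positions and codes.

```python
import csv
import io

# Hypothetical layout: (field name, start offset, end offset) for each variable.
# Real positions come from the codebook that accompanies an extract.
LAYOUT = [("YEAR", 0, 4), ("SEX", 4, 5), ("AGE", 5, 8)]

# Hypothetical value labels for coded fields, keyed by the raw code.
LABELS = {"SEX": {"1": "Male", "2": "Female"}}


def decode_line(line):
    """Slice one fixed-width record into named, human-readable fields."""
    row = {}
    for name, start, end in LAYOUT:
        raw = line[start:end]
        if name in LABELS:
            # Swap the code for its label; keep the raw code if it is unknown.
            row[name] = LABELS[name].get(raw, raw)
        else:
            # Numeric fields are zero-padded; strip the padding but keep "0".
            row[name] = raw.lstrip("0") or "0"
    return row


def to_tsv(lines):
    """Convert an iterable of fixed-width records into tab-separated text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=[name for name, _, _ in LAYOUT],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for line in lines:
        line = line.rstrip("\n")
        if line:  # skip blank lines
            writer.writerow(decode_line(line))
    return out.getvalue()


if __name__ == "__main__":
    sample = ["20102034", "20101007"]  # two made-up records
    print(to_tsv(sample), end="")
```

Because to_tsv accepts any iterable of lines, an open file handle can be passed in directly, so a multi-million-record extract is read one line at a time. The output is still built up in memory here; for a full extract you would likely write each row straight to a file instead.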