One of the pleasures of developing custom software through my company Data Bakery is the regular opportunity to learn about new and interesting topics. In the case of our client StreetCred, that topic turned out to be demographics and how they were handled in the 2010 U.S. Census. Demographics are an important topic for StreetCred because their software helps law enforcement organizations (LEOs) understand how their officers are interacting with the communities they serve.
The way race and ethnicity are handled in demographic data tends to be fuzzy and haphazard. However, in the 2010 Census, the U.S. Census Bureau decided to be much more explicit: it treated Hispanic origin as an ethnicity separate from race (White, Black, American Indian and other races). This decision caused enough confusion that in March 2011 the Bureau issued a 23-page document titled Overview of Race and Hispanic Origin: 2010 that attempts to explain the situation. Briefly summarized: Hispanic is considered an ethnicity and not a race, so a Hispanic respondent can be of any race. In practice, the majority of Hispanic respondents are counted as racially White.
As a result, the question of whether or not a respondent was of "Hispanic origin" was asked independently of, and in addition to, the respondent's race. This may seem like an interesting definition of little consequence, but it has a direct impact on consumers of Census data who need to answer demographic questions where ethnicity and race are considered together.
A Maze of APIs and Variables
When you first begin exploring the U.S. Census data, it comes off as an arcane, maze-like combination of datasets and APIs. That tends to lead people to the US CitySDK, which was created by the U.S. Census Bureau to make it easier to access U.S. Census data. Unfortunately, the CitySDK is designed to run in a web browser, which limits how it can be used. However, it does demonstrate which U.S. Census APIs to use for various types of queries, and how to use them. In particular, demographic queries in CitySDK use the American Community Survey 5-Year (ACS5) dataset.
While the ACS and 2010 U.S. Census are two different surveys/programs, the data surveyed and reported in the ACS is handled the same way as was decided for the 2010 U.S. Census – Hispanic is an ethnicity and not a race.
As you dig deeper into the ACS5 variable set and experiment with it, you begin to get a sense of how the data is structured. Consider the following variables:
|B02001_003E||Black or African American alone|
|B02001_004E||American Indian and Alaska Native alone|
|B02001_006E||Native Hawaiian and Other Pacific Islander alone|
|B02001_007E||Some other race alone|
|B02001_008E||Two or more races|
Sounds pretty good so far, right? If you pull the data for Fort Worth, here's what you get:
|Black or African American alone||147,471|
|American Indian and Alaska Native alone||4,621|
|Native Hawaiian and Other Pacific Islander alone||1,104|
|Some other race alone||65,096|
|Two or more races||22,403|
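For readers who want to reproduce a query like this outside the browser, here is a minimal Python sketch. It is not the code used for StreetCred. The endpoint layout, the 2013 vintage, and the FIPS codes for Texas (48) and Fort Worth (27000) are assumptions to verify against the Census API documentation, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed URL layout for the ACS 5-year API; check it for the vintage you need.
BASE_URL = "https://api.census.gov/data/{year}/acs/acs5"

# Variable codes and labels taken from the table above.
RACE_VARIABLES = {
    "B02001_003E": "Black or African American alone",
    "B02001_004E": "American Indian and Alaska Native alone",
    "B02001_006E": "Native Hawaiian and Other Pacific Islander alone",
    "B02001_007E": "Some other race alone",
    "B02001_008E": "Two or more races",
}


def build_query_url(variables, year, state_fips, place_fips):
    """Build the query URL for one place inside one state."""
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": f"place:{place_fips}",
        "in": f"state:{state_fips}",
    }
    # Keep commas and colons readable instead of percent-encoding them.
    return BASE_URL.format(year=year) + "?" + urlencode(params, safe=",:")


def parse_response(rows, labels):
    """Turn the API's [header_row, value_row] JSON into {label: count}."""
    record = dict(zip(rows[0], rows[1]))
    return {label: int(record[code]) for code, label in labels.items()}


def fetch_counts(labels, year, state_fips, place_fips):
    """Fetch and parse the counts for one place (requires network access)."""
    url = build_query_url(labels, year, state_fips, place_fips)
    with urlopen(url) as response:
        return parse_response(json.load(response), labels)
```

Calling `fetch_counts(RACE_VARIABLES, 2013, "48", "27000")` would then return a dictionary keyed by the labels above. The counts depend on the vintage you request, so they will not necessarily match the figures in this post.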
Everything seems good and the math adds up. However, what if you want to know what proportion of your population is Hispanic? You can see some of the confusion around this question if you look at the Wikipedia entry for Demographics of Fort Worth.
If you look further into the ACS5 variables, you find:
|B03001_003E||Hispanic or Latino|
If you add this to the ACS5 query, you get 266,472 for Fort Worth. Nice to know, but what do you do with this number? You can't add it to the counts above because every Hispanic respondent is already included in one of the race counts, so adding it would count those people twice. The only way to combine this ethnicity count with the racial counts would be to reduce the counts above to just the people who answered a particular race but also said "no" to Hispanic origin. But it is impossible to determine how to do this from just the numbers above.
The Lesser Traveled Path
Luckily enough, if you dig further into the ACS5 variable set, you find these:
|B03002_003E||Not Hispanic or Latino – White alone|
|B03002_004E||Not Hispanic or Latino – Black or African American alone|
|B03002_005E||Not Hispanic or Latino – American Indian and Alaska Native alone|
|B03002_006E||Not Hispanic or Latino – Asian alone|
|B03002_007E||Not Hispanic or Latino – Native Hawaiian and Other Pacific Islander alone|
|B03002_008E||Not Hispanic or Latino – Some other race alone|
|B03002_009E||Not Hispanic or Latino – Two or more races|
With these variables, you can now query the following data:
|Hispanic or Latino||266,472|
|Not Hispanic – White alone||318,732|
|Not Hispanic – Black or African American alone||145,330|
|Not Hispanic – American Indian and Alaska Native alone||2,262|
|Not Hispanic – Asian alone||28,534|
|Not Hispanic – Native Hawaiian and Other Pacific Islander alone||966|
|Not Hispanic – Some other race alone||1,228|
|Not Hispanic – Two or more races||15,049|
And there you have it: demographic data combining ethnicity and race using post-2010 U.S. Census data.
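As a quick sanity check, the sketch below uses only the Fort Worth figures quoted in this post. Because the eight combined categories do not overlap, they can be summed into a total and turned into shares. Subtracting each "Not Hispanic" count from the matching race-alone count also shows how many people in that race category reported Hispanic origin. The dictionary and variable names are illustrative, and only the race categories that appear in both tables are compared.

```python
# Race-alone counts (B02001) quoted earlier in the post.
race_alone = {
    "Black or African American alone": 147_471,
    "American Indian and Alaska Native alone": 4_621,
    "Native Hawaiian and Other Pacific Islander alone": 1_104,
    "Some other race alone": 65_096,
    "Two or more races": 22_403,
}

# Combined ethnicity-and-race counts (B03001_003E plus B03002) from the table above.
hispanic_total = 266_472
not_hispanic = {
    "White alone": 318_732,
    "Black or African American alone": 145_330,
    "American Indian and Alaska Native alone": 2_262,
    "Asian alone": 28_534,
    "Native Hawaiian and Other Pacific Islander alone": 966,
    "Some other race alone": 1_228,
    "Two or more races": 15_049,
}

# The categories are mutually exclusive, so summing them counts nobody twice.
total = hispanic_total + sum(not_hispanic.values())

# Share of the population in each combined category.
shares = {"Hispanic or Latino": hispanic_total / total}
shares.update(
    {f"Not Hispanic, {race}": count / total for race, count in not_hispanic.items()}
)

# People in each race category who also reported Hispanic origin.
hispanic_by_race = {
    race: count - not_hispanic[race] for race, count in race_alone.items()
}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

The `hispanic_by_race` result shows why the two sets of counts cannot simply be added: nearly all of the 65,096 "Some other race alone" respondents (63,868 of them) also reported Hispanic origin.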