Demographics and the 2010 U.S. Census

One of the pleasures of developing custom software through my company Data Bakery is the regular opportunity to learn about new and interesting topics. In the case of our client StreetCred, that topic turned out to be demographics and how they were handled in the 2010 U.S. Census. Demographics are an important topic for StreetCred because their software helps law enforcement organizations (LEOs) understand how their officers are interacting with the communities they serve.

The way race and ethnicity are handled in demographic data tends to be fuzzy and haphazard. In the 2010 Census, however, the U.S. Census Bureau decided to be much more explicit: it treated Hispanic as an ethnicity separate from race (White, Black, American Indian and other races). The decision caused enough confusion that in March 2011 the Bureau issued a 23-page document, Overview of Race and Hispanic Origin: 2010, that attempts to explain the situation. Briefly summarized: Hispanic is considered an ethnicity and not a race, so every Hispanic respondent is also counted under a race, and the majority are counted as White.

As a result, the question of whether or not a respondent was of "Hispanic origin" was asked independently of, and in addition to, the respondent's race. This may seem like just an interesting definition of little consequence, but the way it was handled has a direct impact on consumers of Census data who want to answer demographic questions where ethnicity and race need to be considered together.

A Maze of APIs and Variables

When you first begin exploring the U.S. Census data, it comes off as an arcane and maze-like combination of datasets and APIs. That tends to lead people to the US CitySDK, which was created by the U.S. Census Bureau to make it easier to access U.S. Census data. Unfortunately, the CitySDK is designed to run in a web browser, which limits how it can be used. However, it does demonstrate which U.S. Census APIs to use for various types of queries, and how to use them. In particular, demographic queries in CitySDK use the American Community Survey 5-Year (ACS5) dataset.

While the ACS and 2010 U.S. Census are two different surveys/programs, the data surveyed and reported in the ACS is handled the same way as was decided for the 2010 U.S. Census – Hispanic is an ethnicity and not a race.

As you dig deeper into the ACS5 variable set and experiment with it, you begin to get a sense of how the data is structured. Consider the following variables:

| Variable Name | Label |
| --- | --- |
| B02001_002E | White alone |
| B02001_003E | Black or African American alone |
| B02001_004E | American Indian and Alaska Native alone |
| B02001_005E | Asian alone |
| B02001_006E | Native Hawaiian and Other Pacific Islander alone |
| B02001_007E | Some other race alone |
| B02001_008E | Two or more races |
| B01003_001E | Total |
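As a rough illustration of what a query for these variables might look like, here is a minimal sketch that only builds the request URL and does not contact the API. The ACS vintage (2014), the endpoint path, and the FIPS codes for Texas (48) and Fort Worth (27000) are assumptions, not details from this article, and `build_acs5_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Variables from the table above: race categories plus the total population.
RACE_VARIABLES = [
    "B02001_002E",  # White alone
    "B02001_003E",  # Black or African American alone
    "B02001_004E",  # American Indian and Alaska Native alone
    "B02001_005E",  # Asian alone
    "B02001_006E",  # Native Hawaiian and Other Pacific Islander alone
    "B02001_007E",  # Some other race alone
    "B02001_008E",  # Two or more races
    "B01003_001E",  # Total
]


def build_acs5_url(variables, year=2014, state="48", place="27000"):
    """Build an ACS5 query URL for one place.

    Assumptions: the year, the endpoint layout, and the default FIPS codes
    (Texas = 48, Fort Worth city = 27000) are illustrative choices.
    """
    base = f"https://api.census.gov/data/{year}/acs/acs5"
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": f"place:{place}",
        "in": f"state:{state}",
    }
    # Keep commas and colons literal so the URL stays readable.
    return f"{base}?{urlencode(params, safe=',:')}"


url = build_acs5_url(RACE_VARIABLES)
print(url)
```

Keeping the variable list separate from the URL builder makes it easy to swap in a different set of variables later without touching the geography parameters.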

Sounds pretty good so far, right? If you pull the data for Fort Worth, here's what you get:

| Label | Count |
| --- | --- |
| White alone | 508,894 |
| Black or African American alone | 147,471 |
| American Indian and Alaska Native alone | 4,621 |
| Asian alone | 28,984 |
| Native Hawaiian and Other Pacific Islander alone | 1,104 |
| Some other race alone | 65,096 |
| Two or more races | 22,403 |
| Total | 778,573 |
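To make the check concrete, the sketch below parses a hand-built sample shaped like a Census API response (a JSON array of arrays whose first row is the header) and confirms that the race categories sum to the total. The sample payload is an assumption assembled from the Fort Worth figures above, not a captured API response, and `parse_counts` is a hypothetical helper.

```python
import json

# Hand-built sample (assumption): same shape as a Census API response,
# filled in with the Fort Worth counts from the table above.
SAMPLE_RESPONSE = json.dumps([
    ["NAME", "B02001_002E", "B02001_003E", "B02001_004E", "B02001_005E",
     "B02001_006E", "B02001_007E", "B02001_008E", "B01003_001E",
     "state", "place"],
    ["Fort Worth city, Texas", "508894", "147471", "4621", "28984",
     "1104", "65096", "22403", "778573", "48", "27000"],
])


def parse_counts(payload):
    """Map each ACS variable in the first data row to an integer count."""
    header, row = json.loads(payload)[:2]
    # Values arrive as strings; keep only the B-table variables, which
    # skips NAME and the geography columns.
    return {name: int(value) for name, value in zip(header, row)
            if name.startswith("B0")}


counts = parse_counts(SAMPLE_RESPONSE)
total = counts.pop("B01003_001E")   # total population
race_sum = sum(counts.values())     # sum of the seven race categories

print(race_sum, total, race_sum == total)  # 778573 778573 True
```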

Everything seems good and the math adds up. But what if you want to know what proportion of your population is Hispanic? You can see some of the confusion about this in the Wikipedia entry for Demographics of Fort Worth.

If you look further into the ACS5 variables, you find:

| Variable Name | Label |
| --- | --- |
| B03001_003E | Hispanic or Latino |

If you add this to the ACS5 query, you get 266,472 for Fort Worth. Nice to know, but what do you do with this number? You can't add it to the counts above because that would be over-counting people. The only way to combine this ethnicity count with the racial counts would be to reduce the counts above to just the people who answered a particular race but also said "no" to Hispanic origin. But it is impossible to determine how to do this from just the numbers above.
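The over-counting is easy to see with the numbers so far. The short sketch below uses only the race counts and the Hispanic or Latino count already shown; the variable names are just illustrative.

```python
# Race counts for Fort Worth, regardless of Hispanic origin.
RACE_COUNTS = {
    "White alone": 508_894,
    "Black or African American alone": 147_471,
    "American Indian and Alaska Native alone": 4_621,
    "Asian alone": 28_984,
    "Native Hawaiian and Other Pacific Islander alone": 1_104,
    "Some other race alone": 65_096,
    "Two or more races": 22_403,
}
HISPANIC_OR_LATINO = 266_472
TOTAL = 778_573

# Naively treating "Hispanic or Latino" as one more race category.
naive_sum = sum(RACE_COUNTS.values()) + HISPANIC_OR_LATINO
overcount = naive_sum - TOTAL

print(naive_sum)   # 1045045
print(overcount)   # 266472
```

The excess is exactly the Hispanic or Latino count, because each of those respondents already appears once in one of the race categories.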

The Lesser Traveled Path

Luckily enough, if you dig further into the ACS5 variable set, you find these:

| Variable Name | Label |
| --- | --- |
| B03002_003E | Not Hispanic or Latino – White alone |
| B03002_004E | Not Hispanic or Latino – Black or African American alone |
| B03002_005E | Not Hispanic or Latino – American Indian and Alaska Native alone |
| B03002_006E | Not Hispanic or Latino – Asian alone |
| B03002_007E | Not Hispanic or Latino – Native Hawaiian and Other Pacific Islander alone |
| B03002_008E | Not Hispanic or Latino – Some other race alone |
| B03002_009E | Not Hispanic or Latino – Two or more races |

With these variables, you can now query the following data:

| Label | Count |
| --- | --- |
| Hispanic or Latino | 266,472 |
| Not Hispanic – White alone | 318,732 |
| Not Hispanic – Black or African American alone | 145,330 |
| Not Hispanic – American Indian and Alaska Native alone | 2,262 |
| Not Hispanic – Asian alone | 28,534 |
| Not Hispanic – Native Hawaiian and Other Pacific Islander alone | 966 |
| Not Hispanic – Some other race alone | 1,228 |
| Not Hispanic – Two or more races | 15,049 |
| Total | 778,573 |
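As a sanity check on this combined breakdown, the sketch below confirms that the categories are mutually exclusive (they sum to the total), computes each category's share of the population, and recovers the Hispanic count within each race by subtracting the non-Hispanic figure from the overall race figure. All numbers are the Fort Worth counts from this article; the dictionary names are illustrative.

```python
# Race counts regardless of Hispanic origin (B02001).
RACE_ALONE = {
    "White alone": 508_894,
    "Black or African American alone": 147_471,
    "American Indian and Alaska Native alone": 4_621,
    "Asian alone": 28_984,
    "Native Hawaiian and Other Pacific Islander alone": 1_104,
    "Some other race alone": 65_096,
    "Two or more races": 22_403,
}
# Same races, restricted to Not Hispanic or Latino (B03002).
NOT_HISPANIC = {
    "White alone": 318_732,
    "Black or African American alone": 145_330,
    "American Indian and Alaska Native alone": 2_262,
    "Asian alone": 28_534,
    "Native Hawaiian and Other Pacific Islander alone": 966,
    "Some other race alone": 1_228,
    "Two or more races": 15_049,
}
HISPANIC_OR_LATINO = 266_472
TOTAL = 778_573

# The combined categories do not overlap, so they sum to the total.
combined = {"Hispanic or Latino": HISPANIC_OR_LATINO}
combined.update({f"Not Hispanic, {race}": n for race, n in NOT_HISPANIC.items()})
assert sum(combined.values()) == TOTAL

# Share of the total population for each combined category, in percent.
shares = {label: round(100 * n / TOTAL, 1) for label, n in combined.items()}

# Overall race count minus the non-Hispanic count leaves the Hispanic
# respondents who reported that race.
hispanic_by_race = {race: RACE_ALONE[race] - NOT_HISPANIC[race]
                    for race in RACE_ALONE}
assert sum(hispanic_by_race.values()) == HISPANIC_OR_LATINO

for label, pct in shares.items():
    print(f"{label}: {pct}%")
print(hispanic_by_race["White alone"])  # 190162
```

With these figures, 190,162 of the 266,472 Hispanic or Latino residents (about 71%) reported White alone, which is consistent with the earlier point that most Hispanic respondents are counted racially as White.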

And there you have it. Demographic data combining ethnicity and race using post-2010 U.S. Census data.