Research In Action

Reducing Disparities: Race/Ethnicity in Transportation Data
August 24, 2021

Racial and ethnic disparities and inequities have been documented across various transportation outcomes, including crash and crash fatality rates. Analyzing population-level data would enhance our understanding of the origins of racial and ethnic differences in transportation outcomes and improve our ability to develop interventions and policies to reduce them. However, very few studies have examined disparities and inequities using population-level transportation data sources, because these sources often do not contain race/ethnicity information.

To address this gap, my colleagues and I recently began exploring methods of incorporating race/ethnicity information into large datasets. In an article published in Traffic Injury Prevention, we describe the utility of estimating race and ethnicity using the Bayesian Improved Surname Geocoding (BISG) algorithm. The RAND Corporation developed BISG to produce accurate and reliable group- or population-level race/ethnicity estimates.

The algorithm compares each individual’s last name and residential address to the US Census surname list and the racial/ethnic composition of their US Census block group. It then produces a set of probabilities that the individual belongs to each of 6 mutually exclusive racial/ethnic groups (defined by the RAND Corporation): White, Hispanic, Black, Asian/Pacific Islander, Multiracial, and American Indian/Alaskan Native. Because traffic data sources often contain residential addresses and last names, we hypothesized that BISG may be a feasible method to incorporate race/ethnicity into sources that do not collect this information.
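To make the idea concrete, here is a minimal sketch of the kind of Bayesian update that gives BISG its name. It assumes the commonly described formulation in which surname-based probabilities act as the prior and are updated with the share of each group's population living in the individual's block group. The function name and every number below are hypothetical illustrations, not actual Census values or the code used in our study.

```python
# Hypothetical sketch of a BISG-style Bayesian update (not the study's code).
GROUPS = [
    "White", "Hispanic", "Black", "Asian/Pacific Islander",
    "Multiracial", "American Indian/Alaskan Native",
]

def bisg_posterior(surname_probs, geo_shares):
    """Combine surname and geography information with Bayes' rule.

    surname_probs: P(group | surname), e.g. from a surname list (prior).
    geo_shares:    P(block group | group), the share of each group's
                   total population that lives in this block group.
    Returns P(group | surname, block group) for every group.
    """
    unnormalized = {g: surname_probs[g] * geo_shares[g] for g in GROUPS}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("No group has nonzero probability for this input")
    return {g: value / total for g, value in unnormalized.items()}

# Made-up inputs for one driver (assumed values for illustration only).
surname_probs = {
    "White": 0.60, "Hispanic": 0.05, "Black": 0.30,
    "Asian/Pacific Islander": 0.02, "Multiracial": 0.02,
    "American Indian/Alaskan Native": 0.01,
}
geo_shares = {
    "White": 1e-5, "Hispanic": 2e-5, "Black": 4e-5,
    "Asian/Pacific Islander": 1e-5, "Multiracial": 1e-5,
    "American Indian/Alaskan Native": 1e-5,
}

posterior = bisg_posterior(surname_probs, geo_shares)
for group, prob in posterior.items():
    print(f"{group}: {prob:.3f}")
```

In this toy example, the surname alone points most strongly to White, but the block group is one where a relatively large share of the Black population lives, so the updated probability for Black rises above the surname-only value. The output is a full set of probabilities rather than a single assigned category.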

What We Did

First, we identified over 4 million drivers (ages 17-99) with a known race/ethnicity in the New Jersey Safety and Health Outcomes (NJ-SHO) Data Warehouse. Using these drivers’ last names and residential addresses, we applied the BISG algorithm to estimate the probability that each driver belonged to each of the race/ethnicity categories. We then compared the BISG estimates with the known race/ethnicity values.
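One simple way to picture this kind of comparison is at the group level: average the BISG probabilities across all drivers to get an estimated share for each group, then set that against the share observed in the known values. The sketch below is an assumption about how such a check could look, not the metrics reported in the article; the function name, the three-group simplification, and the four toy records are all hypothetical.

```python
from collections import Counter

def compare_group_shares(records, groups):
    """Compare known group shares with BISG-estimated shares.

    records: list of (known_group, probs) pairs, where probs maps each
             group to that driver's BISG probability.
    Returns {group: (known_share, estimated_share, difference)}.
    """
    n = len(records)
    if n == 0:
        raise ValueError("records must not be empty")
    known_counts = Counter(known for known, _ in records)
    result = {}
    for g in groups:
        # Estimated share: mean of the probabilities, no hard assignment.
        estimated = sum(probs.get(g, 0.0) for _, probs in records) / n
        known = known_counts[g] / n
        result[g] = (known, estimated, estimated - known)
    return result

# Toy data (hypothetical): known group plus BISG-style probabilities.
toy_groups = ["White", "Black", "Hispanic"]
toy_records = [
    ("White",    {"White": 0.8, "Black": 0.1, "Hispanic": 0.1}),
    ("Black",    {"White": 0.3, "Black": 0.6, "Hispanic": 0.1}),
    ("Hispanic", {"White": 0.1, "Black": 0.1, "Hispanic": 0.8}),
    ("White",    {"White": 0.7, "Black": 0.2, "Hispanic": 0.1}),
]

shares = compare_group_shares(toy_records, toy_groups)
for group, (known, estimated, diff) in shares.items():
    print(f"{group}: known={known:.3f} estimated={estimated:.3f} diff={diff:+.3f}")
```

Averaging probabilities, rather than assigning each driver to the most likely group, fits the way BISG is intended to be used: for group- or population-level estimates rather than for classifying individuals.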

What We Found

We calculated BISG race/ethnicity probabilities for ~99% of drivers in our sample. Overall, BISG probabilities were extremely similar to known race/ethnicity values for the four largest racial/ethnic groups in our sample: White, Black, Hispanic, and Asian/Pacific Islander. However, BISG estimates were not similar to the known values for Multiracial and American Indian/Alaskan Native drivers, which is consistent with previous applications of BISG.

What This Means

The BISG algorithm is a promising method to incorporate race and ethnicity for the 4 largest racial/ethnic groups into population-level crash, licensing, and other transportation databases. It relies only on name and address data that are already collected, so it can be applied whether or not these data are linked to external sources with known race and ethnicity information. Taken with previous studies, our findings suggest that applying BISG to traffic safety analyses may also reduce potential biases commonly found in data collection and analysis, ultimately promoting more effective traffic safety interventions and equitable policies.

Ethical Guidelines to Consider

While our goal is to promote research to reduce racial and ethnic disparities in transportation contexts, we believe it is important to highlight several ethical guidelines for researchers to consider before beginning this line of work:

  • Race and ethnicity are social constructs and should be used to identify populations at risk for adverse outcomes due to other causal factors that can be the target of intervention.
  • Routinely using race/ethnicity in analyses may provide support to false beliefs that disparities are caused by race/ethnicity instead of other underlying factors (e.g., behaviors, income level, education, environment). Because of this, researchers should explicitly state why and how they included race and ethnicity variables.
  • All relevant factors (e.g., racism and discrimination, wealth, age, language, and environmental exposures) should be considered when interpreting racial and ethnic differences. In particular, researchers should adjust their analyses for socioeconomic status and social class, which are the most common sources of bias in racial and ethnic differences.