A Researcher’s Keys to Administrative Data

January 28, 2014

The CIRP@CHOP Teen Driver Safety Research team uses several methodological approaches in our research, including evidence-based intervention design and evaluation, driving simulation, on-road driving assessment, and analysis of existing data sources. As the CIRP@CHOP Director of Epidemiology and Biostatistics, I have been working to improve the methods researchers use to analyze existing data sources in order to boost teen driver safety. What I have learned, however, can be applied to many research areas beyond teen driver safety.

My CIRP@CHOP colleagues and I recently published a study in Accident Analysis and Prevention that questions the traditional method of using citation data to determine crash responsibility. Although moving violation citations are frequently used to assign fault, this criterion may not be accurate for various reasons. We examined the statistical implications of using moving violations to determine crash responsibility by comparing that approach with a method based on the presence of crash-contributing driver actions (e.g., inattention, failure to yield at a traffic control device), and we were able to show that the traditional method does not perform as well in identifying at-fault drivers. Our recent work has established the quality of the crash-contributing driver action variable, a particularly important data element in our ongoing studies, and our findings support its use in applied crash studies.
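For readers who want to see what comparing two responsibility criteria can look like in practice, here is a minimal sketch in Python. It is not the analysis from our study: the records are invented toy values, and the names `drivers`, `cross_tabulate`, and `percent_agreement` are hypothetical. It simply counts how often a citation-based flag and a driver-action-based flag classify the same driver the same way.

```python
from collections import Counter

# Hypothetical toy records, invented for illustration only (not study data).
# Each tuple is (received a moving violation citation,
#                had a crash-contributing driver action recorded).
drivers = [
    (True, True),
    (False, True),
    (False, True),
    (True, True),
    (False, False),
    (False, False),
    (True, False),
    (False, True),
]


def cross_tabulate(records):
    """Count drivers in each (cited, action) cell of the 2x2 table."""
    return Counter(records)


def percent_agreement(records):
    """Return the share of drivers that both criteria classify the same way."""
    same = sum(1 for cited, action in records if cited == action)
    return same / len(records)


table = cross_tabulate(drivers)
agreement = percent_agreement(drivers)

# Print the four cells of the cross-tabulation, then the overall agreement.
for cited in (True, False):
    for action in (True, False):
        print(f"cited={cited!s:5} action={action!s:5} n={table[(cited, action)]}")
print(f"agreement: {agreement:.2f}")
```

The cell to watch in a table like this is the one where a contributing action was recorded but no citation was issued, since those are the drivers a citation-only criterion would not count as at fault.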

When it comes to working with administrative data, which are not created for research purposes and are often complex, it may be helpful for researchers to think like investigative reporters. Although it may be hard to resist the urge to jump into analysis, especially if you have been assured that the data were cleaned, it’s critical to take the time to understand in detail how the data were collected and to determine their quality and validity. This usually takes some tenacity, and finding the person most knowledgeable about the data may be a difficult task. In my prior position as the primary contact for vital statistics data requests at the New York City Department of Health and Mental Hygiene, it struck me how few researchers requesting data would ask about data quality or ask to speak to my colleague who had worked with the data for decades.

The time spent investigating the data before using it can be well worth the effort. When we used the NJ Motor Vehicle Commission and Department of Transportation databases for our study, we were able to ask many questions and gain insight into how the data could best be used in a systematic way to help us better understand the public health issue and epidemiology of teen driver crashes.

When researchers collect their own data, they generally have a better understanding of the extent of its validity. When we use administrative data, we must remember to do the work to reach that same understanding before relying on it.