Analysis of drowsy driving: exploring subpopulation risk with weighted contingency table tools

Coyle, Patrick; Chen, Chen; Dabbish, Nooreen

doi:10.1007/s00180-021-01071-w

Original paper
Published: 24 February 2021

Volume 36, pages 1605–1620, (2021)
Computational Statistics

Abstract

Driving accidents due to human error, and drowsy driving in particular, are an important topic in public health research and might be considered preventable. Our study investigates the 606 police-reported drowsy driving accidents in 2013 as recorded in the NASS General Estimates System from the NHTSA. We examine how the prevalence of drowsy driving in accidents differs between subpopulations and how this prevalence changes with the time of day. We explore these interactions using recent developments in survey-weighted ROC analysis. By doing so, we hope to offer employers and government agencies insight into what can be done to reduce the rate of injuries and fatalities related to drowsy driving.

Notes

1. There is currently no documented variance estimation for a survey-weighted Breslow–Day test. We therefore run the test on the raw table counts rather than on the sums of weights.
2. When creating a svyglm object with GES2013.rda or its subsets, you must set options(survey.lonely.psu = "adjust").
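
As a minimal sketch of the two notes above, the following R code sets the lonely-PSU option before fitting a svyglm model and contrasts a raw-count table with a sum-of-weights table. It assumes the survey package is installed and uses the api example data shipped with that package as a stand-in for GES2013.rda; the design, variables, and model formula are illustrative assumptions, not the authors' analysis.

```r
library(survey)

# Setting from the second note: a stratum with a single PSU is handled by
# centring at the overall sample mean instead of stopping with an error.
options(survey.lonely.psu = "adjust")

# Stand-in data (assumption): the stratified api sample shipped with survey.
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# Raw table counts: the kind of table the first note says the
# Breslow-Day test is run on.
raw_counts <- table(apistrat$stype, apistrat$sch.wide)

# Sum-of-weights table: each cell is a total of sampling weights.
weighted_counts <- svytable(~stype + sch.wide, design = dstrat)

# Survey-weighted logistic regression on a binary response.
fit <- svyglm(sch.wide ~ ell + meals, design = dstrat,
              family = quasibinomial())

print(raw_counts)
print(round(weighted_counts))
print(coef(fit))
```

The two tables share the same layout but differ in scale: the weighted cells estimate population totals, while the raw cells only count sampled records.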

References

Arnold LS, Tefft BC (2015) Prevalence of self-reported drowsy driving. AAA Foundation for Traffic Safety, United States. https://aaafoundation.org/wp-content/uploads/2017/12/PrevalenceOfSelfReportedDrowsyDrivingReport.pdf. Accessed 1 July 2016
Cryer PC, Westrup S, Cook AC, Ashwell V, Bridger P, Clarke C (2001) Investigation of bias after data linkage of hospital admissions data to police road traffic crash reports. Injury Prev 7(3):234–241
Grant RJ, Gregor MA, Maio RF, Huang SS (1998) The accuracy of medical records and police reports in determining motor vehicle crash characteristics. Prehosp Emerg Care 2(1):23–28
Greenberg L (1996) Police accident report (PAR) quality assessment project. National Technical Information Service. https://ntrl.ntis.gov/NTRL/dashboard/searchResults/titleDetail/PB97135313.xhtml. Accessed 1 July 2016
Knipling RR, Wang JS (1995) Revised estimates of the US drowsy driver crash problem size based on general estimates system case reviews. Annual proceedings of the Association for the Advancement of Automotive Medicine, vol 39. Association for the Advancement of Automotive Medicine
Lumley T (2014) survey: analysis of complex survey samples. R package version 3.30-3
National Highway Traffic Safety Administration (2011) Traffic safety facts drowsy driving. DOT HS 811:449
R Core Team (2016) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org/. Accessed 1 July 2016
Royal D (2002) National Survey of Distracted and Drowsy Driving Attitudes and Behavior: 2002. NHTSA Technical Report, DOT HS 809:566 https://www.nhtsa.gov/sites/nhtsa.dot.gov/files/hs809566v1.pdf. Accessed 1 July 2016
Shelton T (1991) National Accident Sampling System General Estimates System Technical Note 1988 to 1990. NHTSA Technical Report. DOT HS 807:796
Stutts JC, Wilkins JW, Vaughn BV (1999) Why do people have drowsy driving crashes? Input from drivers who just did. AAA Foundation for Traffic Safety. Washington (DC) 202(638):5944
Watling CN, Watling H (2015) Sleepy driving and drink driving: attitudes, behaviours, and perceived legitimacy of enforcement of younger and older drivers. In: Australasian Road Safety Conference
Yao W (2013) Estimation of ROC Curve with Complex Survey Data. UMI Dissertation Publishing, ProQuest LLC, UMI 3550591
Yao W, Li Z, Graubard B (2015) Estimation of ROC curve with complex survey data. Stat Med 34(8):1293–1303

Acknowledgements

We thank our anonymous reviewers, whose suggestions helped greatly to clarify this manuscript. We thank Richard M. Heiberger for his diligent supervision of this research, and Anastasia Vishnyakova for her support and feedback.

Author information

Authors and Affiliations

Department of Statistical Science, Temple University, 1801 Liacouras Walk, Philadelphia, PA, 19122, USA
Patrick Coyle, Chen Chen & Nooreen Dabbish

Corresponding author

Correspondence to Patrick Coyle.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 R package for analyzeGES routine

An R package (analyzeGES) was developed to reproduce the research described in this article and to conduct further research of a similar flavor. This package is hosted on Github and can be downloaded using

devtools::install_github("PatrickCoyle/analyzeGES")

The analyzeGES package contains the 2013 and 2014 datasets used in this article, but it can also be utilized to import and examine data from other years. The structure of the package is as follows:

R functions
- mergeGivenYear(): Allows the user to specify a list of GES data file names from a desired year’s directory, then reads and merges the files from the .sas7bdat format in which the NHTSA stores them to a single data frame.
  - dir: the directory in which the .sas7bdat files are stored.
  - desired.files: A character vector naming the .sas7bdat files in the directory to be merged into a single data frame.
- svy_mosaic_splom(): Produces an upper triangular matrix of two-way weighted mosaic plots for an input vector of factor variables. To reflect the complex survey design, the function uses sums of observation weights instead of counts of observations. It employs vcd::mosaic() to create the pairwise mosaic plots; a mosaic is colored red and blue if the pair of variables is significantly correlated based on a single-predictor survey::svyglm model. It employs grid::grid.grabExpr() and gridExtra::grid.arrange() to organize the pairwise mosaics into a matrix. This is an alternative to the scatterplot matrix for situations in which all variables are factors, and it is useful where other survey-weighted discrete data plots, such as faceted bar charts or jittered scatterplots, may not satisfactorily display the bivariate distributions.
  - data: A data frame.
  - factors: A vector of character strings corresponding to factor-type variables from the data frame.
  - weight: A character string indicating the name of the weight variable in the data frame.
  - svydesignObj: A survey::svydesign object, used to create bivariate svyglm models and color the corresponding mosaics red/blue if there is a statistically significant correlation between the two variables (at level \(\alpha = .05\)).
  - mar_par: A numeric vector of length 4 specifying the margins of each mosaic plot in the matrix.
- predROC(): Predicts the response for a test set using a fitted svyglm object, then returns a data frame of matched (false positive rate, true positive rate) pairs for the survey-weighted ROC curve of these predictions, computed with WeightedROC::WeightedROC.
  - glm.obj: A svyglm object whose response is a numeric binary variable taking the values 0, 1, or NA (see footnote 2 under Notes).
  - newData: A data frame that contains observation weights and the same predictors and binary response used to create glm.obj. This data is used as the test set for ROC analysis.
- model_and_convert(): Creates a list of svyglm models, applies analyzeGES::predROC() to each item in the list, and plots the ROC curves using ggplot2::ggplot().
  - response: A character string identifying the response variable.
  - predictor_list: A list whose elements are vectors of character strings specifying the desired predictor variables. A svyglm model is created for each element of the list.
  - svydesignObj: A svydesign object, used to create the svyglm models.
  - newData: A test set passed to analyzeGES::predROC() for analysis of predictive performance of the models.
  - plot: A boolean indicating whether to plot the ROC curves for each model.
- runExample(): Launches a Shiny application that allows you to choose a response, set of predictors and desired plot type (mosaic matrix, GGPairs or ROC curves).
Datasets
- GES2013.drivers.rda: The data used for exploratory data analysis and modeling.
- GES2013.drivers.design.rda: The svydesign object used for creating svyglm models.
- GES2014.drivers.rda: The data used for prediction.
Data-cleaning and variable-transformation script.
- A number of variables contained in the datasets and used for analysis are the transformed variables listed in this script.
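
The survey-weighted ROC idea behind predROC() can be illustrated with a short base R sketch. This is not the package's code, which relies on WeightedROC::WeightedROC; the function names weighted_roc() and weighted_auc() and the toy scores, labels, and weights below are hypothetical and exist only for illustration. Each observation contributes its weight, rather than a count of one, to the true and false positive rates.

```r
# Weighted ROC points: at each threshold, rates are ratios of summed weights.
weighted_roc <- function(score, label, weight) {
  stopifnot(length(score) == length(label),
            length(score) == length(weight),
            all(label %in% c(0, 1)))
  thresholds <- sort(unique(score), decreasing = TRUE)
  pos_total <- sum(weight[label == 1])
  neg_total <- sum(weight[label == 0])
  tpr <- vapply(thresholds, function(t) {
    sum(weight[label == 1 & score >= t]) / pos_total
  }, numeric(1))
  fpr <- vapply(thresholds, function(t) {
    sum(weight[label == 0 & score >= t]) / neg_total
  }, numeric(1))
  # Prepend the (0, 0) corner so the curve starts at the origin.
  data.frame(threshold = c(Inf, thresholds),
             FPR = c(0, fpr),
             TPR = c(0, tpr))
}

# Area under the weighted ROC curve by the trapezoidal rule.
weighted_auc <- function(roc) {
  sum(diff(roc$FPR) * (head(roc$TPR, -1) + tail(roc$TPR, -1)) / 2)
}

# Toy inputs (made up): predicted probabilities, binary outcomes, weights.
score  <- c(0.9, 0.8, 0.6, 0.4, 0.2)
label  <- c(1, 0, 1, 0, 0)
weight <- c(10, 50, 30, 5, 5)

roc <- weighted_roc(score, label, weight)
print(roc)
print(weighted_auc(roc))
```

In practice the scores would come from predict() on a svyglm object applied to the test data, and the weights from the test set's weight column.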

1.2 Shiny app

The most user-friendly feature of the package is the Shiny app, GES_plotter(), which can be launched locally from the installed package or via its web-hosted location at

https://patrickcoyle.shinyapps.io/GES_plotter/

The app can be used with the following steps:

1. Navigate to the app using the URL above.
2. Select a plotting method from the Plot Type drop-down menu. Supported plots are:
   - Mosaic Matrix: an exploratory plot that takes a vector of categorical variables and produces a matrix of pairwise weighted mosaic plots using analyzeGES::svy_mosaic_splom.
   - Scatterplot Matrix: an exploratory plot using the GGally package that adapts to any pair of data types.
   - ROC: A diagnostic plot that visualizes the predictive performance (for 2014 data) of a svyglm model based on a specified binary response and a vector of predictors (for 2013 data).
3. Select a binary response from the Single Response drop-down menu.
4. Select a vector of predictors from the Set of Predictors drop-down menu.
5. Press Create Plot.

The Single Response and Set of Predictors options are only treated as response/predictors for the ROC option; for the other two options, the response is simply appended to the predictors for pairwise exploratory analysis (Figs. 5, 6, 7).
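
The pairwise treatment described above, in which the response is appended to the predictors and every pair of variables is examined with weighted counts, can be sketched in base R. This is only an illustration under stated assumptions: the toy data frame, its variable names, and its weights are invented, and graphics::mosaicplot() stands in for the vcd-based plots that analyzeGES::svy_mosaic_splom produces.

```r
# Toy data (made up): three factors and a sampling weight column.
toy <- data.frame(
  drowsy = factor(c("yes", "no", "no", "yes", "no", "no")),
  night  = factor(c("yes", "yes", "no", "yes", "no", "no")),
  sex    = factor(c("M", "F", "M", "M", "F", "F")),
  w      = c(120, 80, 200, 60, 150, 90)
)

response   <- "drowsy"
predictors <- c("night", "sex")

# For exploratory plots the response is simply appended to the predictors.
vars      <- c(predictors, response)
var_pairs <- combn(vars, 2, simplify = FALSE)

# One two-way table per pair; cells hold sums of weights, not counts.
weighted_tables <- lapply(var_pairs, function(p) {
  xtabs(reformulate(p, response = "w"), data = toy)
})
names(weighted_tables) <- vapply(var_pairs, paste, character(1),
                                 collapse = " x ")

print(weighted_tables)

# Draw one weighted mosaic when run interactively.
if (interactive()) {
  mosaicplot(weighted_tables[[1]], main = names(weighted_tables)[1],
             color = TRUE)
}
```

Every table sums to the total weight of the toy data, so the mosaic tile areas reflect weighted shares rather than shares of sampled records.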

About this article

Cite this article

Coyle, P., Chen, C. & Dabbish, N. Analysis of drowsy driving: exploring subpopulation risk with weighted contingency table tools. Comput Stat 36, 1605–1620 (2021). https://doi.org/10.1007/s00180-021-01071-w

Received: 16 April 2017
Accepted: 11 January 2021
Published: 24 February 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s00180-021-01071-w
