Analysis of drowsy driving: exploring subpopulation risk with weighted contingency table tools

  • Original paper
  • Computational Statistics

Abstract

Driving accidents caused by human error, and by drowsy driving in particular, are an important topic in public health research, and such accidents might be considered preventable. Our study investigates the 606 police-reported drowsy driving accidents in 2013 as recorded in the NASS General Estimates System from the NHTSA. We examine how the prevalence of drowsy driving in accidents differs between subpopulations and how this prevalence changes with the time of day. We explore these interactions using recent developments in survey-weighted ROC analysis. By doing so, we hope to offer employers and government agencies insight into what can be done to reduce the rate of injuries and fatalities related to drowsy driving.

Notes

  1. There is currently no documented variance estimation for a survey-weighted Breslow–Day test. We therefore run the test on the raw table counts rather than on the sums of weights.

  2. When creating a svyglm object with GES2013.rda or its subsets, you must set options(survey.lonely.psu = "adjust").
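The setting in Note 2 can be tried on a small example. The following is a minimal sketch, assuming the survey package is installed; the toy data frame, its column names (stratum, psu, age, drowsy, w) and the model formula are hypothetical and are not taken from the GES data. Stratum B deliberately contains a single PSU, which is the situation the option addresses. According to the survey package documentation, "adjust" centers such a stratum at the overall mean rather than at its own stratum mean.

```r
library(survey)

# Setting from Note 2: handle strata that contain a single PSU
options(survey.lonely.psu = "adjust")

# Hypothetical toy data: stratum A has four PSUs, stratum B has only one
set.seed(42)
n <- 100
toy <- data.frame(
  stratum = rep(c("A", "B"), times = c(80, 20)),
  psu     = c(rep(1:4, each = 20), rep(5, 20)),
  age     = runif(n, 18, 80),
  w       = runif(n, 50, 500)  # stand-in survey weights
)
toy$drowsy <- rbinom(n, 1, plogis(-1 + 0.03 * (toy$age - 45)))

# Stratified design with PSU-level clustering
design <- svydesign(ids = ~psu, strata = ~stratum, weights = ~w, data = toy)

# Survey-weighted logistic regression
fit <- svyglm(drowsy ~ age, design = design, family = quasibinomial())

print(coef(fit))
print(sqrt(diag(vcov(fit))))  # design-based standard errors
```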

References

  • Arnold LS, Tefft BC (2015) Prevalence of self-reported drowsy driving. AAA Foundation for Traffic Safety, United States. https://aaafoundation.org/wp-content/uploads/2017/12/PrevalenceOfSelfReportedDrowsyDrivingReport.pdf. Accessed 1 July 2016

  • Cryer PC, Westrup S, Cook AC, Ashwell V, Bridger P, Clarke C (2001) Investigation of bias after data linkage of hospital admissions data to police road traffic crash reports. Injury Prev 7(3):234–241

  • Grant RJ, Gregor MA, Maio RF, Huang SS (1998) The accuracy of medical records and police reports in determining motor vehicle crash characteristics. Prehosp Emerg Care 2(1):23–28

  • Greenberg L (1996) Police accident report (PAR) quality assessment project. National Technical Information Service. https://ntrl.ntis.gov/NTRL/dashboard/searchResults/titleDetail/PB97135313.xhtml. Accessed 1 July 2016

  • Knipling RR, Wang JS (1995) Revised estimates of the US drowsy driver crash problem size based on general estimates system case reviews. Annual proceedings of the Association for the Advancement of Automotive Medicine, vol 39. Association for the Advancement of Automotive Medicine

  • Lumley T (2014) survey: analysis of complex survey samples. R package version 3.30-3

  • National Highway Traffic Safety Administration (2011) Traffic safety facts drowsy driving. DOT HS 811:449

  • R Core Team (2016) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org/. Accessed 1 July 2016

  • Royal D (2002) National Survey of Distracted and Drowsy Driving Attitudes and Behavior: 2002. NHTSA Technical Report, DOT HS 809:566 https://www.nhtsa.gov/sites/nhtsa.dot.gov/files/hs809566v1.pdf. Accessed 1 July 2016

  • Shelton T (1991) National Accident Sampling System General Estimates System Technical Note 1988 to 1990. NHTSA Technical Report. DOT HS 807:796

  • Stutts JC, Wilkins JW, Vaughn BV (1999) Why do people have drowsy driving crashes? Input from drivers who just did. AAA Foundation for Traffic Safety. Washington (DC) 202(638):5944

  • Watling CN, Watling H (2015) Sleepy driving and drink driving: attitudes, behaviours, and perceived legitimacy of enforcement of younger and older drivers. In: Australasian Road Safety Conference

  • Yao W (2013) Estimation of ROC Curve with Complex Survey Data. UMI Dissertation Publishing, ProQuest LLC, UMI 3550591

  • Yao W, Li Z, Graubard B (2015) Estimation of ROC curve with complex survey data. Stat Med 34(8):1293–1303

Acknowledgements

We thank our anonymous reviewers, whose suggestions helped greatly to clarify this manuscript. We thank Richard M. Heiberger for his diligent supervision of this research, and Anastasia Vishnyakova for her support and feedback.

Author information

Corresponding author

Correspondence to Patrick Coyle.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 R package for analyzeGES routine

An R package (analyzeGES) was developed to reproduce the research described in this article and to conduct further research of a similar flavor. This package is hosted on GitHub and can be installed using

devtools::install_github("PatrickCoyle/analyzeGES")

The analyzeGES package contains the 2013 and 2014 datasets used in this article, but it can also be utilized to import and examine data from other years. The structure of the package is as follows:

  • R functions

    • mergeGivenYear(): Allows the user to specify a list of GES data file names from a desired year’s directory, then reads and merges the files from the .sas7bdat format in which the NHTSA stores them to a single data frame.

      • dir: the directory in which the .sas7bdat files are stored.

      • desired.files: A character vector naming the .sas7bdat files in the directory to be merged into a single data frame.

    • svy_mosaic_splom(): Produces an upper triangular matrix of two-way weighted mosaic plots for an input vector of factor variables. A key feature of this function is that it uses sums of the weights of observations, instead of counts of the observations, in order to reflect a complex survey design. Employs vcd::mosaic() to create the pairwise mosaic plots. A mosaic is colored red and blue if the pair of variables is significantly correlated based on a single-predictor survey::svyglm model. Employs grid::grid.grabExpr() and gridExtra::grid.arrange() to organize the pairwise mosaics into a matrix. This is an alternative to the scatterplot matrix for situations in which all variables are factors. This is useful in cases where other survey-weighted discrete data plots, such as faceted bar charts or jittered scatterplots, may not satisfactorily display the bivariate distributions.

      • data: A data frame.

      • factors: A vector of character strings corresponding to factor-type variables from the data frame.

      • weight: A character string indicating the name of the weight variable in the data frame.

      • svydesignObj: A survey::svydesign object, used to create bivariate svyglm models and color the corresponding mosaics red/blue if there is a statistically significant correlation between the two variables (at level \(\alpha = .05\)).

      • mar_par: A numeric vector of length 4 specifying the margins of each mosaic plot in the matrix.

    • predROC(): Predicts the response for a test set from a fitted svyglm object and returns a data frame of matched (false positive rate, true positive rate) pairs tracing the survey-weighted ROC curve for these predictions, computed with WeightedROC::WeightedROC().

      • glm.obj: A svyglm object whose response is a numeric binary variable taking the values 0, 1, or NA (see Note 2).

      • newData: A data frame that contains observation weights and the same predictors and binary response used to create glm.obj. This data is used as the test set for ROC analysis.

    • model_and_convert(): Creates a list of svyglm models, applies analyzeGES::predROC() to each item in the list, and plots the ROC curves using ggplot2::ggplot().

      • response: A character string identifying the response variable.

      • predictor_list: A list whose elements are vectors of character strings specifying the desired predictor variables. A svyglm model is created for each element of the list.

      • svydesignObj: A svydesign object, used to create the svyglm models.

      • newData: A test set passed to analyzeGES::predROC() for analysis of predictive performance of the models.

      • plot: A boolean indicating whether to plot the ROC curves for each model.

    • runExample(): Launches a Shiny application that allows you to choose a response, set of predictors and desired plot type (mosaic matrix, GGPairs or ROC curves).

  • Datasets

    • GES2013.drivers.rda: The data used for exploratory data analysis and modeling.

    • GES2013.drivers.design.rda: The svydesign object used for creating svyglm models.

    • GES2014.drivers.rda: The data used for prediction.

  • Data-cleaning and variable-transformation script.

    • A number of variables contained in the datasets and used for analysis are the transformed variables listed in this script.
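The idea behind svy_mosaic_splom(), sizing mosaic tiles by sums of weights rather than by record counts, can be illustrated in base R. This is a minimal sketch and not the package's implementation; the data frame, its variable names (drowsy, night, weight) and the weight values are hypothetical.

```r
# Hypothetical driver-level records with survey weights
drivers <- data.frame(
  drowsy = factor(c("no", "no", "yes", "no", "yes", "no")),
  night  = factor(c("no", "yes", "no", "no", "yes", "yes")),
  weight = c(120, 80, 15, 200, 10, 75)
)

# Unweighted table: each sampled record counts once
raw_counts <- table(drowsy = drivers$drowsy, night = drivers$night)

# Weighted table: each cell is the sum of the weights of its records
weighted_counts <- xtabs(weight ~ drowsy + night, data = drivers)

print(raw_counts)
print(weighted_counts)

# Base-R mosaic of the weighted table; tile areas follow the weighted cells
mosaicplot(weighted_counts, main = "Weighted mosaic: drowsy by night")
```

In this toy table the drowsy records make up one third of the sample but only 5% of the weighted total, which is why the two tables can tell different stories.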

1.2 Shiny app

The most user-friendly feature of the package is the Shiny app, GES_plotter(), which can be launched locally from the installed package or via its web-hosted location at

The app can be used with the following steps:

  • Navigate to the app using the URL above.

  • Select a plotting method from the Plot Type drop-down menu. Supported plots are:

    • Mosaic Matrix: an exploratory plot that takes a vector of categorical variables and produces a matrix of pairwise weighted mosaic plots using analyzeGES::svy_mosaic_splom.

    • Scatterplot Matrix: an exploratory plot using the GGally package that adapts to any pair of data types.

    • ROC: A diagnostic plot that visualizes the predictive performance (for 2014 data) of a svyglm model based on a specified binary response and a vector of predictors (for 2013 data).

  • Select a binary response from the Single Response drop-down menu.

  • Select a vector of predictors from the Set of Predictors drop-down menu.

  • Press Create Plot.

The Single Response and Set of Predictors options are only treated as response/predictors for the ROC option; for the other two options, the response is simply appended to the predictors for pairwise exploratory analysis (Figs. 5, 6, 7).
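The logic behind the ROC option, fitting on one year and evaluating weighted predictive performance on another, can be sketched in base R. This is only an illustration under stated assumptions: the simulated data, the helper weighted_roc() and all variable names are hypothetical; stats::glm() with prior weights stands in for survey::svyglm(); and the hand-written helper stands in for WeightedROC::WeightedROC(). The helper treats every observation as its own threshold and does not handle tied scores.

```r
# Simulated "training year" and "test year" data (hypothetical)
set.seed(1)
make_year <- function(n) {
  x <- rnorm(n)
  data.frame(
    x = x,
    y = rbinom(n, 1, plogis(-1 + 1.5 * x)),
    w = runif(n, 50, 500)  # stand-in survey weights
  )
}
train <- make_year(400)
test  <- make_year(400)

# Stand-in for survey::svyglm(): a weighted quasibinomial GLM
fit   <- glm(y ~ x, family = quasibinomial(), data = train, weights = w)
score <- predict(fit, newdata = test, type = "response")

# Weighted ROC: rates are ratios of summed weights, not of record counts
weighted_roc <- function(score, label, weight) {
  ord    <- order(score, decreasing = TRUE)
  label  <- label[ord]
  weight <- weight[ord]
  tp <- cumsum(weight * (label == 1))  # weighted true positives
  fp <- cumsum(weight * (label == 0))  # weighted false positives
  data.frame(
    FPR = c(0, fp / sum(weight[label == 0])),
    TPR = c(0, tp / sum(weight[label == 1]))
  )
}
roc <- weighted_roc(score, test$y, test$w)

# Trapezoidal area under the weighted ROC curve
auc <- sum(diff(roc$FPR) * (head(roc$TPR, -1) + tail(roc$TPR, -1)) / 2)
print(auc)

plot(roc$FPR, roc$TPR, type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # reference line for a non-informative classifier
```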

Fig. 5

GES_plotter, a Shiny application for visualization of survey-weighted models and their ROC performance. This allows for convenient exploration of the dataset by automating the project’s model-building and visualization process with a graphical user interface

Fig. 6

GES_plotter used to print a matrix of weighted mosaic plots for a user-input subset of variables using mosaicSplom2. A mosaic is colored red and blue if the pair of variables is significantly correlated based on a single-predictor svyglm model

Fig. 7

GES_plotter used to print a GGally scatterplot matrix for a user-input subset of variables

Cite this article

Coyle, P., Chen, C. & Dabbish, N. Analysis of drowsy driving: exploring subpopulation risk with weighted contingency table tools. Comput Stat 36, 1605–1620 (2021). https://doi.org/10.1007/s00180-021-01071-w

  • DOI: https://doi.org/10.1007/s00180-021-01071-w
