Abstract
Emerging big data resources and practices provide opportunities to improve transportation safety planning and outcomes. However, researchers and practitioners recognise that big data from mobile phones, social media, and on-board vehicle systems include biases in representation and accuracy related to transportation safety statistics. This study examines both the sources of bias and approaches to mitigate them through a review of published studies and interviews with experts. Coding of qualitative data enabled topical comparisons and reliability metrics. Results identify four categories of bias and mitigation approaches that concern transportation researchers and practitioners: sampling, measurement, demographics, and aggregation. This structure for understanding and working with bias in big data supports research with practical approaches for rapidly evolving transportation data sources.
Acknowledgements
The authors appreciate interview coding support by Boya Dai with the Texas A&M Transportation Institute, and insightful comments from four reviewers. This project was supported by the Safety through Disruption (Safe-D) National University Transportation Center, a grant from the US Department of Transportation—Office of the Assistant Secretary for Research and Technology, University Transportation Centers Program.
Cite this article
Griffin, G.P., Mulhall, M., Simek, C. et al. Mitigating Bias in Big Data for Transportation. J. Big Data Anal. Transp. 2, 49–59 (2020). https://doi.org/10.1007/s42421-020-00013-0
DOI: https://doi.org/10.1007/s42421-020-00013-0