On the Need to Understand Human Behavior to Do Analytics of Behavior

Meyer, Joachim

doi:10.1007/978-3-031-39101-9_3

Part of the book series: Knowledge and Space (KNAS, volume 19)

Abstract

Artificial Intelligence and data science are rapidly gaining importance as parts of decision support systems. As these systems improve, it becomes necessary to clarify humans’ roles in the decision-making processes. Humans may not be able to improve on the choices a good algorithm makes, they may not be able to adjust the parameters of the algorithm correctly, and their role in processes that use good algorithms may be limited. However, this does not mean human involvement in data-supported decision processes is unnecessary. A closer look at the analytical process reveals that each step entails human decisions, beginning with the data preparation through the choice of algorithms, the iterative analyses, and the display and interpretation of results. These decisions may affect the following steps in the process and may alter the resulting conclusions. Furthermore, the data for the analyses often result from recordings of human actions that do not necessarily reflect the actual recorded events. Data for certain events may often not be recorded, requiring a “big-data analysis of non-existing data.” Thus, adequate use of data-based decisions requires modeling relevant human behavior to understand the decision domains and available data to prevent possible systematic biases in the resulting decisions.

In our current “age of data”, Artificial Intelligence (AI), Machine Learning (ML), Data Science (DS), and analytics are becoming part of problem-solving and decision-making in many areas, ranging from recommendations for movies and music to medical diagnostics, the detection of cybercrime, investment decisions or the evaluation of military intelligence (e.g., McAfee & Brynjolfsson, 2012). These methods can be used because an abundance of information is collected and made available. Also, the tools for analyzing such information are becoming widely accessible, and their use has become easier with platforms such as BigML. While in the past, statisticians or data scientists were in charge of the analytics process, now anybody with some basic computing skills can conduct analyses with R or Python, using open-source tools and libraries.

These developments are the basis for new insights into, and a better understanding of, social and physical settings. They also alter the decision processes used by organizations and the information that is available to individuals. As such, they affect reality, its representation in digital records and the media, and the ways people interpret this reality and act in it. The dynamic interaction between the physical, digital, and social realms shapes current societies. Understanding and modeling it is a major challenge for both data science and the social sciences.

Data analytics, and the information one can gain from them, can be used in decision-making processes, in which they help to choose among possible alternatives. Algorithmic decisions can be advantageous in legal contexts, such as bail decisions (Kleinberg, Lakkaraju, Leskovec, Ludwig, & Mullainathan, 2018). In medical settings, the development of personalized evidence-based medicine for diagnostic or treatment decisions (Kent, Steyerberg, & van Klaveren, 2018) depends on analyzing electronic medical records with data science tools. AI-based analyses in medicine can indeed improve diagnostic or therapeutic decisions (Puaschunder, Mantl, & Plank, 2020). Similarly, algorithms in financial markets, implemented as algorithmic advisors or in algorithmic trading, can provide clear benefits (Tao, Su, Xiao, Dai, & Khalid, 2021).

[…] Hurricane Sandy (Shelton, Poorthuis, Graham, & Zook, 2014). This was the strongest hurricane that had hit the New York City area in recorded history. There is indeed a strong correlation between Twitter activities and the strength of a storm, but there were very few Twitter activities in the areas in which the storm was the strongest. Two causes can explain the nonmonotonic relation between Twitter activities and storm strength. Both are related to the physical realm. One is that, very often, people flee an area after they receive a hurricane warning and are told to evacuate a certain area, so they will not tweet anymore from this area. A second reason may be that storms tend to topple cellular towers. So even if people remained in the area, they may not have been able to communicate, causing a decrease in communication activity in these areas.

These are examples of nonexisting data of existing events that result from a biased or partial recording of data. They are due to the physical properties of the data collection process or of the events that generate the data in physical reality. However, the selectivity of the data does not only depend on the external statistics of the physical properties of the world. It may also result from specific human actions that may create a somewhat partial view of reality. For instance, a study of credit card data in a country in which there was social unrest showed that the effect of the localized unrest (which mainly involved large demonstrations in specific locations in a metropolitan area) diminished with distance from the demonstrations, as expressed in the number of purchases and the amounts of money spent on a purchase (Dong, Meyer, Shmueli, Bozkaya, & Pentland, 2018). This effect was not the same for all parts of this society. Some groups of the population showed a greater change than others. However, when interpreting these results, we need to keep in mind that we have only partial data on the economic activities in this country during this era of unrest because we only have credit card data. People in this country also use cash, and the decrease in credit card purchases may only reveal part of the picture.

Another factor that affects the digital records of behavior that can be analyzed is the fact that some behaviors will be more easily recorded while others are less so. For instance, on social media, socially desirable and high-prestige behavior will appear more often in posts than less desirable behavior. Viewers, consequently, may feel that others are more engaged in these positively valued behaviors than they themselves are (Chou & Edge, 2012). Also, the digital image of the world that may emerge from scraping social media data will present a biased view, possibly overrepresenting the behaviors people like to post about on the web. Any decisions made based on these data, for instance, concerning the public investment in different facilities for leisure activities or the development of product lines for after hours, may be biased and may be misled by people’s tendency to post about some things and not post about others.

Another example of the partial representation of the physical or social reality in data is demonstrated in Omer Miran’s master’s thesis (Miran, 2018). The study dealt with the analysis of policing activity in the UK, as expressed in the data the UK police uploaded to their website.¹ Making police data openly available allows the public to monitor police activities. It also provides the basis for the assessment of the risk of crime in different areas. This can, for instance, help individuals in their decisions about where to live, rent or buy an apartment, and raise their kids.

The study aimed to determine the relative frequency of different types of crimes in different parts of the UK, where each part was defined by the specific police station that oversaw an area. The analysis combined information from several databases. The most important one is the UK police “crime cases database” for the years 2010–2015, which includes reports of crime incidents and their locations; all crime events are recorded there with relatively rough geographical information. A second database is the database on police stop and search activities for the year 2014, also downloaded from the UK police site. Here, the location at which a person was stopped is also recorded. Two other databases were from the UK Office for National Statistics and included population size and the average weekly for different locations.

The analysis focused on two different types of crime—burglary and drug-related crime. In a burglary, one or more people enter a location (a house, business, etc.) without permission, usually with the intention of committing theft. One can assume that a burglary will almost always be reported to the police and will appear in the records. Therefore, the number of burglary incidents in police records likely reflects the actual frequency of burglaries in an area.

The second type of crime was drug-related crime, such as drug deals. In this case, the people involved will usually not report the crime. Consequently, a drug-related crime will usually only appear in the police files if the police make an active effort to detect it. Hence, the data on drug-related activities do not really reflect the volume of such activities in an area but rather the police activity in the area.

The analyses of the data showed that there was no correlation between the amount of police activity in an area (as measured through the number of stop and search events in the area) and the number of burglary events (r = −0.047). However, there was a positive correlation between police activity and recorded drug-related crimes (r = 0.180). Thus, the two types of crime data indeed reflect somewhat different types of events, namely the activity of criminals (in the burglary data) and the activity of the police (in the drug-related crime data). These two types of activities can, of course, be correlated or can be related to other variables that characterize the location.
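The mechanism behind this pattern can be illustrated with a small simulation. The sketch below is not the thesis analysis and uses no real police data: all numbers are synthetic, and the names (`police_activity`, `DETECTION_RATE`, and so on) are hypothetical. It assumes that true crime levels are unrelated to police activity, that every burglary is reported, and that a drug deal is recorded only in proportion to police activity in the area.

```python
import math
import random

random.seed(42)  # fixed seed so the sketch is repeatable


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


N_AREAS = 2000          # synthetic areas, not real police districts
DETECTION_RATE = 0.1    # hypothetical share of deals detected at full police activity

# Police activity and true crime levels are drawn independently of each other.
police_activity = [random.random() for _ in range(N_AREAS)]
true_burglaries = [random.uniform(50, 150) for _ in range(N_AREAS)]
true_drug_deals = [random.uniform(50, 150) for _ in range(N_AREAS)]

# Burglaries are reported by victims, so the record mirrors the true count.
recorded_burglaries = list(true_burglaries)

# Drug deals enter the record only when the police detect them.
recorded_drug_crimes = [
    deals * DETECTION_RATE * activity
    for deals, activity in zip(true_drug_deals, police_activity)
]

r_burglary = pearson(police_activity, recorded_burglaries)
r_drug = pearson(police_activity, recorded_drug_crimes)

print(f"police activity vs. recorded burglaries:  r = {r_burglary:+.3f}")
print(f"police activity vs. recorded drug crimes: r = {r_drug:+.3f}")
```

Under these assumptions, the burglary correlation stays near zero while the drug-crime correlation is clearly positive, even though true drug activity is unrelated to policing. The simulated correlation is much stronger than the r = 0.180 reported above because the sketch omits every other source of variation.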

The analysis of the police databases revealed additional clear differences between the picture of reality they provide and the actual reality. In the UK Home Office drug survey for 2013, 2.8% or 280 out of 10,000 adults aged 16 to 59 reported using illicit drugs more than once a month in the last year. Assuming that these people purchased drugs once a month, they were involved in approximately 12 × 280 = 3,360 drug deals in a year. In the UK police data set, the yearly average of recorded drug-related crimes was about 28.7 per 10,000 people. Since 28.7 / 3,360 ≈ 0.85%, less than 1% of the estimated drug deals appear in police data. This demonstrates the large potential gap between the image of the reality that appears in the analysis of data and the actual reality this image is supposed to reflect.
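The estimate above can be reproduced in a few lines. The sketch uses only the figures quoted in the text; the variable names are illustrative, and the assumption of one purchase per month is the simplification stated in the paragraph, not a measured quantity.

```python
# Figures quoted in the text, expressed per 10,000 adults aged 16 to 59.
POPULATION = 10_000
frequent_user_share = 0.028            # 2.8% use illicit drugs more than once a month
purchases_per_user_per_year = 12       # simplifying assumption: one purchase per month
recorded_drug_crimes_per_year = 28.7   # yearly average in the UK police data

frequent_users = round(POPULATION * frequent_user_share)
estimated_deals = frequent_users * purchases_per_user_per_year
recorded_share = recorded_drug_crimes_per_year / estimated_deals

print(f"Frequent users per 10,000 adults: {frequent_users}")
print(f"Estimated drug deals per year:    {estimated_deals}")
print(f"Share appearing in police data:   {recorded_share:.2%}")
```

Because the survey counts people who use drugs more than once a month, one purchase per month is likely a conservative figure, so the true share of recorded deals may be even smaller.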

Conclusions

The availability of data can have great value for decision-making. For instance, data-based decisions may lower the effects of biases due to faulty preconceptions or naïve beliefs. Also, many processes, such as controlling large-scale networks or high-frequency trading in financial markets, are only possible with algorithms and must rely on data.

The use of data science and AI in decision-making can often provide valuable information, but the process is not without potential problems. One needs to keep in mind that the data analysis process is a human activity that involves numerous decisions along the way. Each of them impacts the following steps in the process and the eventual outcome. It is important to monitor these decisions and to test the sensitivity of the conclusions to specific changes in the decisions made along the process. Furthermore, the analytics process often concerns human activities. The records they generate depend on the decisions of those who do the recording and, to some extent, the people whose behavior is recorded.
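One way to test the sensitivity of a conclusion to analytic decisions is to rerun the same analysis under every combination of plausible choices and compare the results. The sketch below is a minimal illustration on synthetic data. The two decisions it varies (an outlier cutoff and the choice of mean or median) and all names are assumptions made for the example, not procedures described in this chapter.

```python
import itertools
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Synthetic, right-skewed outcomes for two hypothetical groups.
group_a = [random.lognormvariate(0.0, 1.0) for _ in range(200)]
group_b = [random.lognormvariate(0.2, 1.0) for _ in range(200)]


def drop_outliers(values, z_cutoff):
    """Keep values within z_cutoff standard deviations of the mean."""
    if z_cutoff is None:
        return values
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) <= z_cutoff * sd]


# Two analyst decisions, each with several defensible options.
cutoffs = [None, 3.0, 2.0]
summaries = {"mean": statistics.mean, "median": statistics.median}

# Estimate the group difference under every combination of decisions.
results = {}
for cutoff, (name, summarize) in itertools.product(cutoffs, summaries.items()):
    a = drop_outliers(group_a, cutoff)
    b = drop_outliers(group_b, cutoff)
    results[(cutoff, name)] = summarize(b) - summarize(a)

for (cutoff, name), diff in results.items():
    print(f"cutoff={cutoff!s:>4}  summary={name:<6}  difference={diff:+.3f}")

spread = max(results.values()) - min(results.values())
print(f"Spread of estimates across decisions: {spread:.3f}")
```

If the spread is large relative to the effect of interest, the conclusion depends on the analyst's choices at least as much as on the data, which is the situation the paragraph above warns about.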

The development of data-based decision-making or support tools requires a combined modeling effort. On the one hand, the usual analytics modeling process needs to proceed, aiming to generate models that can identify the preferable choices in different settings. A model in this context would be the output of the algorithm used for the analytics process, together with information about the quality of the output, compared to some criterion. Often this would result from tests of the model, computed on a training set of data, on a separate, independent data set, the test set. An additional output of the algorithmic process can be information on feature importance, identifying the relative importance of different variables for predicting the outcome variable.
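The two outputs described above, a quality measure computed on a held-out test set and a feature importance table, can be sketched as follows. The example uses synthetic data and a nearest-centroid classifier as a stand-in for whatever algorithm an analysis would actually use. Permutation importance is one common way to estimate feature importance and is an assumption here, since the chapter does not name a specific method.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def make_row():
    """One synthetic record: an informative feature, a noise feature, a label."""
    label = random.randint(0, 1)
    informative = random.gauss(3.0 * label, 1.0)  # depends on the label
    noise = random.gauss(0.0, 1.0)                # unrelated to the label
    return [informative, noise], label


data = [make_row() for _ in range(400)]
train, test = data[:300], data[300:]  # the test set is never used for fitting


def fit_centroids(rows):
    """Return the mean feature vector of each class in the training data."""
    centroids = {}
    for cls in (0, 1):
        members = [x for x, y in rows if y == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda cls: sq_dist(centroids[cls]))


def accuracy(centroids, rows):
    return sum(predict(centroids, x) == y for x, y in rows) / len(rows)


def permutation_importance(centroids, rows, feature):
    """Drop in accuracy after shuffling one feature column across rows."""
    shuffled = [x[feature] for x, _ in rows]
    random.shuffle(shuffled)
    permuted = [
        (x[:feature] + [value] + x[feature + 1:], y)
        for (x, y), value in zip(rows, shuffled)
    ]
    return accuracy(centroids, rows) - accuracy(centroids, permuted)


model = fit_centroids(train)
test_accuracy = accuracy(model, test)
importance = [permutation_importance(model, test, f) for f in range(2)]

print(f"Accuracy on the independent test set: {test_accuracy:.2f}")
print(f"Importance of informative feature:    {importance[0]:+.2f}")
print(f"Importance of noise feature:          {importance[1]:+.2f}")
```

Shuffling the informative feature should reduce test accuracy substantially, while shuffling the noise feature should leave it almost unchanged. Note that such a table only ranks the variables that were recorded; it says nothing about behaviors that never entered the data.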

This should be accompanied by a modeling effort that develops more traditional social sciences models based on psychological, sociological, economic, or other disciplines. These models can be used to model the behavior that is related to the analytics process (choices made regarding the questions asked, the selection of the data, the preprocessing of data, the choice of algorithms and their parameters, the presentation of results, the interpretation, the implementation of insights gained). The models can also be related to behaviors that generate the data that is analyzed, as shown in the examples of drug-related crimes or social media posts during emergencies.

Thus, traditional modeling techniques and data science methods should be combined. Such a combination has the potential to improve decisions and the utilization of data. One can take several steps to achieve this goal. First, data scientists (who often have computer science, mathematics, or engineering backgrounds) should be trained in social sciences. This would give them some critical analytical skills that will allow them to question assumptions behind the analyses and the behaviors that are represented in the data. The data scientists would detach themselves from the mechanistic process of taking input, running analyses, and interpreting the results only in terms of the input variables and the model output, with the feature importance tables and other output data. Analyses of results in view of theories in the social sciences can provide a deeper understanding of phenomena beyond what is possible with a-theoretical analyses.

Also, interdisciplinary teams should analyze, evaluate, or implement the results of data science processes that are used in decision-making. The output of these processes needs to be critically assessed, and the value of the insights gained through the process needs to be calculated. It is important to determine how the information can actually be implemented in the operation of the organization. This requires conducting sensitivity analyses that evaluate the procedures and their robustness.

A critical view of the analytics process and of the implementation of its results is particularly important because data-science-based decision support always depends on the particular data that served as input for the algorithm. Dynamic changes in the data may cause predictions to become less (or sometimes more) precise. The relevance of the data for the decisions may also change with time because options become more available or less expensive or because new alternatives arise.

We need to combine traditional social science methods, such as methods in economics, political science, geography, sociology, and psychology, with the methods used in analytics and data science. There should be a dynamic interplay between the two approaches to phenomena. The combined use of the two has the potential to create a synergy that can lead to better decision-making processes and better decisions. It can also provide insights into the dynamic shaping of reality, following the use of data science, and the effects human behavior has on the data science process.

Notes

1. See https://data.police.uk.

References

Anderson, C. (2008). The end of theory: The data deluge makes the scientific method obsolete. Wired Magazine, 16(7). Retrieved from https://www.wired.com/2008/06/pb-theory/
Botvinik-Nezer, R., Holzmeister, F., Camerer, C. F., Dreber, A., Huber, J., Johannesson, M., … Schonberg, T. (2020). Variability in the analysis of a single neuroimaging dataset by many teams. Nature, 582, 84–88. https://doi.org/10.1038/s41586-020-2314-9
Botzer, A., Meyer, J., Bak, P., & Parmet, Y. (2010). User settings of cue thresholds for binary categorization decisions. Journal of Experimental Psychology. Applied, 16(1), 1–15. https://doi.org/10.1037/a0018758
Chou, H.-T. G., & Edge, N. (2012). “They are happier and having better lives than I am”: The impact of using Facebook on perceptions of others’ lives. Cyberpsychology, Behavior and Social Networking, 15(2), 117–121. https://doi.org/10.1089/cyber.2011.0324
Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243(4899), 1668–1674. https://doi.org/10.1126/science.2648573
Dietvorst, B. J., Simmons, J. P., & Massey, C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144(1), 114–126. https://doi.org/10.1037/xge0000033
Dong, X., Meyer, J., Shmueli, E., Bozkaya, B., & Pentland, A. (2018). Methods for quantifying effects of social unrest using credit card transaction data. EPJ Data Science, 7, 8. https://doi.org/10.1140/epjds/s13688-018-0136-x
Douer, N., & Meyer, J. (2020). The responsibility quantification model of human interaction with automation. IEEE Transactions on Automation Science and Engineering, 17(2), 1044–1060. https://doi.org/10.1109/TASE.2020.2965466
Douer, N., & Meyer, J. (2021). Theoretical, measured, and subjective responsibility in aided decision making. ACM Transactions on Interactive Intelligent Systems, 11(1), 5. https://doi.org/10.1145/3425732
Eisler, S., & Meyer, J. (2020). Visual analytics and human involvement in machine learning. arXiv, 2005.06057v1. https://doi.org/10.48550/arxiv.2005.06057
Glückler, J., & Panitz, R. (2021). Unleashing the potential of relational research: A meta-analysis of network studies in human geography. Progress in Human Geography, 45(6), 1531–1557. https://doi.org/10.1177/03091325211002916
Grove, W. M., & Lloyd, M. (2006). Meehl’s contribution to clinical versus statistical prediction. Journal of Abnormal Psychology, 115(2), 192–194. https://doi.org/10.1037/0021-843X.115.2.192
Huntington-Klein, N., Arenas, A., Beam, E., Bertoni, M., Bloem, J. R., Burli, P., Chen, N., Grieco, P., Ekpe, G., Pugatch, T., Saavedra, M., & Stopnitzky, Y. (2021). The influence of hidden researcher decisions in applied microeconomics. Economic Inquiry, 59(3), 944–960. https://doi.org/10.1111/ecin.12992
Jack, R. E., Crivelli, C., & Wheatley, T. (2018). Data-driven methods to diversify knowledge of human psychology. Trends in Cognitive Sciences, 22(1), 1–5. https://doi.org/10.1016/j.tics.2017.10.002
Kent, D. M., Steyerberg, E., & van Klaveren, D. (2018). Personalized evidence based medicine: Predictive approaches to heterogeneous treatment effects. BMJ, 363, k4245. https://doi.org/10.1136/bmj.k4245
Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J., & Mullainathan, S. (2018). Human decisions and machine predictions. The Quarterly Journal of Economics, 133(1), 237–293. https://doi.org/10.1093/qje/qjx032
Mangel, M., & Samaniego, F. J. (1984). Abraham Wald’s work on aircraft survivability. Journal of the American Statistical Association, 79(386), 259–267. https://doi.org/10.1080/01621459.1984.10478038
Marras, M., Manca, M., Boratto, L., Fenu, G., & Laniado, D. (2018). BarcelonaNow: Empowering citizens with interactive dashboards for urban data exploration. WWW ’18: Companion Proceedings of the The Web Conference 2018, Lyon, 219–222. https://doi.org/10.1145/3184558.3186983
McAfee, A., & Brynjolfsson, E. (2012). Big data: The management revolution. Harvard Business Review, 90(4), 60–68.
McKinney, S. M., Sieniek, M., Godbole, V., Godwin, J., Antropova, N., Ashrafian, H., … Shetty, S. (2020). International evaluation of an AI system for breast cancer screening. Nature, 577, 89–94. https://doi.org/10.1038/s41586-019-1799-6
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis, MN: University of Minnesota Press.
Meyer, J., & Kuchar, J. K. (2021). Maximal benefits and possible detrimental effects of binary decision aids. 2021 IEEE 2nd International Conference on Human-Machine Systems (ICHMS), Magdeburg, 1–6. https://doi.org/10.1109/ICHMS53169.2021.9582632
Meyer, J., & Sheridan, T. B. (2017). The intricacies of user adjustments of alerting thresholds. Human Factors, 59(6), 901–910. https://doi.org/10.1177/0018720817698616
Meyer, J., Wiczorek, R., & Günzler, T. (2014). Measures of reliance and compliance in aided visual scanning. Human Factors, 56(5), 840–849. https://doi.org/10.1177/0018720813512865
Min, B. H., & Borch, C. (2022). Systemic failures and organizational risk management in algorithmic trading: Normal accidents and high reliability in financial markets. Social Studies of Science, 52(2), 277–302. https://doi.org/10.1177/03063127211048515
Miran, O. (2018). On the relation between data and reality: The case of crime data (Unpublished master’s thesis). Tel Aviv University, Department of Industrial Engineering, Tel Aviv, Israel.
Padilla, R., Netto, S. L., & da Silva, E. A. B. (2020). A survey on performance metrics for object-detection algorithms. 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Niteroi, Brazil, 237–242. https://doi.org/10.1109/IWSSIP48289.2020.9145130
Puaschunder, J. M., Mantl, J., & Plank, B. (2020). Medicine of the future: The power of artificial intelligence (AI) and big data in healthcare. RAIS Journal for Social Science, 4(1), 1–8. https://doi.org/10.5281/zenodo.3839002
Raghunathan, S. (1999). Impact of information quality and decision-maker quality on decision quality: A theoretical model and simulation analysis. Decision Support Systems, 26(4), 275–286. https://doi.org/10.1016/S0167-9236(99)00060-3
Roig, A. (2017). Safeguards for the right not to be subject to a decision based solely on automated processing (Article 22 GDPR). European Journal of Law and Technology, 8(3). Retrieved from https://ejlt.org/index.php/ejlt/article/view/570
Shelton, T., Poorthuis, A., Graham, M., & Zook, M. (2014). Mapping the data shadows of Hurricane Sandy: Uncovering the sociospatial dimensions of “big data”. Geoforum, 52, 167–179. https://doi.org/10.1016/j.geoforum.2014.01.006
Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., … Nosek, B. A. (2018). Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1(3), 337–356. https://doi.org/10.1177/2515245917747646
Sutton, R. T., Pincock, D., Baumgart, D. C., Sadowski, D. C., Fedorak, R. N., & Kroeker, K. I. (2020). An overview of clinical decision support systems: Benefits, risks, and strategies for success. npj Digital Medicine, 3, 17. https://doi.org/10.1038/s41746-020-0221-y
Tao, R., Su, C.-W., Xiao, Y., Dai, K., & Khalid, F. (2021). Robo advisors, algorithmic trading and investment management: Wonders of fourth industrial revolution in financial markets. Technological Forecasting and Social Change, 163, 120421. https://doi.org/10.1016/j.techfore.2020.120421
Tractinsky, N., & Meyer, J. (1999). Chartjunk or goldgraph? Effects of presentation objectives and content desirability on information presentation. MIS Quarterly, 23(3), 397–420. https://doi.org/10.2307/249469
Virgilio, G. P. M. (2019). High-frequency trading: A literature review. Financial Markets and Portfolio Management, 33(2), 183–208. https://doi.org/10.1007/s11408-019-00331-6

Author information

Authors and Affiliations

Department of Industrial Engineering, Tel Aviv University, Tel Aviv, Israel
Joachim Meyer

Corresponding author

Correspondence to Joachim Meyer.

Editor information

Editors and Affiliations

Department of Geography, LMU Munich, Munich, Germany
Johannes Glückler
Institute of Management, University of Koblenz, Koblenz, Germany
Robert Panitz

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this chapter

Cite this chapter

Meyer, J. (2024). On the Need to Understand Human Behavior to Do Analytics of Behavior. In: Glückler, J., Panitz, R. (eds) Knowledge and Digital Technology. Knowledge and Space, vol 19. Springer, Cham. https://doi.org/10.1007/978-3-031-39101-9_3

DOI: https://doi.org/10.1007/978-3-031-39101-9_3
Published: 25 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39100-2
Online ISBN: 978-3-031-39101-9
eBook Packages: Social Sciences (R0)
