The Applications of Machine Learning in Accounting and Auditing Research

Living reference work entry
First Online: 09 December 2021

pp 1–21
Encyclopedia of Finance

Hanxin Hu & Ting Sun

Abstract

The term “machine learning” has become a buzzword in the past few years. In the accounting and auditing area, although this technology is already used at major accounting firms such as the Big 4, research on it is still evolving. Increased use of machine learning and other artificial intelligence techniques will allow accountants to focus on providing better decision support instead of on data gathering and manual analyses. This entry introduces machine learning in comparison with traditional statistical modeling, discusses its current applications in accounting and auditing research, and provides directions for future research.

Notes

1. The data source is https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data. In this application case, variables including room type, geographical availability, and the number of reviews per month are used as independent variables to predict prices.
2. The data source is http://yann.lecun.com/exdb/mnist/.
3. One example in this domain is available at https://www.kaggle.com/vjchoudhary7/customer-segmentation-tutorial-in-python. The task for this data set is to segment customers based on their behavior-related attributes by applying the K-means clustering algorithm.
4. A bootstrap replicate is obtained by randomly sampling the training set with replacement. This operation generates a new training set of the same size as the original one.
5. The statistical problem arises when the training set does not provide enough information to select a single learner, because multiple distinct learners achieve the same accuracy on the training set.
6. Searching for the best hypothesis (e.g., a neural network) that fits the training data may be computationally intractable.
7. The approximations to the true target function generated by single learners may not be ideal.
8. “Complexity” means that the decision tree model generates a plethora of rules, resulting in overfitting.
9. For details of the cross-validation method, see https://scikit-learn.org/stable/modules/cross_validation.html.
10. More details are available at https://www.tensorflow.org/.
11. Shapley, Lloyd S. (August 21, 1951). “Notes on the n-Person Game -- II: The Value of an n-Person Game” (PDF). Santa Monica, Calif.: RAND Corporation.
12. “Cooperative game theory assumes that groups of players, called coalitions, are the primary units of decision-making, and may enforce cooperative behavior.” (Choudhary 2019). https://www.analyticsvidhya.com/blog/2019/11/shapley-value-machine-learning-interpretability-game-theory/
13. Local surrogate models are interpretable models that are used to explain individual predictions of black-box machine learning models.
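
Note 4 describes a bootstrap replicate as a sample of the training set drawn with replacement and equal in size to the original. The operation can be sketched in a few lines of Python. The toy training set, the function name bootstrap_replicate, and the fixed seed are illustrative assumptions and do not come from the entry.

```python
import random


def bootstrap_replicate(training_set, rng):
    """Return one bootstrap replicate of training_set.

    Observations are drawn uniformly at random WITH replacement, and the
    replicate has the same number of observations as the original set.
    """
    n = len(training_set)
    return [training_set[rng.randrange(n)] for _ in range(n)]


# Hypothetical toy training set of (feature, label) pairs, for illustration only.
training_set = [(0.10, 0), (0.40, 0), (0.35, 1), (0.80, 1), (0.90, 1)]

rng = random.Random(0)  # fixed seed (an assumption) so the run is repeatable
replicate = bootstrap_replicate(training_set, rng)

# The replicate matches the original in size and contains only original
# observations; some may appear more than once and others not at all.
print(len(replicate) == len(training_set))
print(all(item in training_set for item in replicate))
```

In bagging (Breiman 1996), one learner is typically trained on each replicate and the learners' predictions are then combined; that step is left out of this sketch.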

References

Adadi, A., and M. Berrada. 2018. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access 6: 52138–52160.
Agrawal, R., T. Imieliński, and A. Swami. 1993. Mining association rules between sets of items in large databases. Proceedings of the 1993 ACM SIGMOD international conference on Management of data, pp 207–216.
Alpaydin, E. 2020. Introduction to machine learning. Cambridge, MA: MIT Press.
Altman, E.I. 1968. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance 23: 589–609.
CaseWare. 2020. AnalyticsAI for every engagement [Online]. Available: https://www.caseware.com/us/analyticsai. Accessed.
Anand, V., R. Brunner, K. Ikegwu, and T. Sougiannis. 2019. Predicting profitability using machine learning. Available at SSRN 3466478.
Anthony, M., and P.L. Bartlett. 2009. Neural network learning: Theoretical foundations. Cambridge: Cambridge University Press.
Apley, D.W. 2016. Visualizing the effects of predictor variables in black box supervised learning models. arXiv preprint arXiv:1612.08468.
Apley, D.W., and J. Zhu. 2020. Visualizing the effects of predictor variables in black box supervised learning models. Journal of the Royal Statistical Society, Series B: Statistical Methodology 82 (4): 1059–1086.
Bao, Y., and A. Datta. 2014. Simultaneously discovering and quantifying risk types from textual risk disclosures. Management Science 60: 1371–1391.
Bao, Y., B. Ke, B. Li, Y.J. Yu, and J. Zhang. 2020. Detecting accounting fraud in publicly traded US firms using a machine learning approach. Journal of Accounting Research 58 (1): 199–235.
Barboza, F., H. Kimura, and E. Altman. 2017. Machine learning models and bankruptcy prediction. Expert Systems with Applications 83: 405–417.
Barth, M.E., K. Li, and C. McClure. 2021. Evolution in value relevance of accounting information. Stanford University Graduate School of Business Research Paper No. 17-24. Available at SSRN: https://ssrn.com/abstract=2933197 or https://doi.org/10.2139/ssrn.2933197.
Beneish, M.D. 1999. The detection of earnings manipulation. Financial Analysts Journal 55: 24–36.
Bertomeu, J. 2020. Machine learning improves accounting: Discussion, implementation and research opportunities. Review of Accounting Studies 25: 1135–1155.
Bertomeu, J., E. Cheynel, E. Floyd, and W. Pan. 2020. Using machine learning to detect misstatements. Review of Accounting Studies 26: 1–52.
Bishop, C.M. 2006. Pattern recognition and machine learning. Springer.
Breiman, L. 1996. Bagging predictors. Machine Learning 24: 123–140.
Brown, N.C., R.M. Crowley, and W.B. Elliott. 2020. What are you saying? Using topic to detect financial misreporting. Journal of Accounting Research 58 (1): 237–291.
Brown-Liburd, H., A. Cheong, M.A. Vasarhelyi, and X. Wang. 2019. Measuring with exogenous data (MED), and government economic monitoring (GEM). Journal of Emerging Technologies in Accounting 16 (1): 1–19.
Bzdok, D., N. Altman, and M. Krzywinski. 2018. Points of significance: Statistics versus machine learning. Nature Methods 15 (4): 233–234. https://www.nature.com/articles/nmeth.4642.pdf?origin=ppub.
Carton, R.B., and C.W. Hofer. 2006. Measuring organizational performance: Metrics for entrepreneurship and strategic management research. Edward Elgar Publishing.
Cecchini, M., H. Aytug, G.J. Koehler, and P. Pathak. 2010a. Making words work: Using financial text as a predictor of financial events. Decision Support Systems 50: 164–175.
Cecchini, M., H. Aytug, G.J. Koehler, and P. Pathak. 2010b. Detecting management fraud in public companies. Management Science 56 (7): 1146–1160. https://doi.org/10.1287/mnsc.1100.1174.
Chen, M.-S., J. Han, and P.S. Yu. 1996. Data mining: An overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering 8: 866–883.
Cho, S., M.A. Vasarhelyi, T. Sun, and C. Zhang. 2020. Learning from machine learning in accounting and assurance. Journal of Emerging Technologies in Accounting.
Chollet, F. 2017. Deep learning with Python. Shelter Island: Manning Publications Company.
Choudhary, A. 2019. A unique method for machine learning interpretability: Game theory & Shapley values. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2019/11/shapley-value-machine-learning-interpretability-game-theory/.
Dechow, P.M., and I.D. Dichev. 2002. The quality of accruals and earnings: The role of accrual estimation errors. The Accounting Review 77: 35–59.
Dechow, P.M., W. Ge, C.R. Larson, and R.G. Sloan. 2011. Predicting material accounting misstatements. Contemporary Accounting Research 28 (1): 17–82.
Dietterich, T.G. 2002. Ensemble learning. In The handbook of brain theory and neural networks, vol. 2, 110–125. Cambridge, MA: MIT Press.
Ding, K., B. Lev, X. Peng, T. Sun, and M.A. Vasarhelyi. 2020. Machine learning improves accounting estimates: Evidence from insurance payments. Available at SSRN 3253220.
Expert.ai. 2020. What is machine learning? A definition. https://www.expert.ai/blog/machine-learning-definition/
Foote, K.D. 2019. A brief history of machine learning. Data Topics. Dataversity. https://www.dataversity.net/a-brief-history-of-machine-learning/
Frankel, R., J. Jennings, and J. Lee. 2016. Using unstructured and qualitative disclosures to explain accruals. Journal of Accounting and Economics 62: 209–227.
Ghahramani, Z. 2015. Probabilistic machine learning and artificial intelligence. Nature 521 (7553): 452–459. https://www.repository.cam.ac.uk/bitstream/handle/1810/248538/Ghahramani%202015%20Nature.pdf;jsessionid=3DB2D31FFA80196A97AEEBECB06FEF42?sequence=1.
Goel, S., J. Gangolly, S.R. Faerman, and O. Uzuner. 2010. Can linguistic predictors detect fraudulent financial filings? Journal of Emerging Technologies in Accounting 7: 25–46.
Hammond, K. 2016. 5 unexpected sources of bias in artificial intelligence. Available at: https://techcrunch.com/2016/12/10/5-unexpected-sources-of-bias-in-artificial-intelligence/
Healthcare.ai. 2020. Machine learning versus statistics: When to use each. Data Science Blog. https://healthcare.ai/machine-learning-versus-statistics-use/
Hebb, D.O. 1949. The organization of behavior: A neuropsychological theory. New York, London: J. Wiley, Chapman & Hall. http://s-f-walker.org.uk/pubsebooks/pdfs/The_Organization_of_Behavior-Donald_O._Hebb.pdf.
Heller, M. 2019. Machine learning algorithms explained [Online]. Available: https://www.infoworld.com/article/3394399/machine-learning-algorithms-explained.html. Accessed.
Huang, X.S., and L. Sun. 2017. Managerial ability and real earnings management. Advances in Accounting 39: 91–104.
Huang, A.H., A.Y. Zang, and R. Zheng. 2014. Evidence on the information content of text in analyst reports. The Accounting Review 89: 2151–2180.
Hu, H., T. Sun, M.A. Vasarhelyi, and M. Zhang. 2020. A machine learning approach of measuring audit quality: Evidence from China. Available at SSRN 3732563.
Huang, A.H., R. Lehavy, A.Y. Zang, and R. Zheng. 2018. Analyst information discovery and interpretation roles: A topic modeling approach. Management Science 64: 2833–2855.
Hunt, J.O., D.M. Rosser, and S.P. Rowe. 2021. Using machine learning to predict auditor switches: How the likelihood of switching affects audit quality among non-switching clients. Journal of Accounting and Public Policy 40 (5): 106785.
Khalid, S., T. Khalil, and S. Nasreen. 2014. A survey of feature selection and feature extraction techniques in machine learning. 2014 Science and information conference. IEEE, 372–378.
Kim, H.S., and S.Y. Sohn. 2010. Support vector machines for default prediction of SMEs based on technology credit. European Journal of Operational Research 201: 838–846.
Kober, J., J.A. Bagnell, and J. Peters. 2013. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32: 1238–1274.
LeCun, Y., Y. Bengio, and G. Hinton. 2015. Deep learning. Nature 521: 436–444.
Lefkowitz, M. 2019. Professor’s perceptron paved the way for AI: 60 years too soon. Cornell Chronicle. https://news.cornell.edu/stories/2019/09/professors-perceptron-paved-way-ai-60-years-too-soon
Li, F. 2010. The information content of forward-looking statements in corporate filings – A naïve Bayesian machine learning approach. Journal of Accounting Research 48: 1049–1102.
Odom, M.D., and R. Sharda. 1990. A neural network model for bankruptcy prediction. 1990 IJCNN International Joint Conference on neural networks. IEEE, 163–168.
Ohlson, J.A. 1980. Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research 18: 109–131.
Olson, D.L., D. Delen, and Y. Meng. 2012. Comparative analysis of data mining methods for bankruptcy prediction. Decision Support Systems 52: 464–473.
Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, and V. Dubourg. 2011. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research 12: 2825–2830.
Perols, J. 2011. Financial statement fraud detection: An analysis of statistical and machine learning algorithms. Auditing: A Journal of Practice & Theory 30 (2): 19–50.
Perols, J.L., R.M. Bowen, C. Zimmermann, and B. Samba. 2017. Finding needles in a haystack: Using data analytics to improve fraud prediction. The Accounting Review 92 (2): 221–245.
Platt, H.D., M.B. Platt, and J.G. Pedersen. 1994. Bankruptcy discrimination with real variables. Journal of Business Finance & Accounting 21: 491–510.
Provalis Research. 2017. Blogs on Text Analytics: A Brief History of Machine Learning. https://provalisresearch.com/blog/brief-historymachine-learning/.
Purda, L., and D. Skillicorn. 2015. Accounting variables, deception, and a bag of words: Assessing the tools of fraud detection. Contemporary Accounting Research 32: 1193–1223.
Rosenblatt, F. 1957. The perceptron: A perceiving and recognizing automaton (Project Para). https://blogs.umass.edu/brain-wars/files/2016/03/rosenblatt-1957.pdf
Roth, A.E., ed. 1988. The Shapley value: Essays in honor of Lloyd S. Shapley. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511528446. ISBN 0-521-36177-X.
Sallab, A.E., M. Abdou, E. Perot, and S. Yogamani. 2017. Deep reinforcement learning framework for autonomous driving. Electronic Imaging 2017: 70–76.
Shalev-Shwartz, S., and S. Ben-David. 2014. Understanding machine learning: From theory to algorithms. Cambridge: Cambridge University Press.
Shaw, R. 2017. Top 10 machine learning algorithms for beginners [Online]. KDnuggets. Available: https://www.kdnuggets.com/2017/10/top-10-machine-learning-algorithms-beginners.html. Accessed.
Shin, K.-S., T.S. Lee, and H.-J. Kim. 2005. An application of support vector machines in bankruptcy prediction model. Expert Systems with Applications 28: 127–135.
Sidhu, H. 2019. How audit digitization reflects a transformative age. Available at: https://www.ey.com/en_gl/digital-audit/auditdigitization-transformative-age
Sun, T. 2019. Applying deep learning to audit procedures: An illustrative framework. Accounting Horizons 33 (3): 89–109.
Sutton, R.S., and A.G. Barto. 2018. Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Tsai, C.-F., Y.-F. Hsu, and D.C. Yen. 2014. A comparative study of classifier ensembles for bankruptcy prediction. Applied Soft Computing 24: 977–984.
Van Den Bogaerd, M., and W. Aerts. 2011. Applying machine learning in accounting research. Expert Systems with Applications 38: 13414–13424.
Van Der Maaten, L., E. Postma, and J. Van Den Herik. 2009. Dimensionality reduction: A comparative review. Journal of Machine Learning Research 10: 13.
Wiederhold, G., J. McCarthy, and E. Feigenbaum. 1990. Memorial resolution: Arthur L. Samuel. Stanford University Historical Society. Archived from the original on 26 May 2011. Retrieved April 29, 2011. https://web.archive.org/web/20110526195107/http://histsoc.stanford.edu/pdfmem/SamuelA.pdf
Yang, Z., M.B. Platt, and H.D. Platt. 1999. Probabilistic neural networks in bankruptcy prediction. Journal of Business Research 44: 67–74.
Yang, J.C., H.C. Chuang, and C.M. Kuan. 2020. Double machine learning with gradient boosting and its application to the Big N audit quality effect. Journal of Econometrics 216: 268–283.
Zang, A.Y. 2012. Evidence on the trade-off between real activities manipulation and accrual-based earnings management. The Accounting Review 87 (2): 675–703.
Zhao, Q., and S.S. Bhowmick. 2003. Association rule mining: A survey. Vol. 135. Singapore: Nanyang Technological University.
Zhao, Y., Z. Nasrullah, and Z. Li. 2019. PyOD: A Python toolbox for scalable outlier detection. arXiv preprint arXiv:1901.01588.
Zhou, Z.-H. 2009. Ensemble learning. In Encyclopedia of biometrics, vol. 1, 270–273. New York: Springer.

Author information

Authors and Affiliations

Department of Accounting and Information Systems, Rutgers University, Newark, NJ, USA
Hanxin Hu
Department of Accounting and Information Systems, The College of New Jersey, Ewing, NJ, USA
Ting Sun

Corresponding author

Correspondence to Ting Sun.

Editor information

Editors and Affiliations

Rutgers School of Business, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
Cheng-Few Lee
Phoenix, AZ, USA
Alice C. Lee

Copyright information

© 2021 Springer Nature Switzerland AG

About this entry

Cite this entry

Hu, H., Sun, T. (2021). The Applications of Machine Learning in Accounting and Auditing Research. In: Lee, CF., Lee, A.C. (eds) Encyclopedia of Finance. Springer, Cham. https://doi.org/10.1007/978-3-030-73443-5_91-1

DOI: https://doi.org/10.1007/978-3-030-73443-5_91-1
Received: 13 February 2021
Accepted: 06 April 2021
Published: 09 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73443-5
Online ISBN: 978-3-030-73443-5
eBook Packages: Springer Reference Economics and Finance; Reference Module Humanities and Social Sciences; Reference Module Business, Economics and Social Sciences
