Abstract
This chapter concludes the research presented throughout this book and identifies its main contributions to the emerging field of embedded machine learning (ML) processors. It begins by discussing the conclusions drawn in the individual chapters and how they fit into the building blocks of ML at the (extreme) edge. The chapter then summarizes the book's main contributions and discusses the remaining challenges and opportunities. Subsequently, it offers a reflection and suggestions for future work to further propel hardware development toward flexible and efficient ML computation. Finally, it closes with a brief retrospection on the role of this book in moving the field of (extreme) edge ML toward heterogeneous multi-core systems.