Abstract
In recent years, research on machine learning, and on artificial neural networks (ANNs) in particular, has been extremely active. This boom was triggered by the very high performance of an image classification system based on convolutional neural networks (CNNs) proposed in 2012. Its impact stemmed not only from its accuracy but also from its design: the system used GPGPU computing to reduce computational cost, the ReLU activation function to mitigate the vanishing gradient problem, and Dropout for regularization. The availability of easy-to-use deep learning frameworks such as TensorFlow and PyTorch has also contributed to the growth of this research area. As a result, a wide range of systems and applications have been proposed; recent image generation models are a prominent example. On the other hand, theoretical analysis of such artificial neural networks remains insufficient, and there is no rigorous explanation of why each system works well.
In deep learning to date, emphasis has been placed on improving the quality of output data, and little consideration has been given to the quality of the information represented by the latent variables from which features of the input are extracted. Recently, however, representation learning has begun to improve through contrastive learning, in which the latent representations of similar input data within the same modality are pulled closer together. In this study, we assume a multimodal environment and examine the possibility of bringing the latent representations of similar input data closer together across different modalities.
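The idea of pulling the latent representations of matching inputs together while pushing non-matching ones apart can be illustrated with an InfoNCE-style contrastive loss. The sketch below is not the authors' method; it is a minimal illustration using NumPy, where `z_a` and `z_b` are assumed to be batches of embeddings of paired inputs (e.g. the same item observed in two modalities), and row `i` of each is a positive pair while all other rows serve as negatives.

```python
import numpy as np

def contrastive_loss(z_a, z_b, temperature=0.5):
    """InfoNCE-style contrastive loss over a batch of paired embeddings.

    z_a, z_b: (N, d) arrays; row i of z_a and row i of z_b embed the same
    underlying input (e.g. two modalities). Matching rows are treated as
    positive pairs; all other rows in the batch act as negatives.
    """
    # L2-normalise so dot products become cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    sim = z_a @ z_b.T / temperature  # (N, N) similarity matrix
    # cross-entropy where the matching index i is the "correct class":
    # minimising this pulls positive pairs together and pushes the rest apart
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

As a sanity check, the loss is low when the two batches are aligned row by row and high when the pairing is scrambled, which is exactly the behaviour that drives similar inputs toward nearby latent representations.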
Acknowledgement
This work was supported by JSPS KAKENHI Grant-in-Aid for Scientific Research (C) Number 20K11978. Part of this work was carried out under the Cooperative Research Project Program of the Research Institute of Electrical Communication, Tohoku University. Part of this work was also carried out under the Future Intelligence Research Unit, Tokyo City University Prioritized Studies.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
**’no, K., Izumi, M., Okamoto, S., Dai, M., Takahashi, C., Inami, T. (2023). Fundamental Considerations on Representation Learning for Multimodal Processing. In: Mori, H., Asahi, Y. (eds) Human Interface and the Management of Information. HCII 2023. Lecture Notes in Computer Science, vol 14015. Springer, Cham. https://doi.org/10.1007/978-3-031-35132-7_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35131-0
Online ISBN: 978-3-031-35132-7