Fundamental Considerations on Representation Learning for Multimodal Processing

  • Conference paper
Human Interface and the Management of Information (HCII 2023)

Abstract

In recent years, research on machine learning, and on artificial neural networks (ANNs) in particular, has been extremely active. This boom was triggered by the remarkably high performance of an image classification system based on convolutional neural networks (CNNs) proposed in 2012. Its impact stemmed not only from its accuracy: the system also used GPGPU computing to reduce computational cost, the ReLU activation function to mitigate the vanishing gradient problem, and Dropout to realize regularization. The availability of easy-to-use deep learning frameworks such as TensorFlow and PyTorch contributed to the rise of this line of research as well. As a result, a wide variety of systems and applications have been proposed; recent image generation models are a good example. On the other hand, theoretical analysis of such artificial neural networks remains far from sufficient, and there is no rigorous explanation of why each system works well.
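
To make these ingredients concrete, the following is a minimal PyTorch sketch of a small CNN classifier that combines ReLU activations, Dropout, and GPU offloading. The architecture and hyperparameters are illustrative assumptions on our part, not the 2012 system itself.

    import torch
    import torch.nn as nn

    # Illustrative small CNN (assumed, not AlexNet): ReLU mitigates
    # vanishing gradients, Dropout provides regularization, and moving
    # the model to "cuda" uses the GPU when one is available.
    class TinyCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),   # 32x32 -> 16x16
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),   # 16x16 -> 8x8
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Dropout(p=0.5),  # regularization, as in the 2012 system
                nn.Linear(32 * 8 * 8, num_classes),
            )

        def forward(self, x):
            return self.classifier(self.features(x))

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = TinyCNN().to(device)
    logits = model(torch.randn(4, 3, 32, 32, device=device))  # shape (4, 10)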

Deep learning to date has emphasized improving the quality of output data, and little consideration has been given to the quality of the information represented by the latent variables from which features of the input are extracted. Recently, however, representation learning has begun to improve through contrastive learning, in which the latent representations of similar input data within the same modality are pulled closer together. In this study, we assume a multimodal environment and examine the possibility of pulling the latent representations of similar input data across different modalities closer together.
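
As a minimal sketch of this idea (an illustrative assumption on our part, not the exact objective of the paper), a symmetric InfoNCE-style contrastive loss in PyTorch can pull the latent representations of paired inputs from two modalities together while pushing non-paired inputs apart:

    import torch
    import torch.nn.functional as F

    def cross_modal_contrastive_loss(z_a, z_b, temperature=0.07):
        # Row i of z_a (e.g. an image embedding) is paired with row i of
        # z_b (e.g. a sentence embedding); other rows act as negatives.
        z_a = F.normalize(z_a, dim=1)         # unit-length embeddings
        z_b = F.normalize(z_b, dim=1)
        logits = z_a @ z_b.t() / temperature  # (B, B) cosine similarities
        targets = torch.arange(z_a.size(0), device=z_a.device)
        # Symmetric: match modality A to B and modality B to A.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    # Hypothetical usage with embeddings from two modality-specific encoders:
    z_img = torch.randn(8, 128)  # e.g. image-encoder output
    z_txt = torch.randn(8, 128)  # e.g. sentence vectors projected to 128-d
    loss = cross_modal_contrastive_loss(z_img, z_txt)

Minimizing such a loss drives the similarity of matched cross-modal pairs above that of mismatched pairs within the batch, which is one way to bring the latent representations of similar inputs from different modalities closer together.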



Acknowledgement

This work was supported by JSPS KAKENHI Grant-in-Aid for Scientific Research (C), Grant Number 20K11978. Part of this work was carried out under the Cooperative Research Project Program of the Research Institute of Electrical Communication, Tohoku University. Part of this work was also carried out under the Future Intelligence Research Unit of the Tokyo City University Prioritized Studies.

Author information

Corresponding author

Correspondence to Kenya Jin'no.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jin'no, K., Izumi, M., Okamoto, S., Dai, M., Takahashi, C., Inami, T. (2023). Fundamental Considerations on Representation Learning for Multimodal Processing. In: Mori, H., Asahi, Y. (eds) Human Interface and the Management of Information. HCII 2023. Lecture Notes in Computer Science, vol 14015. Springer, Cham. https://doi.org/10.1007/978-3-031-35132-7_29

  • DOI: https://doi.org/10.1007/978-3-031-35132-7_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35131-0

  • Online ISBN: 978-3-031-35132-7

  • eBook Packages: Computer Science (R0)
