Fundamental Considerations on Representation Learning for Multimodal Processing

  • Conference paper
Human Interface and the Management of Information (HCII 2023)

Abstract

In recent years, research on machine learning, and on artificial neural networks (ANNs) in particular, has been extremely active. This boom was triggered by the remarkably high performance of an image classification system based on convolutional neural networks (CNNs) proposed in 2012. Its impact stemmed not only from its accuracy: the system also used GPGPU computing to reduce computational cost, the ReLU activation function to mitigate the vanishing gradient problem, and Dropout to realize regularization. The availability of easy-to-use deep learning frameworks such as TensorFlow and PyTorch contributed to the rise of this line of research as well. As a result, a wide variety of systems and applications have been proposed; recent image generation models are a good example. On the other hand, theoretical analysis of such artificial neural networks remains far from sufficient, and there is no rigorous explanation of why each system works well.
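
To make these ingredients concrete, the following is a minimal PyTorch sketch of a small CNN classifier that combines ReLU activations, Dropout, and GPU offloading. The architecture and hyperparameters are illustrative assumptions on our part, not the 2012 system itself.

    import torch
    import torch.nn as nn

    # Illustrative small CNN (assumed, not AlexNet): ReLU mitigates
    # vanishing gradients, Dropout provides regularization, and moving
    # the model to "cuda" uses the GPU when one is available.
    class TinyCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),   # 32x32 -> 16x16
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),   # 16x16 -> 8x8
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Dropout(p=0.5),  # regularization, as in the 2012 system
                nn.Linear(32 * 8 * 8, num_classes),
            )

        def forward(self, x):
            return self.classifier(self.features(x))

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = TinyCNN().to(device)
    logits = model(torch.randn(4, 3, 32, 32, device=device))  # shape (4, 10)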

Deep learning to date has emphasized improving the quality of output data, and little consideration has been given to the quality of the information represented by the latent variables from which features of the input are extracted. Recently, however, representation learning has begun to improve through contrastive learning, in which the latent representations of similar input data within the same modality are pulled closer together. In this study, we assume a multimodal environment and examine the possibility of pulling the latent representations of similar input data across different modalities closer together.
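
As a minimal sketch of this idea (an illustrative assumption on our part, not the exact objective of the paper), a symmetric InfoNCE-style contrastive loss in PyTorch can pull the latent representations of paired inputs from two modalities together while pushing non-paired inputs apart:

    import torch
    import torch.nn.functional as F

    def cross_modal_contrastive_loss(z_a, z_b, temperature=0.07):
        # Row i of z_a (e.g. an image embedding) is paired with row i of
        # z_b (e.g. a sentence embedding); other rows act as negatives.
        z_a = F.normalize(z_a, dim=1)         # unit-length embeddings
        z_b = F.normalize(z_b, dim=1)
        logits = z_a @ z_b.t() / temperature  # (B, B) cosine similarities
        targets = torch.arange(z_a.size(0), device=z_a.device)
        # Symmetric: match modality A to B and modality B to A.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    # Hypothetical usage with embeddings from two modality-specific encoders:
    z_img = torch.randn(8, 128)  # e.g. image-encoder output
    z_txt = torch.randn(8, 128)  # e.g. sentence vectors projected to 128-d
    loss = cross_modal_contrastive_loss(z_img, z_txt)

Minimizing such a loss drives the similarity of matched cross-modal pairs above that of mismatched pairs within the batch, which is one way to bring the latent representations of similar inputs from different modalities closer together.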



Acknowledgement

This work was supported by JSPS KAKENHI Grant-in-Aid for Scientific Research (C), Grant Number 20K11978. Part of this work was carried out under the Cooperative Research Project Program of the Research Institute of Electrical Communication, Tohoku University. Part of this work was also carried out under the Future Intelligence Research Unit of the Tokyo City University Prioritized Studies.

Author information

Corresponding author

Correspondence to Kenya Jin'no.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jin'no, K., Izumi, M., Okamoto, S., Dai, M., Takahashi, C., Inami, T. (2023). Fundamental Considerations on Representation Learning for Multimodal Processing. In: Mori, H., Asahi, Y. (eds) Human Interface and the Management of Information. HCII 2023. Lecture Notes in Computer Science, vol 14015. Springer, Cham. https://doi.org/10.1007/978-3-031-35132-7_29

  • DOI: https://doi.org/10.1007/978-3-031-35132-7_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35131-0

  • Online ISBN: 978-3-031-35132-7

  • eBook Packages: Computer Science (R0)
