Abstract
AR glasses for daily use have made good progress and already offer practical value. However, most current AR glasses are designed simply to mirror a smartphone's content and act as a secondary screen for the phone. In contrast, the AR glasses we designed start from real usage scenarios, focus on real-world interaction, and leverage IoT technology so that users can fully extract and use the digital information in their surroundings. We built two innovative features: a language learning translation system that helps users learn foreign languages by integrating a large language model with an open-vocabulary recognition model to fully extract the visual semantic information of a scene, and a social conferencing system that uses the IoT cloud-pipe-edge-end architecture to reduce communication costs and improve the efficiency of exchanges in social situations.
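To make the first feature concrete, the following is a minimal sketch (not the authors' implementation) of how an open-vocabulary recognizer and a large language model could be chained: a CLIP checkpoint scores a camera frame against a candidate label list, and the top matches are folded into a tutoring prompt for whichever LLM the glasses use. The checkpoint name, file path, label list, target language, and prompt wording are all illustrative assumptions.

# Sketch of the scene-to-lesson flow, assuming CLIP via Hugging Face transformers.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frame = Image.open("camera_frame.jpg")  # frame grabbed from the glasses camera (assumed path)
labels = ["a coffee cup", "a laptop", "a book", "a potted plant"]  # illustrative open vocabulary

inputs = processor(text=labels, images=frame, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # similarity of the frame to each label
probs = logits.softmax(dim=-1)[0]

# Keep the three most likely objects and turn them into a tutoring prompt.
top3 = sorted(zip(labels, probs.tolist()), key=lambda x: x[1], reverse=True)[:3]
prompt = (
    "You are a language tutor. The wearer's camera sees: "
    + ", ".join(f"{name} (confidence {p:.2f})" for name, p in top3)
    + ". For each object, give the Japanese word, its reading, and one short example sentence."
)
# `prompt` would then be sent to the large language model used on the glasses;
# the model choice and delivery channel are left open here.
print(prompt)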
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liang, Q., Chen, Y., Li, W., Lai, M., Ni, W., Qiu, H. (2024). iKnowiSee: AR Glasses with Language Learning Translation System and Identity Recognition System Built Based on Large Pre-trained Models of Language and Vision and Internet of Things Technology. In: Zhang, L., Yu, W., Wang, Q., Laili, Y., Liu, Y. (eds) Intelligent Networked Things. CINT 2024. Communications in Computer and Information Science, vol 2139. Springer, Singapore. https://doi.org/10.1007/978-981-97-3948-6_2
DOI: https://doi.org/10.1007/978-981-97-3948-6_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-3947-9
Online ISBN: 978-981-97-3948-6
eBook Packages: Computer Science, Computer Science (R0)