Abstract
AR glasses for daily use have made good progress and already offer practical value. However, most current AR glasses are designed simply to mirror a smartphone's content and act as a secondary screen for the phone. In contrast, the AR glasses we designed start from real usage scenarios, focus on real-world interaction, and leverage IoT technology so that users can fully extract and use the digital information in their surroundings. We built two innovative features: a language learning translation system that helps users learn foreign languages by integrating a large language model with an open-vocabulary recognition model to fully extract the visual semantic information of a scene, and a social conferencing system that uses the IoT cloud-pipe-edge-end architecture to reduce communication costs and improve the efficiency of exchanges in social situations.
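To make the first feature concrete, the following is a minimal sketch (not the authors' implementation) of how an open-vocabulary recognizer and a large language model could be chained: a CLIP checkpoint scores a camera frame against a candidate label list, and the top matches are folded into a tutoring prompt for whichever LLM the glasses use. The checkpoint name, file path, label list, target language, and prompt wording are all illustrative assumptions.

# Sketch of the scene-to-lesson flow, assuming CLIP via Hugging Face transformers.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frame = Image.open("camera_frame.jpg")  # frame grabbed from the glasses camera (assumed path)
labels = ["a coffee cup", "a laptop", "a book", "a potted plant"]  # illustrative open vocabulary

inputs = processor(text=labels, images=frame, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # similarity of the frame to each label
probs = logits.softmax(dim=-1)[0]

# Keep the three most likely objects and turn them into a tutoring prompt.
top3 = sorted(zip(labels, probs.tolist()), key=lambda x: x[1], reverse=True)[:3]
prompt = (
    "You are a language tutor. The wearer's camera sees: "
    + ", ".join(f"{name} (confidence {p:.2f})" for name, p in top3)
    + ". For each object, give the Japanese word, its reading, and one short example sentence."
)
# `prompt` would then be sent to the large language model used on the glasses;
# the model choice and delivery channel are left open here.
print(prompt)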
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liang, Q., Chen, Y., Li, W., Lai, M., Ni, W., Qiu, H. (2024). iKnowiSee: AR Glasses with Language Learning Translation System and Identity Recognition System Built Based on Large Pre-trained Models of Language and Vision and Internet of Things Technology. In: Zhang, L., Yu, W., Wang, Q., Laili, Y., Liu, Y. (eds) Intelligent Networked Things. CINT 2024. Communications in Computer and Information Science, vol 2139. Springer, Singapore. https://doi.org/10.1007/978-981-97-3948-6_2
DOI: https://doi.org/10.1007/978-981-97-3948-6_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-3947-9
Online ISBN: 978-981-97-3948-6
eBook Packages: Computer Science, Computer Science (R0)