Abstract
Detecting the location and identity of users is a first step in creating context-aware applications for technologically endowed environments. We propose a system that makes use of motion detection, person tracking, face identification, feature-based identification, audio-based localization, and audio-based identification modules, fusing information with particle filters to obtain robust localization and identification. The data streams are processed with the help of the generic client-server middleware SmartFlow, resulting in a flexible architecture that runs across different platforms.
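To make the idea of fusing modalities with a particle filter concrete, the following is a minimal sketch of a generic bootstrap particle filter that combines a video-based and an audio-based position estimate for one person on a 2D floor plan. It is not the system described in the article: the function names, the random-walk motion model, the Gaussian observation models, the noise parameters, and the room size are all assumptions chosen for illustration.

```python
import math
import random


def gaussian_likelihood(observed, predicted, sigma):
    """Unnormalised isotropic Gaussian likelihood of a 2D observation."""
    dx = observed[0] - predicted[0]
    dy = observed[1] - predicted[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def particle_filter_step(particles, video_obs, audio_obs, rng,
                         motion_sigma=0.1, video_sigma=0.2, audio_sigma=0.5):
    """One predict/update/resample cycle (all sigmas are assumed values, in metres)."""
    # Predict: assumed random-walk motion model.
    moved = [(x + rng.gauss(0.0, motion_sigma), y + rng.gauss(0.0, motion_sigma))
             for x, y in particles]

    # Update: multiply the likelihoods of whichever modalities reported,
    # assuming they are conditionally independent given the position.
    weights = []
    for p in moved:
        w = 1.0
        if video_obs is not None:
            w *= gaussian_likelihood(video_obs, p, video_sigma)
        if audio_obs is not None:
            w *= gaussian_likelihood(audio_obs, p, audio_sigma)
        weights.append(w)

    # If every weight underflowed to zero, fall back to uniform weights.
    if sum(weights) <= 0.0:
        weights = [1.0] * len(moved)

    # Resample: draw particles with probability proportional to their weight.
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate(particles):
    """Point estimate: mean of the (equally weighted) resampled particles."""
    n = len(particles)
    return (sum(x for x, _ in particles) / n, sum(y for _, y in particles) / n)


# Hypothetical scenario: a stationary person in a 5 m x 4 m room.
rng = random.Random(0)
true_pos = (2.0, 1.5)
particles = [(rng.uniform(0.0, 5.0), rng.uniform(0.0, 4.0)) for _ in range(500)]

for step in range(20):
    # Video reports every frame; audio only when the person speaks
    # (simulated here as every third frame).
    video = (true_pos[0] + rng.gauss(0.0, 0.2), true_pos[1] + rng.gauss(0.0, 0.2))
    audio = None
    if step % 3 == 0:
        audio = (true_pos[0] + rng.gauss(0.0, 0.5), true_pos[1] + rng.gauss(0.0, 0.5))
    particles = particle_filter_step(particles, video, audio, rng)

est = estimate(particles)
print("estimated position:", est)
```

Multiplying per-modality likelihoods is one simple way to let an intermittent cue such as speech refine the estimate when it is present without stalling the filter when it is absent; other fusion rules are equally possible.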
Additional information
This work is supported by Spanish projects SAPIRE (TEC2007-65470) and PROVEC (TEC2007-66858/TCM) and Dutch projects BRICKS/BSIK and BASIS IOP GenCom.
About this article
Cite this article
Salah, A.A., Morros, R., Luque, J. et al. Multimodal identification and localization of users in a smart environment. J Multimodal User Interfaces 2, 75–91 (2008). https://doi.org/10.1007/s12193-008-0008-y
DOI: https://doi.org/10.1007/s12193-008-0008-y