The goal of this chapter is to provide basic notions about digital audio processing technologies. These are applied in many everyday products such as phones (including cellular phones), radio and television, videogames, and CD players. Although the range of applications is wide, the main problems to be addressed in order to manipulate digital sound are essentially three: acquisition, representation and storage. Acquisition is the process of converting the physical phenomenon we call sound into a form suitable for digital processing; representation is the problem of extracting from the sound the information necessary to perform a specific task; and storage is the problem of reducing the number of bits necessary to encode the acoustic signals.
Copyright information
© 2008 Springer
About this chapter
Cite this chapter
(2008). Audio Acquisition, Representation and Storage. In: Machine Learning for Audio, Image and Video Analysis. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84800-007-0_2
DOI: https://doi.org/10.1007/978-1-84800-007-0_2
Publisher Name: Springer, London
Print ISBN: 978-1-84800-006-3
Online ISBN: 978-1-84800-007-0
eBook Packages: Computer Science (R0)