Abstract
In many real-world problems, such as defense and security applications, data mining algorithms have access to massive amounts of data. Mining all of the available data is often prohibitive due to computational (time and memory) constraints. Finding the smallest sufficient training set size that achieves the same accuracy as the entire available dataset therefore remains an important research question. Progressive sampling randomly selects an initial small sample and grows the sample size according to either a geometric or an arithmetic series until the error converges, with the sampling schedule determined a priori. In this paper, we explore sampling schedules that adapt to the dataset under consideration. We develop a general approach that uses the Chernoff inequality to determine how many instances are required at each iteration for convergence. We evaluate our approach on two real-world problems where data is abundant, face recognition and fingerprint recognition, using neural networks. Our empirical results show that our dynamic approach is faster and uses far fewer examples than existing methods. However, the Chernoff bound requires the samples drawn at each iteration to be independent of one another. Future work will look at removing this limitation, which should further improve performance.
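The abstract leaves the adaptive schedule implicit, so the following minimal Python sketch illustrates how a Chernoff-style bound can size each sample in a progressive sampling loop. The additive Chernoff-Hoeffding form of the bound is assumed here, and all names (`chernoff_sample_size`, `adaptive_progressive_sampling`, `train`, `test_error`) as well as the epsilon-halving update are illustrative placeholders, not the paper's exact DASA procedure.

```python
import math
import random

def chernoff_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n with P(|p_hat - p| >= epsilon) <= delta for a mean of
    [0, 1]-bounded draws, from the additive Chernoff-Hoeffding bound
    2 * exp(-2 * n * epsilon**2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

def adaptive_progressive_sampling(data, train, test_error,
                                  epsilon=0.02, delta=0.05, tol=0.005):
    """Dynamic-schedule sketch: each iteration draws a fresh, independent
    sample sized by the Chernoff bound for the current target precision,
    then tightens the precision until the test error stops improving."""
    prev_err, model = float("inf"), None
    while True:
        n = min(chernoff_sample_size(epsilon, delta), len(data))
        sample = random.sample(data, n)  # fresh draw each time: the bound's independence assumption
        model = train(sample)
        err = test_error(model)
        if prev_err - err < tol or n == len(data):
            return model, n              # error converged, or all data consumed
        prev_err = err
        epsilon /= 2.0                   # tighter epsilon => larger next sample
```

With epsilon = 0.02 and delta = 0.05 the first sample is about 4,600 instances, and halving epsilon roughly quadruples the next sample size, so the schedule grows geometrically in effect but is derived from the bound rather than fixed a priori.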
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Satyanarayana, A., Davidson, I. (2005). A Dynamic Adaptive Sampling Algorithm (DASA) for Real World Applications: Finger Print Recognition and Face Recognition. In: Hacid, M.-S., Murray, N.V., Raś, Z.W., Tsumoto, S. (eds) Foundations of Intelligent Systems. ISMIS 2005. Lecture Notes in Computer Science, vol. 3488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11425274_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25878-0
Online ISBN: 978-3-540-31949-8
eBook Packages: Computer Science, Computer Science (R0)