Abstract
We generalize recent relative loss bounds for on-line algorithms, in which the additional loss of the algorithm over the loss of the best expert on the whole sequence of examples is bounded. The generalization allows the sequence to be partitioned into segments, and the goal is to bound the additional loss of the algorithm over the sum of the losses of the best experts for each segment. This models situations in which the examples change and different experts are best for certain segments of the sequence of examples. In the single-segment case, the additional loss is proportional to \(\log n\), where n is the number of experts and the constant of proportionality depends on the loss function. Our algorithms do not produce the best partition; however, the loss bound shows that our predictions are close to those of the best partition. When the number of segments is k+1 and the sequence is of length \(\ell\), we can bound the additional loss of our algorithm over the best partition by \(O(k\log n + k\log(\ell/k))\). For the case when the loss per trial is bounded by one, we obtain an algorithm whose additional loss over the loss of the best partition is independent of the length of the sequence. The additional loss becomes \(O(k\log n + k\log(L/k))\), where L is the loss of the best partition with k+1 segments. Our algorithms for tracking the predictions of the best expert are simple adaptations of Vovk's original algorithm for the single best expert case. As in the original algorithms, we keep one weight per expert and spend O(1) time per weight in each trial.
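The scheme described above — one weight per expert, a multiplicative loss update, plus a mechanism that lets weight flow back to experts that may become best in a later segment — can be illustrated with a small sketch. This is not the paper's exact algorithm or tuning; the square loss, the learning rate `eta`, and the share rate `alpha` are illustrative assumptions. The share step redistributes a fraction of each expert's weight to the others, so the algorithm can track a change of best expert:

```python
import math

def fixed_share(expert_preds, outcomes, eta=2.0, alpha=0.05):
    """Sketch of an exponential-weights forecaster with a share step.

    expert_preds: list of per-trial tuples, one prediction per expert.
    outcomes: list of per-trial outcomes.
    eta, alpha: illustrative learning and share rates (assumptions,
    not the tuned constants from the paper).
    Returns (total square loss of the forecaster, final weights).
    """
    n = len(expert_preds[0])
    w = [1.0 / n] * n  # one weight per expert
    total_loss = 0.0
    for preds, y in zip(expert_preds, outcomes):
        # predict with the weighted average of the experts
        yhat = sum(wi * p for wi, p in zip(w, preds))
        total_loss += (yhat - y) ** 2
        # multiplicative update: O(1) work per weight per trial
        v = [wi * math.exp(-eta * (p - y) ** 2) for wi, p in zip(w, preds)]
        # share step: each expert keeps a (1 - alpha) fraction and
        # receives an equal share of the alpha fraction of the others
        s = sum(v)
        w = [(1 - alpha) * vi + alpha * (s - vi) / (n - 1) for vi in v]
        z = sum(w)
        w = [wi / z for wi in w]
    return total_loss, w

# toy usage: expert 0 is perfect, so weight concentrates on it
loss, weights = fixed_share([(0.0, 1.0)] * 10, [0.0] * 10)
```

Because `alpha > 0`, no expert's weight can decay to zero, which is what keeps the extra loss after a segment boundary bounded rather than growing with the length of the preceding segment.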
References
Auer, P. & Warmuth, M. K. (1998). Tracking the best disjunction. Machine Learning, this issue.
Blum, A. & Burch, C. (1997). On-line learning and the metrical task system. In Proceedings of the 10th Annual Workshop on Computational Learning Theory. ACM Press, New York, NY.
Cesa-Bianchi, N., Freund, Y., Haussler, D., Helmbold, D. P., Schapire, R. E., & Warmuth, M. K. (1997). How to use expert advice. Journal of the ACM, 44(3), 427-485.
Cover, T. & Thomas, J. (1991). Elements of Information Theory. Wiley.
Feder, M., Merhav, N., & Gutman, M. (1992). Universal prediction of individual sequences. IEEE Transactions on Information Theory, 38, 1258-1270.
Freund, Y. & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139.
Freund, Y., Schapire, R. E., Singer, Y., & Warmuth, M. K. (1997). Using and combining predictors that specialize. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing.
Haussler, D., Kivinen, J., & Warmuth, M. K. (1998). Sequential prediction of individual sequences under general loss functions. IEEE Transactions on Information Theory. To appear.
Helmbold, D. P., Kivinen, J., & Warmuth, M. K. (1995). Worst-case loss bounds for sigmoided linear neurons. In Proceedings of the 1995 Neural Information Processing Conference, (pp. 309-315). MIT Press, Cambridge, MA.
Helmbold, D.P., Long, D.D.E., & Sherrod, B. (1996). A dynamic disk spin-down technique for mobile computing. In Proceedings of the Second Annual ACM International Conference on Mobile Computing and Networking. ACM/IEEE.
Herbster, M. (1997). Tracking the best expert II. Unpublished Manuscript.
Herbster, M. & Warmuth, M. K. (1995). Tracking the best expert. In Proceedings of the 12th International Conference on Machine Learning, (pp. 286-294). Morgan Kaufmann.
Kivinen, J. & Warmuth, M. K. (1997). Additive versus exponentiated gradient updates for linear prediction. Information and Computation, 132(1), 1-64.
Littlestone, N. (1988). Learning when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2, 285-318.
Littlestone, N. (1989). Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, Technical Report UCSC-CRL-89-11, University of California Santa Cruz.
Littlestone, N. & Warmuth, M. K. (1994). The weighted majority algorithm. Information and Computation, 108(2), 212-261.
Singer, Y. (1997). Towards realistic and competitive portfolio selection algorithms. Unpublished Manuscript.
Vovk, V. (1998). A game of prediction with expert advice. Journal of Computer and System Sciences. To appear.
Vovk, V. (1997). Derandomizing stochastic prediction strategies. In Proceedings of the 10th Annual Workshop on Computational Learning Theory. ACM Press, New York, NY.
Warmuth, M. K. (1997). Predicting with the dot-product in the experts framework. Unpublished Manuscript.
Cite this article
Herbster, M., Warmuth, M.K. Tracking the Best Expert. Machine Learning 32, 151–178 (1998). https://doi.org/10.1023/A:1007424614876