Abstract
We consider the stochastic and adversarial settings of continuum-armed bandits, where the arms are indexed by \([0,1]^d\). The reward functions \(r:[0,1]^d \to \mathbb{R}\) are assumed to depend intrinsically on at most \(k\) coordinate variables, so that \(r(x_1,\dots,x_d) = g(x_{i_1},\dots,x_{i_k})\) for distinct and unknown \(i_1,\dots,i_k \in \{1,\dots,d\}\) and some locally Hölder continuous \(g:[0,1]^k \to \mathbb{R}\) with exponent \(\alpha \in (0,1]\). First, assuming \((i_1,\dots,i_k)\) to be fixed across time, we propose a simple modification of the CAB1 algorithm in which we construct the discrete set of sampling points so as to obtain a regret bound of \(O(n^{\frac{\alpha+k}{2\alpha+k}} (\log n)^{\frac{\alpha}{2\alpha+k}} C(k,d))\), with \(C(k,d)\) depending at most polynomially on \(k\) and sub-logarithmically on \(d\). The construction is based on creating partitions of \(\{1,\dots,d\}\) into \(k\) disjoint subsets and is probabilistic; hence our result holds with high probability. Second, we extend our results to the more general case where \((i_1,\dots,i_k)\) can change over time, and derive regret bounds for this setting as well.
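The abstract does not spell out how the partitions are built, so the following is only a sketch under our own assumptions, not the paper's construction. We assume the partitions come from \(m\) independent, uniformly random maps \(h\) from the \(d\) coordinates to \(k\) blocks, and that each point \(t\) of a \(k\)-dimensional grid is lifted to \([0,1]^d\) by giving every coordinate in block \(b\) the value \(t_b\). The helper names (`maps_needed`, `random_partitions`, `grid_1d`, `sampling_points`) and the cell-centre grid are hypothetical. The idea: whenever some \(h\) sends the unknown indices \(i_1,\dots,i_k\) to \(k\) distinct blocks, the lifted points restricted to those indices run through the whole \(k\)-dimensional grid, so \(g\) is sampled on a full grid without the indices ever being identified.

```python
import math
import random
from itertools import product

# Hypothetical construction, assumed for illustration only.


def maps_needed(k: int, d: int, delta: float) -> int:
    """Number m of uniform random maps {0..d-1} -> {0..k-1} such that, with
    probability >= 1 - delta, every k-subset of coordinates is sent to k
    distinct blocks by at least one map.

    One uniform map separates a fixed k-subset with probability p = k!/k^k.
    Union bound: C(d, k) * (1 - p)^m <= C(d, k) * exp(-p * m) <= delta.
    """
    p = math.factorial(k) / k**k
    return math.ceil((math.log(math.comb(d, k)) + math.log(1.0 / delta)) / p)


def random_partitions(k: int, d: int, m: int, rng: random.Random) -> list[list[int]]:
    """Draw m random maps; map h puts coordinate i into block h[i]."""
    return [[rng.randrange(k) for _ in range(d)] for _ in range(m)]


def grid_1d(M: int) -> list[float]:
    """Cell centres of a uniform grid with M cells on [0, 1]."""
    return [(j + 0.5) / M for j in range(M)]


def sampling_points(partitions: list[list[int]], k: int, M: int) -> list[tuple[float, ...]]:
    """Lift every point t of the k-dimensional grid through each map h:
    coordinate i of the lifted point is t[h[i]], so all coordinates in the
    same block share one value."""
    points = []
    for h in partitions:
        for t in product(grid_1d(M), repeat=k):
            points.append(tuple(t[b] for b in h))
    return points


rng = random.Random(0)
k, d, M, delta = 2, 10, 4, 0.1
m = maps_needed(k, d, delta)
parts = random_partitions(k, d, m, rng)
pts = sampling_points(parts, k, M)
print(m, len(pts))  # 13 208, i.e. m * M**k points in [0,1]^d
```

Under these assumptions the number of maps grows only like \(\ln\binom{d}{k} \le k \ln d\), which is where a mild dependence on \(d\) would come from; the resulting point set could then be handed to a finite-armed algorithm, as in CAB1.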
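A back-of-the-envelope calculation shows how exponents of this form can arise. It assumes, in the spirit of CAB1, that the regret splits into a discretization term and the regret of a finite-armed algorithm run on \(N = m M^k\) sampling points, where \(M\) is the per-axis grid resolution and \(m\) the number of partitions. The symbols \(m\), \(M\), \(L\), \(\delta\) and the exact form of the bound on \(m\) are our assumptions, and the paper's constants and its exact \(C(k,d)\) may differ.

```latex
% Assumed heuristic decomposition (L = Hölder constant, n = horizon):
R(n) \lesssim \underbrace{L\, n\, M^{-\alpha}}_{\text{discretization}}
      + \underbrace{\sqrt{n\, m\, M^{k} \log n}}_{\text{finite-armed bandit on } m M^k \text{ arms}}

% Balancing the two terms (constants, including L, suppressed):
n^{2} M^{-2\alpha} \asymp n\, m\, M^{k} \log n
\;\Longrightarrow\;
M \asymp \Bigl(\frac{n}{m \log n}\Bigr)^{\frac{1}{2\alpha+k}}

% Substituting back:
R(n) \asymp n\, M^{-\alpha}
      = n^{\frac{\alpha+k}{2\alpha+k}} (\log n)^{\frac{\alpha}{2\alpha+k}}\; m^{\frac{\alpha}{2\alpha+k}}

% If m \lesssim e^{k}\bigl(k \ln d + \ln(1/\delta)\bigr) random partitions suffice
% (using k^k/k! \le e^k), then, since k\alpha/(2\alpha+k) < \alpha \le 1:
m^{\frac{\alpha}{2\alpha+k}}
   \lesssim e^{\frac{k\alpha}{2\alpha+k}} \bigl(k \ln d + \ln(1/\delta)\bigr)^{\frac{\alpha}{2\alpha+k}}
   \le e\, \bigl(k \ln d + \ln(1/\delta)\bigr)^{\frac{\alpha}{2\alpha+k}}
```

In this heuristic the factor \(m^{\alpha/(2\alpha+k)}\) plays the role of \(C(k,d)\): the exponential-in-\(k\) size of the partition family is tamed by the small exponent, leaving a dependence that is polynomial in \(k\) and sub-logarithmic in \(d\).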
References
Awerbuch, B., Kleinberg, R.: Near-optimal adaptive routing: Shortest paths and geometric generalizations. In: Proceedings of ACM Symposium on Theory of Computing (2004)
Bansal, N., Blum, A., Chawla, S., Meyerson, A.: Online oblivious routing. In: Proceedings of ACM Symposium on Parallelism in Algorithms and Architectures, pp. 44–49 (2003)
Monteleoni, C., Jaakkola, T.: Online learning of non-stationary sequences. In: Advances in Neural Information Processing Systems (2003)
Blum, A., Kumar, V., Rudra, A., Wu, F.: Online learning in online auctions. In: Proceedings of 14th Symp. on Discrete Alg., pp. 202–204 (2003)
Kleinberg, R., Leighton, T.: The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In: Proceedings of Foundations of Computer Science, pp. 594–605 (2003)
Lai, T.L., Robbins, H.: Asymptotically efficient adaptive allocation rules. Adv. in Appl. Math. 6, 4–22 (1985)
Rothschild, M.: A two-armed bandit theory of market pricing. Journal of Economic Theory 9, 185–202 (1974)
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: The adversarial multi-armed bandit problem. In: Proceedings of 36th Annual Symposium on Foundations of Computer Science, pp. 322–331 (1995)
Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2-3), 235–256 (2002)
Kleinberg, R.: Nearly tight bounds for the continuum-armed bandit problem. In: Advances in Neural Information Processing Systems 17 (2004)
Abernethy, J., Hazan, E., Rakhlin, A.: Competing in the dark: An efficient algorithm for bandit linear optimization. In: Proceedings of the 21st Annual Conference on Learning Theory, COLT 2008 (2008)
DeVore, R., Petrova, G., Wojtaszczyk, P.: Approximation of functions of few variables in high dimensions. Constr. Approx. 33, 125–143 (2011)
Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003)
Agrawal, R.: The continuum-armed bandit problem. SIAM J. Control and Optimization 33, 1926–1951 (1995)
Cope, E.W.: Regret and convergence bounds for a class of continuum-armed bandit problems. IEEE Transactions on Automatic Control 54, 1243–1253 (2009)
Auer, P., Ortner, R., Szepesvári, C.: Improved rates for the stochastic continuum-armed bandit problem. In: Proceedings of 20th Conference on Learning Theory (COLT), pp. 454–468 (2007)
Kleinberg, R., Slivkins, A., Upfal, E.: Multi-armed bandits in metric spaces. In: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, STOC 2008, pp. 681–690 (2008)
Bubeck, S., Munos, R., Stoltz, G., Szepesvári, C.: X-armed bandits. Journal of Machine Learning Research (JMLR) 12, 1587–1627 (2011)
Bubeck, S., Stoltz, G., Yu, J.Y.: Lipschitz bandits without the Lipschitz constant. In: Kivinen, J., Szepesvári, C., Ukkonen, E., Zeugmann, T. (eds.) ALT 2011. LNCS (LNAI), vol. 6925, pp. 144–158. Springer, Heidelberg (2011)
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2003)
Mossel, E., O’Donnell, R., Servedio, R.: Learning juntas. In: Proceedings of the 35th Annual ACM Symposium on Theory of Computing, STOC 2003, pp. 206–212. ACM (2003)
Naor, M., Schulman, L.J., Srinivasan, A.: Splitters and near-optimal derandomization. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 182–191 (1995)
Tyagi, H., Gärtner, B.: Continuum armed bandit problem of few variables in high dimensions. CoRR, abs/1304.5793 (2013)
Audibert, J.-Y., Bubeck, S.: Regret bounds and minimax policies under partial monitoring. Journal of Machine Learning Research 11, 2635–2686 (2010)
Kleinberg, R.D.: Online Decision Problems with Large Strategy Sets. PhD thesis, MIT, Cambridge, MA (2005)
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Tyagi, H., Gärtner, B. (2014). Continuum Armed Bandit Problem of Few Variables in High Dimensions. In: Kaklamanis, C., Pruhs, K. (eds) Approximation and Online Algorithms. WAOA 2013. Lecture Notes in Computer Science, vol 8447. Springer, Cham. https://doi.org/10.1007/978-3-319-08001-7_10
DOI: https://doi.org/10.1007/978-3-319-08001-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08000-0
Online ISBN: 978-3-319-08001-7
eBook Packages: Computer Science, Computer Science (R0)