Abstract
We consider a Gaussian multi-armed bandit problem in which both the means and the variances of the rewards are unknown. The Gaussian setting is natural for batch processing, because the cumulative reward of a batch is approximately normally distributed. We propose a batch version of the UCB strategy and obtain a description of the strategy that is invariant with respect to the horizon size. We consider different approaches to estimating the unknown reward variances and study their effect on the normalized regret. A set of Monte Carlo simulations is performed to study the batch strategy and to illustrate the results for the two-armed bandit.
Supported by Russian Science Foundation, project number 23-21-00447, https://rscf.ru/en/project/23-21-00447/.
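The batch UCB idea summarized in the abstract can be illustrated with a minimal simulation sketch. It is not the strategy from the paper: the index form (sample mean plus `a * sqrt(var * ln(t) / n)`), the use of the unbiased sample variance as the variance estimate, and the names `batch_ucb`, `a`, and `batch_size` are all assumptions made for illustration. The sketch plays one batch per arm first, then repeatedly assigns a whole batch to the arm with the largest index.

```python
import math
import random


def batch_ucb(means, stds, horizon, batch_size, a=1.0, seed=0):
    """Simulate a batch UCB rule on a Gaussian bandit.

    Hypothetical sketch: the index formula and the variance estimator
    are illustrative assumptions, not the paper's exact definitions.
    Returns (expected regret, number of pulls per arm).
    """
    k = len(means)
    if batch_size < 2 or horizon % batch_size or horizon < k * batch_size:
        raise ValueError("need batch_size >= 2 and horizon a multiple of it")

    rng = random.Random(seed)
    counts = [0] * k       # pulls per arm
    sums = [0.0] * k       # sum of rewards per arm
    sq_sums = [0.0] * k    # sum of squared rewards per arm

    def play_batch(arm):
        # All rewards in a batch come from the same arm.
        for _ in range(batch_size):
            x = rng.gauss(means[arm], stds[arm])
            counts[arm] += 1
            sums[arm] += x
            sq_sums[arm] += x * x

    # Initial stage: one batch per arm, so every estimate is defined.
    for arm in range(k):
        play_batch(arm)

    while sum(counts) < horizon:
        t = sum(counts)
        indices = []
        for arm in range(k):
            n = counts[arm]
            mean = sums[arm] / n
            # Unbiased sample variance, clipped at zero against rounding.
            var = max((sq_sums[arm] - n * mean * mean) / (n - 1), 0.0)
            indices.append(mean + a * math.sqrt(var * math.log(t) / n))
        play_batch(max(range(k), key=indices.__getitem__))

    # Expected regret: mean-reward gap of each arm times its pull count.
    best = max(means)
    regret = sum((best - m) * c for m, c in zip(means, counts))
    return regret, counts


if __name__ == "__main__":
    regret, counts = batch_ucb([0.0, 1.0], [1.0, 1.0], horizon=1000, batch_size=50)
    print(regret, counts)
```

Replacing the sample-variance line with a different estimator (for example, one computed only from the initial batches) is one way to compare how variance estimation affects the regret in such a simulation. Dividing the returned regret by a scale such as `math.sqrt(horizon)` gives a horizon-normalized quantity, though the exact normalization used in the paper is not stated here.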
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Garbar, S. (2023). Estimation of Both Unknown Parameters in Gaussian Multi-armed Bandit for Batch Processing Scenario. In: Khachay, M., Kochetov, Y., Eremeev, A., Khamisov, O., Mazalov, V., Pardalos, P. (eds) Mathematical Optimization Theory and Operations Research: Recent Trends. MOTOR 2023. Communications in Computer and Information Science, vol 1881. Springer, Cham. https://doi.org/10.1007/978-3-031-43257-6_7
DOI: https://doi.org/10.1007/978-3-031-43257-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43256-9
Online ISBN: 978-3-031-43257-6
eBook Packages: Computer Science, Computer Science (R0)