Estimation of Both Unknown Parameters in Gaussian Multi-armed Bandit for Batch Processing Scenario

Garbar, Sergey

doi:10.1007/978-3-031-43257-6_7

Sergey Garbar (ORCID: orcid.org/0000-0002-5205-5252)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1881)

Included in the following conference series:

International Conference on Mathematical Optimization Theory and Operations Research

Abstract

We consider a Gaussian multi-armed bandit problem in which both the means and the variances of the rewards are unknown. The Gaussian setting is natural for batch processing, because the cumulative reward of each batch is approximately normally distributed. We propose a batch version of the UCB strategy and obtain a description of it that is invariant with respect to the horizon size. We consider different approaches to estimating the unknown reward variances and study their effect on the normalized regret. A set of Monte Carlo simulations is performed to study the batch strategy and to illustrate the results for the two-armed bandit.
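
The setting described in the abstract can be illustrated with a minimal Python sketch. It is not the strategy from the chapter, whose exact form is not given here. It assumes a generic UCB index (sample mean plus `a` times the square root of estimated variance times log of total pulls over arm pulls), an unbiased sample variance computed from individual rewards, an initial stage of one batch per arm, and regret normalized by the standard deviation times the square root of the horizon. The function names and the parameter `a` are hypothetical.

```python
import math
import random


def batch_ucb_regret(means, std, horizon, batch_size, a=1.0, rng=None):
    """Run one batch-UCB trajectory and return its pseudo-regret.

    Assumption: both the mean and the variance of each arm are estimated
    from the individual rewards observed so far; the arm is switched only
    between batches.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 to estimate variance")
    if rng is None:
        rng = random.Random()
    k = len(means)
    n = [0] * k        # number of pulls per arm
    s = [0.0] * k      # sum of rewards per arm
    s2 = [0.0] * k     # sum of squared rewards per arm
    best = max(means)
    regret = 0.0

    def index(i, total):
        # Generic UCB index with an estimated variance (illustrative form).
        mean_hat = s[i] / n[i]
        var_hat = max((s2[i] - n[i] * mean_hat ** 2) / (n[i] - 1), 1e-12)
        return mean_hat + a * math.sqrt(var_hat * math.log(total) / n[i])

    for b in range(horizon // batch_size):
        if b < k:
            arm = b  # initial stage: one batch for every arm
        else:
            total = sum(n)
            arm = max(range(k), key=lambda i: index(i, total))
        for _ in range(batch_size):
            x = rng.gauss(means[arm], std)
            s[arm] += x
            s2[arm] += x * x
        n[arm] += batch_size
        regret += batch_size * (best - means[arm])
    return regret


def normalized_regret(means, std, horizon, batch_size, runs=200, seed=0):
    """Monte Carlo estimate of regret divided by std * sqrt(horizon)."""
    rng = random.Random(seed)
    total = sum(
        batch_ucb_regret(means, std, horizon, batch_size, rng=rng)
        for _ in range(runs)
    )
    return total / runs / (std * math.sqrt(horizon))
```

For example, `normalized_regret((0.0, 0.5), 1.0, 1000, 50)` averages 200 simulated trajectories of a two-armed bandit with 20 batches of 50 pulls each. The true standard deviation is used only to generate rewards and to scale the reported regret, never inside the index.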

Supported by Russian Science Foundation, project number 23-21-00447, https://rscf.ru/en/project/23-21-00447/.

Author information

Authors and Affiliations

Yaroslav-the-Wise Novgorod State University, Velikiy Novgorod, 173003, Russia
Sergey Garbar

Corresponding author

Correspondence to Sergey Garbar.

Editor information

Editors and Affiliations

Krasovsky Institute of Mathematics and Mechanics, Ekaterinburg, Russia
Michael Khachay
Sobolev Institute of Mathematics, Novosibirsk, Russia
Yury Kochetov
Sobolev Institute of Mathematics, Omsk, Russia
Anton Eremeev
Melentiev Energy Systems Institute, Irkutsk, Russia
Oleg Khamisov
Institute of Applied Mathematical Research, Petrozavodsk, Russia
Vladimir Mazalov
University of Florida, Gainesville, FL, USA
Panos Pardalos

Cite this paper

Garbar, S. (2023). Estimation of Both Unknown Parameters in Gaussian Multi-armed Bandit for Batch Processing Scenario. In: Khachay, M., Kochetov, Y., Eremeev, A., Khamisov, O., Mazalov, V., Pardalos, P. (eds) Mathematical Optimization Theory and Operations Research: Recent Trends. MOTOR 2023. Communications in Computer and Information Science, vol 1881. Springer, Cham. https://doi.org/10.1007/978-3-031-43257-6_7

DOI: https://doi.org/10.1007/978-3-031-43257-6_7
Published: 21 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43256-9
Online ISBN: 978-3-031-43257-6
eBook Packages: Computer Science, Computer Science (R0)
