pymaBandit说明
分布
- Bernoulli distribution
- Poisson distribution
- Exponential distribution
策略
- Gittin’s Bayesian optimal strategy for binary rewards [1]
- The classical UCB policy [2]
- The UCB-V policy [3]
- The KL-UCB policy [4]
- The Clopper-Pearson policy for binary rewards [4]
- The MOSS policy [5]
- The DMED policy [6]
- The Emipirical Likelihood UCB [7]
- The Bayes-UCB policy [8]
- The Thompson sampling policy [9]
[1] Bandit Processes and Dynamic Allocation Indices J. C. Gittins. Journal of the Royal Statistical Society. Series B (Methodological) Vol. 41, No. 2. 1979 pp. 148–177
[2] Finite-time analysis of the multiarmed bandit problem Peter Auer, Nicolò Cesa-Bianchi and Paul Fischer. Machine Learning 47 2002 pp.235-256
[3] Exploration-exploitation trade-off using variance estimates in multi-armed bandits J.-Y. Audibert, R. Munos, Cs. Szepesvári. Theoretical Computer Science Volume 410 Issue 19 Apr. 2009 pp. 1876-1902
[4] The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond A. Garivier, O. Cappé JMLR Workshop and Conference Proceedings Volume 19: COLT 2011 Conference on Learning Theory pp. 359-376 Jul. 2011
[5] Minimax Policies for Adversarial and Stochastic Bandits J-Y. Audibert and S. Bubeck. Proceedings of the 22nd Annual Conference on Learning Theory 2009
[6] An asymptotically optimal policy for finite support models in the multiarmed bandit problem J. Honda, A. Takemura. Machine Learning 85(3) 2011 pp. 361-391
[7] UCB policies based on Kullback-Leibler divergence O. Cappé, A. Garivier, O-A. Maillard, R. Munos. in preparation
[8] On Bayesian Upper Confidence Bounds for Bandit Problems E. Kaufmann, O. Cappé, A. Garivier. Fifteenth International Conference on Artificial Intelligence and Statistics (AISTAT) Apr. 2012
[9] Thompson Sampling: an optimal analysis E. Kaufmann, N. Korda et R. Munos. preprint
[10] On Upper-Confidence Bound Policies for Non-stationary Bandit Problems A. Garivier, E. Moulines. Algorithmic Learning Theory (ALT) pp. 174-188 Oct. 2011
[11] Optimism in Reinforcement Learning and Kullback-Leibler Divergence S. Filippi, O. Cappé, A. Garivier. 48th Annual Allerton Conference on Communication, Control, and Computing Sep. 2010
[12] Parametric Bandits: The Generalized Linear Case S. Filippi, O. Cappé, A. Garivier, C. Szepesvari. Neural Information Processing Systems (NIPS) Dec. 2010
- 原文作者:mlyixi
- 原文链接:https://mlyixi.github.io/post/ml/pymaBandit%E8%AF%B4%E6%98%8E/
- 版权声明:本作品采用知识共享署名-非商业性使用-禁止演绎 4.0 国际许可协议进行许可,非商业转载请注明出处(作者,原文链接),商业转载请联系作者获得授权。