Distributions:
Bernoulli distribution
Poisson distribution
Exponential distribution

Policies:
Gittins' Bayesian optimal strategy for binary rewards [1]
The classical UCB policy [2]
The UCB-V policy [3]
The KL-UCB policy [4]
The Clopper-Pearson policy for binary rewards [4]
The MOSS policy [5]
The DMED policy [6]
The Empirical Likelihood UCB [7]
The Bayes-UCB policy [8]
The Thompson sampling policy [9]

[1] Bandit Processes and Dynamic Allocation Indices. J. C. Gittins. Journal of the Royal Statistical Society, Series B (Methodological), Vol. 41, No. 2, 1979, pp. 148–177.
[2] Finite-time analysis of the multiarmed bandit problem. Peter Auer, Nicolò Cesa-Bianchi and Paul Fischer. Machine Learning 47, 2002, pp. 235–256.
[3] Exploration-exploitation trade-off using variance estimates in multi-armed bandits. J.-Y. Audibert, R. Munos, Cs. Szepesvár……