WOLFRAM|DEMONSTRATIONS PROJECT

The Multi-Arm Bandit

Controls: create new problem (success probabilities: default or random; discount factor: 0.5, 0.9, or 0.99); reset current problem; show success probabilities.
Displays: observations, success probability estimation, reward.
Outcomes: success, failure.
You are given two biased coins, but you do not know the bias of either one. Each flip that lands heads earns a reward, and the size of that reward shrinks with every flip, at a rate set by the discount factor. This Demonstration shows the outcomes of a series of flips, each made by choosing a coin with one of the two choice buttons. The outcomes determine the total reward and also provide information about the biases of the two coins, which can guide future choices. A good coin-picking strategy maximizes the total reward by balancing exploration (learning which coin is more likely to succeed) against exploitation (flipping the coin that seems best so far).
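
The trade-off described above can be made concrete with a small simulation. The following Python sketch is not the Demonstration's own code, and several details are assumptions made for illustration: the reward for a success on flip t is taken to be discount^t (geometric discounting, starting at t = 0), each coin's success probability is estimated with a uniform-prior posterior mean, and the strategy is a simple epsilon-greedy rule. The names `estimate`, `play`, and `epsilon` are hypothetical.

```python
import random


def estimate(successes, failures):
    """Posterior-mean estimate of a success probability under a uniform prior (assumed)."""
    return (successes + 1) / (successes + failures + 2)


def play(probs, discount, n_flips, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy player on coins with the given success probabilities.

    Assumption: a success on flip t (t = 0, 1, 2, ...) is worth discount**t.
    Returns the total discounted reward and per-coin [successes, failures] counts.
    """
    rng = random.Random(seed)
    counts = [[0, 0] for _ in probs]  # [successes, failures] for each coin
    total = 0.0
    for t in range(n_flips):
        if rng.random() < epsilon:
            # Explore: pick a coin uniformly at random.
            coin = rng.randrange(len(probs))
        else:
            # Exploit: pick the coin with the highest current estimate.
            estimates = [estimate(s, f) for s, f in counts]
            coin = max(range(len(probs)), key=lambda i: estimates[i])
        if rng.random() < probs[coin]:
            counts[coin][0] += 1
            total += discount ** t
        else:
            counts[coin][1] += 1
    return total, counts
```

Calling, for example, `play([0.3, 0.6], discount=0.9, n_flips=50)` returns the accumulated reward together with the observation counts for each coin. With a small discount factor, early flips dominate the total, so there is little to gain from exploring; with a discount factor close to 1, later flips still matter and time spent identifying the better coin is more likely to pay off.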