WOLFRAM|DEMONSTRATIONS PROJECT

The Multi-Arm Bandit

Controls:
  create new problem
  success probabilities: default, random
  discount factor: 0.5, 0.9, 0.99
  reset current problem
  show success probabilities

You are given two biased coins, but you do not know the bias of either one. Each flip that lands heads earns a reward, and the size of the reward decreases with each flip, at a rate set by the discount factor. This Demonstration shows the outcomes of a series of flips, each made by selecting a coin with one of the two choice buttons. The outcomes determine the total reward and provide information about the biases of the two coins, which can guide future choices. A good coin-picking strategy accumulates the greatest reward by balancing the need to explore which coin is more likely to succeed against exploiting the coin that seems best so far.
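
The trade-off described above can be tried outside the Demonstration with a small simulation. The following Python sketch is illustrative only and is not the Demonstration's own code. It assumes that a head on flip t (counting from 0) pays discount**t, which is one common reading of "the reward decreases with each flip", and it uses Thompson sampling with a uniform prior as one possible coin-picking strategy. The names play_bandit, thompson_choice, and always_first are hypothetical.

```python
import random


def play_bandit(probs, discount, n_flips, choose, rng):
    """Play n_flips flips and return (total discounted reward, counts).

    probs    -- true heads probability of each coin (hidden from the strategy)
    discount -- assumed reward model: a head on flip t pays discount ** t
    choose   -- strategy: function (counts, rng) -> index of the coin to flip
    counts   -- per-coin [heads, tails] tallies observed so far
    """
    counts = [[0, 0] for _ in probs]
    total = 0.0
    for t in range(n_flips):
        coin = choose(counts, rng)
        if rng.random() < probs[coin]:
            total += discount ** t      # heads: collect the discounted reward
            counts[coin][0] += 1
        else:
            counts[coin][1] += 1        # tails: no reward, but still information
    return total, counts


def thompson_choice(counts, rng):
    """Thompson sampling (an assumed strategy, not taken from the source).

    Draw a plausible bias for each coin from its Beta(heads + 1, tails + 1)
    posterior and flip the coin with the largest draw. Coins with few
    observations have wide posteriors, so they still get explored sometimes.
    """
    draws = [rng.betavariate(h + 1, t + 1) for h, t in counts]
    return draws.index(max(draws))


def always_first(counts, rng):
    """Baseline with no exploration: always flip coin 0."""
    return 0


if __name__ == "__main__":
    rng = random.Random(1)
    for name, strategy in [("thompson", thompson_choice), ("always first", always_first)]:
        total, counts = play_bandit([0.4, 0.6], 0.99, 200, strategy, rng)
        print(name, round(total, 2), counts)
```

With a discount factor of 0.5, nearly all of the achievable reward comes from the first few flips, so there is little time to benefit from exploring. With 0.99, later flips still carry weight and learning which coin is better pays off over a longer run.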