WOLFRAM|DEMONSTRATIONS PROJECT

Expected Motion in 2x2 Symmetric Games Played by Reinforcement Learners

Controls (values shown)

payoffs: T = 4, R = 3, P = 1, S = 0
aspiration threshold: A = 2
learning rate: l = 0.5
trembling hands noise: 0
arrows: show many, fewer, or none

payoff matrix (row player's payoff first):

       C       D
C    3, 3    0, 4
D    4, 0    1, 1

The figure shows the expected motion of a system in which two players using the Bush–Mosteller reinforcement learning algorithm play a symmetric 2×2 game. T (for temptation) is the payoff a defector gets when the other player cooperates; R (for reward) is the payoff obtained by both players when they both cooperate; P (for punishment) is the payoff both players obtain when they both defect; and S (for sucker) is the payoff a cooperator gets when the other player defects. Parameter A denotes both players' aspiration threshold, and l is their learning rate. Noise is the probability that a player undertakes the opposite of the action she or he intended. The arrows represent the expected motion at various states of the system. The background is colored according to the norm of the expected motion.
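
The expected motion at a single state can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the Demonstration. It assumes that a state is the pair (p1, p2) of the players' probabilities of cooperating, that the stimulus is the payoff minus A divided by the largest distance between any payoff and A, that the usual Bush–Mosteller update is applied to the action actually carried out, and that noise flips each intended action independently. The function names (stimulus, delta_p, expected_motion) are hypothetical.

```python
from itertools import product


def stimulus(payoff, A, payoffs):
    # Assumption: normalise by the largest distance between any payoff and A,
    # so that the stimulus always lies in [-1, 1]. Requires some payoff != A.
    scale = max(abs(x - A) for x in payoffs)
    return (payoff - A) / scale


def delta_p(p, action, s, l):
    # Change in the probability of cooperating, p, after taking `action`
    # and receiving stimulus s (Bush-Mosteller style update, assumed form).
    if action == "C":
        return l * s * (1 - p) if s >= 0 else l * s * p
    # For a defection the update acts on the probability of defecting, 1 - p,
    # so the sign of the change in p is reversed.
    return -l * s * p if s >= 0 else -l * s * (1 - p)


def expected_motion(p1, p2, T=4, R=3, P=1, S=0, A=2, l=0.5, noise=0.0):
    # Payoff to a player who takes the first action against the second.
    payoff = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    values = (T, R, P, S)
    # Assumption: each intended action is flipped with probability `noise`.
    q1 = p1 * (1 - noise) + (1 - p1) * noise
    q2 = p2 * (1 - noise) + (1 - p2) * noise
    em1 = em2 = 0.0
    for a1, a2 in product("CD", repeat=2):
        prob = (q1 if a1 == "C" else 1 - q1) * (q2 if a2 == "C" else 1 - q2)
        s1 = stimulus(payoff[(a1, a2)], A, values)
        s2 = stimulus(payoff[(a2, a1)], A, values)
        em1 += prob * delta_p(p1, a1, s1, l)
        em2 += prob * delta_p(p2, a2, s2, l)
    return em1, em2


if __name__ == "__main__":
    # Expected motion at the centre of the state space, default parameters.
    print(expected_motion(0.5, 0.5))
```

Evaluating expected_motion on a grid of (p1, p2) values gives one arrow per grid point, and the Euclidean norm of each returned pair would give the corresponding background shade.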