Expected Motion in 2x2 Symmetric Games Played by Reinforcement Learners

The figure shows the expected motion of a system in which two players using the Bush–Mosteller reinforcement learning algorithm play a symmetric 2x2 game. T (for temptation) is the payoff a defector gets when the other player cooperates; R (for reward) is the payoff both players obtain when they both cooperate; P (for punishment) is the payoff both players obtain when they both defect; and S (for sucker) is the payoff a cooperator gets when the other player defects. Both players share the same aspiration threshold and the same learning rate. Noise is the probability that a player takes the opposite action to the one she or he intended. The arrows represent the expected motion at various states of the system, and the background is colored according to the norm of the expected motion.
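The expected motion at a single state can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code behind the figure: a state is taken to be the pair of probabilities (p1, p2) with which each player intends to cooperate; payoffs are turned into stimuli by subtracting the aspiration threshold and dividing by the largest absolute deviation, as in the usual Bush–Mosteller formulation; and noise is assumed to flip the action actually played, which is then the action that gets reinforced. The function and parameter names (`stimulus`, `delta_p_cooperate`, `expected_motion`, `aspiration`, `rate`, `noise`) are hypothetical.

```python
import math


def stimulus(payoff, aspiration, payoffs):
    """Normalised stimulus in [-1, 1] (assumed Bush-Mosteller scaling)."""
    scale = max(abs(x - aspiration) for x in payoffs)
    return (payoff - aspiration) / scale if scale > 0 else 0.0


def delta_p_cooperate(p, cooperated, s, rate):
    """Change in the probability of cooperating after one interaction.

    The action actually played is reinforced when s >= 0 and
    inhibited when s < 0.
    """
    if cooperated:
        return rate * s * (1 - p) if s >= 0 else rate * s * p
    # The player defected: update the probability of defecting (1 - p)
    # and return the opposite change for the probability of cooperating.
    return -rate * s * p if s >= 0 else -rate * s * (1 - p)


def expected_motion(p1, p2, T, R, P, S, aspiration, rate, noise=0.0):
    """Expected one-step change (d1, d2) in (p1, p2)."""
    payoffs = (T, R, P, S)
    # Assumption: with probability `noise` the intended action is flipped.
    q1 = p1 * (1 - noise) + (1 - p1) * noise
    q2 = p2 * (1 - noise) + (1 - p2) * noise
    # Payoff to a player given (own action, other's action); True = cooperate.
    pay = {(True, True): R, (True, False): S,
           (False, True): T, (False, False): P}
    d1 = d2 = 0.0
    for a1 in (True, False):
        for a2 in (True, False):
            prob = (q1 if a1 else 1 - q1) * (q2 if a2 else 1 - q2)
            s1 = stimulus(pay[(a1, a2)], aspiration, payoffs)
            s2 = stimulus(pay[(a2, a1)], aspiration, payoffs)
            d1 += prob * delta_p_cooperate(p1, a1, s1, rate)
            d2 += prob * delta_p_cooperate(p2, a2, s2, rate)
    return d1, d2


# Example: a Prisoner's Dilemma with illustrative values (not from the figure).
d1, d2 = expected_motion(0.5, 0.5, T=4, R=3, P=1, S=0, aspiration=2, rate=0.5)
norm = math.hypot(d1, d2)  # the quantity a background colouring would use
```

Evaluating `expected_motion` on a grid of (p1, p2) values would give one arrow per grid point, with `norm` supplying the background value at that point.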

