Navanax inermis, a simultaneous hermaphrodite. See Leonard and Lukowiak (1985) for the evolutionary game theory. Pictures can be found at http://www.seaslugforum.net/showall/navainer
Abstract​
While evolutionary game theory has focused on tractable solutions with explicit dynamics imposed on a population, many interesting dynamics emerge in the simulation of even simple games. In particular, games in which there are only mixed strategy equilibria, or no ESS, still have interesting evolutionary dynamics - including the evolution of cheap-talk signalling, punctuated evolution and extinction events even in the absence of any exogenous factors. This project seeks to consolidate and extend materials to allow analysis of symmetric repeated evolutionary games played by deterministic finite automata.

Background

Evolutionary Algorithms and Games

Evolutionary analogies in computing science, such as genetic algorithms, evolutionary programming and genetic programming, view evolution as exploring a given objective function, purporting to use implicit parallelism and other mechanisms to avoid the local optima that greedy approaches like gradient descent fall into. A good deal of natural evolution is indeed of this form.

Evolutionary games played inter- and con-specifically are defined by the property that fitness is determined in part by the behaviour of other evolving creatures. This can make population evolution highly path-dependent. Indeed, for many evolutionary games the question is not so much “what is the evolutionarily stable strategy (“ESS,” Maynard Smith and Price (1973), Maynard Smith (1982))?” but whether there is any path to that ESS at all, and if not, what then can we say about “evolution?”

The literature on these games falls into three categories: (a) posing games and proving existence of an ESS; (b) analytically solvable environments where the proportions of players of each type follow a known time evolution (this operates only at the phenotype level, which in a game is the strategy used to play it); and (c) computer simulations over many generations which are commented upon. What the literature lacks, then, is a comprehensive visualisation of evolutionary pathways. An excellent Mathematica-based survey of evolutionary algorithms and some evolutionary dynamics can be found in Jacob (2001).

We will develop a framework to analyse the potential evolutionary dynamics of games played between finite state machines encoded as bit strings. The encoding allows us to model mutation and recombination which operate at genotype level.

The Simulation Set-up

The centrepiece is a repeated 2×2 matrix game, with payoffs to each player interpreted broadly as increments in fitness. In a population, players are pitted against one another and accumulate fitness over all scheduled interactions (often each pair plays the repeated stage game once). Harrald (1995) proposed this framework, using deterministic finite automata, encoded as bit strings, to play repeated versions of the game, with evolution driven by roulette-wheel selection, single-segment crossover and bit-flip mutation. The framework was picked up in Cerruti et al. (2005) and a good deal of later work.
​
At its heart, players meet and make one of two choices in a “stage” game, say X or Y, and the payoff (the increment in fitness) depends on which of the four possibilities arises. There are many famous stage games, which between them capture the essential nature of rivalrous, cooperative, complementary and other fundamental interactions. In the matrices below each entry is the payoff pair to the {row, column} player; the row player chooses X in the first row and Y in the second, and the column player chooses X in the first column and Y in the second:

prisonersDilemma = {
   {{1, 1}, {5, 0}},
   {{0, 5}, {3, 3}}
};

battleSexes = {
   {{10, 5}, {0, 0}},
   {{0, 10}, {5, 10}}
};

stagHunt = {
   {{10, 10}, {0, 7}},
   {{7, 0}, {7, 7}}
};

coordination = {
   {{10, 10}, {0, 0}},
   {{0, 0}, {5, 5}}
};
​
For notation, we will write P(a, b) for the increment in payoff to a player playing a against a player playing b in the stage game. In iterated versions of these games there is an unfolding history, and a design decision is how much of that history can be used; this is explored in a Wolfram Demonstrations Project by Seth Chandler (Chandler, 2011). Chandler, a Law Professor at the University of Houston, is responsible for many interesting Mathematica projects and applications across economics, law, evolution and climate, and is a Wolfram Innovator Award winner.
​
Famously, set-ups similar to this have been used to explain apparently unselfish, cooperative behaviour in Nature even when uncooperative behaviour is a dominant strategy, for example in the iterated Prisoner’s Dilemma, from Axelrod (1984) et seq ad nauseam. In fact, cooperation is highly unstable and requires very specific dynamics (Harrald and Morrison, 2003), and provably cannot emerge in supergames (Rubinstein, 1986).

The DFAs

We will use machines with S states, encoded as strings, where each of the S states specifies a move, X or Y, for the stage game, together with the next states entered depending on the two possible moves by the other player. Thus a typical state is “a b c”, with a in {X, Y} the move for the current iteration, and b and c in {1, ..., S} the next states entered if the other player played X or Y respectively. DFAs are assumed to start in state 1.
​
Thus there are (2S^2)^S possible encodings (though some will be observationally equivalent machines). For 2-state machines and two choices per stage game this is 64, for three-state machines it is 5,832, and for four-state machines it is already 1,048,576... Clearly we need a way to “sample” this space, and indeed this explosion indicates the important role of recombination and mutation in exploring a vast space.
​
As an example, consider the prisoner’s dilemma above, where P(Y,Y) = 3 is Pareto superior to the Nash-equilibrium and ESS play of (X,X) with P(X,X) = 1. Writing C for the cooperative move Y and D for the defecting move X, the famous tit-for-tat is encoded as {C,2,1,D,2,1}: state 1 cooperates and state 2 defects, a defection (X) by the opponent sends the machine to state 2, and cooperation (Y) returns it to state 1.
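To see the encoding in play, here is a Python sketch (illustrative, not the notebook’s Wolfram code) in which tit-for-tat meets an always-defect machine under the prisonersDilemma payoffs above:

```python
# Stage-game payoffs to the first-listed mover: X = defect, Y = cooperate.
PD = {("X", "X"): 1, ("X", "Y"): 5, ("Y", "X"): 0, ("Y", "Y"): 3}

# A machine is a tuple of states; each state = (move, next state if the
# opponent played X, next state if the opponent played Y). Start in state 1.
tit_for_tat   = (("Y", 2, 1), ("X", 2, 1))  # {C,2,1,D,2,1} in the text's notation
always_defect = (("X", 1, 1),)              # a single defecting state suffices

def play(m1, m2, t):
    """Cumulative payoffs to both machines over t iterations of the stage game."""
    s1 = s2 = 1
    p1 = p2 = 0
    for _ in range(t):
        a, b = m1[s1 - 1][0], m2[s2 - 1][0]
        p1 += PD[(a, b)]
        p2 += PD[(b, a)]
        s1 = m1[s1 - 1][1 if b == "X" else 2]
        s2 = m2[s2 - 1][1 if a == "X" else 2]
    return p1, p2

print(play(tit_for_tat, always_defect, 10))  # (9, 14): exploited once, then mutual defection
print(play(tit_for_tat, tit_for_tat, 10))    # (30, 30): cooperation locks in
```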

Game Loop

We first compute the total payoff over t iterations of the stage game for player i against player j, where the index runs across all possible automata. In the 2-state case this is a 64×64 matrix storing the cumulative stage-game payoffs when row i plays column j. (This precomputation is only feasible for smaller numbers of states.) If t is large enough, the players will enter a repeating loop (no later than after S^2 iterations, since the pair of machine states can take at most S^2 values).
Let the representative element be IP(i,j). Of considerable interest (thanks to Christopher Wolfram for noting this) is the directed-graph representation of the matrix of 0’s and 1’s indicating whether IP(i,j) > IP(j,i). There is no guarantee of transitivity here: DFA type i may “beat” type j, j beat k, and k beat i. This is in fact a key to understanding some evolutionary dynamics, whereby a dominant i population may replace a j population only for as long as the number of k remains below a threshold; see Harrald and Morrison (2003). The IP(i,j) are calculated once, and thereafter the simulation operates only on the number of each DFA type in the population, call this N(i). So, over an entire run of the game, the cumulative payoff to a representative player x of DFA type i is

CP(x) = Σ_{j=1}^{Z} IP(i,j) N(j)

where j runs across the possible strings (so Z = (2S^2)^S) and x indexes a player (not the DFA type). (Entirely for convenience, in this simulation we assume each player also plays a facsimile of themselves.) It is worth noting that the only reason we can pre-compute the payoffs is the manageable size of the space of strings; if we were to increase the number of states this may become infeasible. By reducing each string to a canonical machine, it is likely we could map from an unfolding history to the next action more efficiently.

Selection and Mutation

The choices for both of these processes are pragmatic ones, made among myriad alternatives.
​
It is convenient to operate only at the level of the N(j). For selection, we pick a rate r, a good proportion of the total number of players (say, more than 50%), and r times we select two players, a and b, at random (note: players, not unique DFA types). Suppose the first player is of DFA type i and the second of DFA type j, and further that CP(a) > CP(b); then we add 1 to N(i) and subtract 1 from N(j).

For mutation, we select a mutation rate m which is “small” (less than 5%, say), select players at random with probability m, and randomly reassign each selected player to any other DFA type with equal probability. Note that for DFA type j we expect to mutate m N(j) players away to other types, but to gain roughly m N / Z players from other mutations, where N is the total number of players. Mutation differs in one important respect from selection: currently non-existent DFA strings (i.e., N(j) = 0) can be created.
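One plausible implementation of the two operators, acting purely on the type counts N(j), is sketched below in Python (details such as tie handling and the snapshot in mutation are my own assumptions):

```python
import random

def pick_type(N):
    """Return the DFA type of a player chosen uniformly from the population."""
    x = random.randrange(sum(N))
    for i, n in enumerate(N):
        if x < n:
            return i
        x -= n

def selection(N, CP, r):
    """r times: draw two players at random and move one player from the type
    with the lower cumulative payoff to the type with the higher one."""
    for _ in range(r):
        i, j = pick_type(N), pick_type(N)
        if CP[i] > CP[j]:
            N[i] += 1; N[j] -= 1
        elif CP[j] > CP[i]:
            N[j] += 1; N[i] -= 1   # ties leave the counts unchanged

def mutation(N, m):
    """Each player mutates with probability m to a uniformly chosen *other*
    type; extinct types (N[j] == 0) can be re-created this way."""
    Z = len(N)
    before = list(N)               # snapshot so fresh mutants don't mutate again
    for j in range(Z):
        movers = sum(random.random() < m for _ in range(before[j]))
        N[j] -= movers
        for _ in range(movers):
            N[random.choice([k for k in range(Z) if k != j])] += 1

random.seed(0)
N, CP = [50, 50], [10.0, 4.0]      # two types; type 0 is fitter
selection(N, CP, 30)
mutation(N, 0.02)
print(sum(N))  # the population size is conserved throughout: 100
```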

Summary

The simulation then runs as follows (with inputs highlighted in red):

Create payoffs in the stage game, P(X,X), P(X,Y), P(Y,X), P(Y,Y)
Create all possible S-state automata as strings
Play each unique DFA (string) against all others over t iterations of the stage game, creating the IP(i,j) payoffs
Allocate numbers of players to each string type, N(i)
For g generations
    Calculate CP(x) for all players
    Save output
    Operate Selection on CP(x) r times
    Operate Mutation on the N(j) at rate m
Next generation
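The loop above can be exercised end-to-end on the smallest possible case, S = 1, where the only two machines are “always X” and “always Y” (a Python sketch; the parameter values are illustrative choices, not taken from the notebook):

```python
import random
random.seed(1)

# Inputs, cf. the steps above: stage-game payoffs, t, initial N(i), g, r, m
PD = {("X", "X"): 1, ("X", "Y"): 5, ("Y", "X"): 0, ("Y", "Y"): 3}
machines = ["X", "Y"]              # S = 1: a machine simply repeats one move
t, g, m = 20, 40, 0.02
N = [50, 50]                       # players of each DFA type
r = int(0.6 * sum(N))              # selection events per generation

# Precompute IP(i,j); trivial when each machine repeats a single move
IP = [[t * PD[(a, b)] for b in machines] for a in machines]

def pick_type(N):
    x = random.randrange(sum(N))
    for i, n in enumerate(N):
        if x < n:
            return i
        x -= n

history = []
for gen in range(g):
    CP = [sum(IP[i][j] * N[j] for j in range(2)) for i in range(2)]
    history.append(list(N))                      # save output
    for _ in range(r):                           # selection, r times
        i, j = pick_type(N), pick_type(N)
        if CP[i] > CP[j]:
            N[i] += 1; N[j] -= 1
        elif CP[j] > CP[i]:
            N[j] += 1; N[i] -= 1
    before = list(N)                             # mutation at rate m
    for j in range(2):
        movers = sum(random.random() < m for _ in range(before[j]))
        N[j] -= movers
        N[1 - j] += movers

print(N)  # in the PD, the defecting type (index 0) takes over the population
```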

Example results

Solving any symmetric stage game

After Jacob (2001), the stage game can be “solved” for Nash equilibrium and ESS. Briefly, a Nash equilibrium obtains when, given only the choices in the stage game, neither player can increase P(·) while the other player’s strategy is held constant. For example, in prisonersDilemma a play of X is a best response to a play of X, thus (X,X) is a Nash equilibrium. In fact, by design, a play of X is also a best response to Y. coordination has two Nash equilibria.
​
An ESS is more interesting: it states that if the vast majority of players adopt the ESS, no small minority of players can obtain on average a higher payoff (fitness). We define an ESS s by the requirement that, for any other available “mutant” strategy T, either

E(s,s) > E(T,s), or
E(s,s) = E(T,s) and E(s,T) > E(T,T).

The first condition is stronger than Nash equilibrium. Maynard Smith (1982) addressed some simple dynamics towards an ESS in his famous ternary diagrams. X is an ESS in prisonersDilemma, as are both Nash equilibria in coordination. A game with a Nash equilibrium but no ESS is harmThyNeighbour:
harmThyNeighbour = {
   {{2, 2}, {1, 2}},
   {{2, 1}, {2, 2}}
};
I simply have not had time to address mixed strategies (random choice) which are important in games of coordination, for one example, or generally when there is no pure equilibrium.
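The two ESS conditions can be checked mechanically for the pure strategies of a symmetric game, given the row player’s payoff matrix (a Python sketch checking pure mutants only; the notebook’s own package, due to Haigh (1975), also handles mixed strategies):

```python
def pure_nash(E):
    """Symmetric pure Nash profiles (s, s): no deviation T improves on E(s, s)."""
    n = len(E)
    return [s for s in range(n) if all(E[s][s] >= E[t][s] for t in range(n))]

def pure_ess(E):
    """Maynard Smith's conditions against every pure mutant T != s:
    E(s,s) > E(T,s), or E(s,s) = E(T,s) and E(s,T) > E(T,T)."""
    n = len(E)
    def resists(s, t):
        return E[s][s] > E[t][s] or (E[s][s] == E[t][s] and E[s][t] > E[t][t])
    return [s for s in range(n) if all(resists(s, t) for t in range(n) if t != s)]

pd    = [[1, 5], [0, 3]]    # row payoffs; strategy 0 is X (defect)
coord = [[10, 0], [0, 5]]   # row payoffs of the coordination game
print(pure_nash(pd), pure_ess(pd))        # [0] [0] : X is Nash play and an ESS
print(pure_nash(coord), pure_ess(coord))  # [0, 1] [0, 1] : both equilibria are ESSs
```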
​
A package is provided to find these for any given payoff-matrix input; the algorithm is due to Haigh (1975). An excellent framework for solving non-evolutionary games can be found in Ungureanu (2018). The output is in probabilities (of playing X).
In[]:=
A={{1,5},{0,3}};​​B={{1,2},{2,2}};​​NashEquilibria[A]​​ESState[A]​​ESStrategy[A]​​NashEquilibria[B]​​ESState[B]​​ESStrategy[B]
Out[]=
{{1,0}}
Out[]=
{{1,0}}
Out[]=
{{1,0}}
Out[]=
{{0,1}}
Out[]=
{}
Out[]=
{{0,1}}

Evolutionary dynamics

Referring back to the pseudocode, we have a manipulator to perform the evolutionary algorithm as follows. The question becomes: what do we look for?

First we look for stability: does the algorithm converge on a single DFA type, or a few? Oscillations are what is referred to as “punctuated evolution,” a point of contention between the main protagonists in the field, but perfectly possible in a game environment. Below we see a simulation of the Prisoner’s Dilemma for 2-state DFAs that inexorably leads to permanent “defection”; over this run the mutation rate is insufficient to generate both tit-for-tat and pathological cooperators...

Second, if not, and two or more DFAs coexist, do they have interpretations as different species? If we imposed speciation (that is, selection takes place on identifiable subsets), does evolution stabilise? (This is to be implemented.) Very interestingly, while the surviving automata in this simulation are observationally equivalent, all playing X, there is “genotype” diversity.

In “coordination games” we can see the emergence of what could easily be thought of as speciation or biological gender, in so far as distinct genotypes and phenotypes learn, for example, to play the off-diagonal.

Third, what is happening to average fitness? This is critical; if, for example, average fitness is decreasing as an equilibrium is approached, then the population may suffer an endogenous extinction. For the same simulation as above, the average payoff accruing to any player inexorably decreases, see below.

For this set-up, we would conclude that if there is a minimum life-sustaining payoff, for example 1.4, then this population endogenously progresses to extinction.

Below here is source code


Concluding Remarks

We have created a preliminary framework for modelling evolution in games played by deterministic automata, and even this simple framework readily generates the famous phenomena of punctuated equilibrium and extinction events in the absence of external events.
​
However, much more work is required to allow a ready analysis of my primary interests: the development of endogenous coordination mechanisms via evolution, which includes speciation itself, biological gender roles, and indeed language. This is future work, requiring us to allow mixed strategies via probabilistic finite automata and pre-play signalling.

Acknowledgments

I would like to thank my mentors, Christopher Wolfram and Bob Nachbar for their help and tolerance of my absence and peripatetic distractions.

References

1. R. Axelrod (1984), The Evolution of Cooperation, New York: Basic Books. (https://ee.stanford.edu/~hellman/Breakthrough/book/pdfs/axelrod.pdf)
2. U. Cerruti, M. Giacobini and U. Merlone (2005), “A New Framework to Analyse Evolutionary Symmetric Games,” Proceedings of the 2005 IEEE Symposium on Computational Intelligence and Games, Essex University, Colchester, Essex, UK. (https://www.researchgate.net/publication/221157533_A_New_Framework_to_Analyze_Evolutionary_2_x_2_Symmetric_Games)
3. S. Chandler (2011), “Iterated Games,” Wolfram Demonstrations Project. (https://demonstrations.wolfram.com/IteratedGames/)
4. J. Haigh (1975), “Game theory and evolution,” Adv. Appl. Prob. 7, 8–11.
5. J. Maynard Smith (1974), “The theory of games and the evolution of animal conflicts,” J. Theoret. Biol. 47, 209–221.
6. P. Harrald (1995), “Evolving behaviour in repeated 2-player games,” in Practical Handbook of Genetic Algorithms: Applications, Volume I, Lance Chambers, Ed., pp. 459–496. CRC Press, Boca Raton, FL.
7. P. Harrald and W. Morrison (2003), “Sleepers, Thresholds, and Gateways: Drift and Stability in the Ultimatum Game and the Prisoners’ Dilemma.” (https://ssrn.com/abstract=3140046 or http://dx.doi.org/10.2139/ssrn.3140046)
8. C. Jacob (2001), Illustrating Evolutionary Computation with Mathematica, Morgan Kaufmann.
9. J. Leonard and K. Lukowiak (1985), “Courtship, copulation, and sperm trading in the sea slug, Navanax inermis (Opisthobranchia: Cephalaspidea),” Canadian Journal of Zoology, Vol. 63, No. 12, December 1985. (https://cdnsciencepub.com/doi/10.1139/z85-406)
10. J. Maynard Smith (1982), Evolution and the Theory of Games, Cambridge University Press.
11. A. Rubinstein (1986), “Finite automata play the repeated prisoner’s dilemma,” Journal of Economic Theory, Vol. 39, Issue 1. (https://www.sciencedirect.com/science/article/abs/pii/0022053186900219)
12. V. Ungureanu (2018), “[WSS18] A Game Theory Package for the Wolfram Language.” (https://community.wolfram.com/groups/-/m/t/1378496)

Cite This Notebook

“Evolving finite automata that play evolutionary games”
by Paul Harrald​
​https://community.wolfram.com/groups/-/m/t/3212029