The Evolution of Cooperation — Robert Axelrod on the Iterated Prisoner’s Dilemma

Cooperation and the prisoner’s dilemma

The prisoner’s dilemma is a classic problem in game theory. In its simplest form, there are two players with two choices: cooperate or defect. What’s more, players must choose to cooperate or defect without knowing what the other player will do. We can summarize their choices and respective payoffs in a 2×2 table:

Prisoner’s dilemma (choices and payoffs)
Payoffs (row, column)    Cooperate    Defect
Cooperate                3, 3         0, 5
Defect                   5, 0         1, 1

This game has three types of outcomes. Firstly, if both players cooperate, each receives a payoff of 3. However, if one player defects while the other cooperates, the defector receives a payoff of 5 while the cooperator receives nothing. Finally, if both players defect, each receives a payoff of 1.

The prisoner’s dilemma exists because defection is more tempting than cooperation. If the other player is cooperating, you can earn a higher payoff by defecting. And if the other player is defecting, you also want to defect. Put another way, in a single-shot game with two self-interested players, both players have a dominant strategy to defect. The dilemma arises because the joint payoff from mutual defection is inferior to that of mutual cooperation.
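
To make this concrete, here is a minimal Python sketch of the single-shot game (the dictionary and helper names are mine, purely for illustration, not Axelrod's). It checks each player's best response and confirms that defection dominates whatever the other player does:

```python
# Payoffs from the table above: (row player, column player)
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def best_response(opponent_move):
    """Return the row player's payoff-maximising move against a fixed opponent move."""
    return max(("cooperate", "defect"),
               key=lambda move: PAYOFFS[(move, opponent_move)][0])

# Whatever the other player does, defecting pays more in a single-shot game.
for opponent in ("cooperate", "defect"):
    print(opponent, "->", best_response(opponent))  # both print "defect"
```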

Indeed, the prisoner’s dilemma occurs in many forms. Price wars, nuclear arms races, and the global response to climate change can at times resemble a prisoner’s dilemma. The motivational structure of a prisoner’s dilemma can seem inescapable. Under what conditions might we expect coordination and cooperation to flourish? That is the central question of Robert Axelrod’s wonderful book, The Evolution of Cooperation.

Iterations, indefiniteness, and preemption

The setup above, Axelrod notes, assumes that players face each other in the prisoner’s dilemma once or a fixed, finite number of times. In either case, both players will want to defect. After all, both players know that in the single or final round, the other player will defect. As each player tries to pre-empt the other’s defection, the game degenerates into mutual defection.

The possibilities, however, can open up if the interactions are ongoing or indefinite. In this scenario, players now have to consider the impact of their choices today on their expected future payoffs. If they know that they might have to play the prisoner’s dilemma again someday, they might think twice before defecting today. And if their opponent reasons along similar lines, then there appears a way to circumvent the dilemma.

Put another way, it seems then that mutual cooperation is possible if the future benefit of cooperation outweighs the immediate gains of defection today. This depends in part on the players’ discount rate and the likelihood that they’ll meet each other again in the future.
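
As a rough numeric sketch of that trade-off (a simplification, not Axelrod's full analysis): assume the other player answers a defection by defecting in every later round, and let w stand for the combined discount weight and probability that another round is played. Using the payoffs from the table above:

```python
# Payoffs from the table above: mutual cooperation R = 3, temptation T = 5,
# mutual defection P = 1. w combines the discount rate and the chance of meeting again.
R, T, P = 3, 5, 1

def value_of_cooperating(w):
    """Payoff stream from mutual cooperation: R in every round, weighted by w each round."""
    return R / (1 - w)

def value_of_defecting_once(w):
    """Grab T now, then face mutual defection (P) in every later round."""
    return T + w * P / (1 - w)

# Cooperation beats a one-off defection only when the future matters enough.
for w in (0.3, 0.5, 0.7, 0.9):
    print(w, value_of_cooperating(w) >= value_of_defecting_once(w))
# Break-even here is w = (T - R) / (T - P) = 0.5: below that, defecting today pays more.
```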

An eye for an eye

What, then, is the optimal strategy for a player in an iterated prisoner’s dilemma? It seems unwise to always defect, given the lifetime benefits of cooperation. But it also seems risky to cooperate if your opponent decides to defect.

Indeed, through simulations and experiments (round-robin tournaments with other game theorists), Axelrod could not find a single optimal strategy. The viability of any strategy, he shows, depends too much on the personalities and strategies of everybody else. For instance, if everybody else is defecting, then your best course of action is to do the same.

That said, one strategy tended to fare better than the others over the long run across a wide range of conditions: tit-for-tat, in which a player cooperates on the first move and matches the other player’s previous choice thereafter. For example, if the other player cooperates (defects) on the current turn, the tit-for-tat player will cooperate (defect) on the next turn.
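
A minimal sketch of the rule in Python (the helper names and the six-round match are mine, purely for illustration):

```python
def tit_for_tat(my_history, opponent_history):
    """Cooperate on the first move, then copy the opponent's previous move."""
    if not opponent_history:
        return "cooperate"
    return opponent_history[-1]

def always_defect(my_history, opponent_history):
    """A 'mean' strategy for comparison."""
    return "defect"

def play(strategy_a, strategy_b, rounds=6):
    """Run both strategies against each other and return their move histories."""
    history_a, history_b = [], []
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        history_a.append(move_a)
        history_b.append(move_b)
    return history_a, history_b

# Tit-for-tat opens with cooperation, then mirrors: against a constant defector
# it is exploited once and defects thereafter.
print(play(tit_for_tat, always_defect))
```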

Niceness, retaliation, and forgiveness

While a simple strategy, tit-for-tat possesses four properties that make it resilient: (1) niceness, (2) retaliation, (3) forgiveness and (4) clarity.

Firstly, tit-for-tat players are never the first to defect. Their “niceness” avoids unnecessary conflict, so when two nice players meet, sustainable cooperation can emerge. Secondly, when “mean” defectors try to exploit them, the retaliatory nature of tit-for-tat punishes them and discourages such behavior. Thirdly, the forgiveness aspect of tit-for-tat provides the opportunity for two defecting players to restore cooperation; without it, they can remain locked in an endless loop of mutual defection.

Finally, the simplicity and clarity of tit-for-tat’s rules helps others to learn, adapt and imitate. There is a tendency, I think, for people to get too cute or clever with their strategies. When your strategy gets too complicated, it’s difficult for other players to understand and respond to your choices. In the iterated prisoner’s dilemma, where cooperation is desirable, sophisticated rules and unclear signals make it hard to reciprocate.

Learning, switching, and initial viability

In iterated games, people, companies and nations will try a variety of strategies. Players with successful strategies will persist, flourish, and/or grow. Those with less successful strategies may learn from and/or switch to more successful ones. And those that fail to learn or adapt will perish. In each case, the distribution of strategies and players is changing through an adaptive, reflexive and selective process.

Like an evolutionary system, good strategies will proliferate while poor strategies fall away. It’s important to remember, as Axelrod puts it, that “the effectiveness of a particular strategy depends not only on its own characteristics, but also on the nature of the other strategies with which it must interact”. The strategy that is viable initially can become outmoded as the distribution of players and strategies changes.

The ecology of meanies

Consider a pool of players that play either a cooperate-only, defect-only, or tit-for-tat strategy in an iterated prisoner’s dilemma. Defect-only players may succeed in early rounds of the iterated game as they exploit cooperate-only players. But once the exploitable players fall away, defect-only players will struggle.

And where only defect-only and tit-for-tat players exist, defect-only players will underperform as tit-for-tat players cooperate amongst themselves. In an iterated prisoner’s dilemma, strategies that don’t play well with themselves (e.g. exploitative strategies) are “eventually a self-defeating process”.
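
One way to see this dynamic is with a small ecological sketch (my own toy simulation, not Axelrod's tournament code): score every pairing over a fixed match, then let each strategy's population share grow in proportion to its average score against the current mix.

```python
# Payoffs to the row player; "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def cooperate_only(opponent_history): return "C"
def defect_only(opponent_history):    return "D"
def tit_for_tat(opponent_history):    return opponent_history[-1] if opponent_history else "C"

STRATEGIES = {"cooperate-only": cooperate_only,
              "defect-only": defect_only,
              "tit-for-tat": tit_for_tat}

def match_score(strat_a, strat_b, rounds=200):
    """Average per-round payoff to strat_a when it plays strat_b."""
    hist_a, hist_b, total = [], [], 0
    for _ in range(rounds):
        a, b = strat_a(hist_b), strat_b(hist_a)
        total += PAYOFF[(a, b)]
        hist_a.append(a)
        hist_b.append(b)
    return total / rounds

def next_generation(shares):
    """Grow each strategy's population share in proportion to its average score."""
    fitness = {name: sum(share * match_score(STRATEGIES[name], STRATEGIES[other])
                         for other, share in shares.items())
               for name in shares}
    mean = sum(shares[name] * fitness[name] for name in shares)
    return {name: shares[name] * fitness[name] / mean for name in shares}

shares = {"cooperate-only": 0.3, "defect-only": 0.4, "tit-for-tat": 0.3}
for _ in range(30):
    shares = next_generation(shares)
print({name: round(share, 2) for name, share in shares.items()})
# Defect-only gains early by exploiting cooperate-only, then declines as
# tit-for-tat players keep earning 3 per round from one another.
```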

Twins, clustering, and stability

Another strength of tit-for-tat is its viability in many environments. Tit-for-tat plays well with cooperating players and other tit-for-tat players (its twin). It also retaliates promptly against exploitative players. So, while tit-for-tat is not always optimal, it appears sufficiently reliable and adaptive.

A world of non-cooperating individuals, however, is collectively stable. That is, there is no incentive for any single individual to cooperate when nobody is cooperating. However, if a small cluster of tit-for-tat players emerges and interacts with one another, then it’s possible for mutual cooperation to proliferate across a world of non-cooperation.

This explains in part why extractive institutions and corrupt regimes try to prevent the proliferation and clustering of ideas and people that threaten their political and economic power. The distinction between singular and clustered strategies has important consequences for the evolution of cooperation in social systems.
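
A rough back-of-the-envelope check of the clustering claim, following Axelrod's own argument (the payoffs are from the table above; w is the continuation probability and p the share of a newcomer's interactions that happen within its own tit-for-tat cluster, both labels mine):

```python
# Payoffs: R = 3 (mutual cooperation), S = 0 (you cooperate, they defect),
# T = 5 (you defect, they cooperate), P = 1 (mutual defection).
R, S, T, P = 3, 0, 5, 1
w = 0.9  # probability (and discount weight) that a pairing continues another round

# Expected totals for an indefinitely repeated pairing.
tft_vs_tft   = R / (1 - w)          # two reciprocators cooperate throughout
tft_vs_alld  = S + w * P / (1 - w)  # exploited once, then mutual defection
alld_vs_alld = P / (1 - w)          # what the all-defect natives earn from each other

def cluster_member_score(p):
    """Score of a tit-for-tat newcomer who meets its own cluster a fraction p of the time."""
    return p * tft_vs_tft + (1 - p) * tft_vs_alld

# The cluster can invade once its members outscore the all-defect natives.
for p in (0.01, 0.05, 0.10, 0.25):
    print(p, cluster_member_score(p) > alld_vs_alld)
# With these numbers the threshold is p > 1/21, i.e. roughly 5% of interactions.
```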

Friendship, foresight and feedback

Friendship is not always necessary for mutual cooperation to flourish. Axelrod illustrates this with the ‘live-and-let-live’ systems that emerged between combatants during the First World War. The phenomenon often began with an unspoken agreement between soldiers to cease fire under specific conditions (e.g., don’t shoot during bad weather or Christmas).

Over time, reciprocation led to mutually accepted modes of behavior, which then spread into other domains of trench warfare. Soldiers locked in a prolonged conflict realized that mutual restraint served their self-preservation better than constant aggression. Friendship in this instance wasn’t necessary for cooperation to emerge.

Foresight is not necessary for the emergence, evolution and stability of cooperation either. In biology, altruism is an emergent property that can help a pool of closely related genes to propagate from one generation to the next. Social insects, like ants and bees, are fascinating examples of this. We also see cooperation in environments where genetic relatedness is low. Axelrod highlights, for example, the symbioses between fungus and alga, ants and ant-acacias, and fig wasps and fig trees.

Where benefits to cooperation exist, reciprocity can emerge. This is true, both in biology and economics. The distinction between social and biological cooperation then is in our “deliberate rather than blind adaptation”. While foresight isn’t necessary, it might help to speed things up.

Echo effects

Axelrod highlights also the impact of echo effects or feedback loops in interacting social systems. “Not only [do] preferences affect behaviour and outcomes, but behaviour and outcomes also [affect] preferences”. This has important consequences for the iterated prisoner’s dilemma. That is, cooperation (defection) can encourage and intensify future cooperation (defection).

To maintain stable cooperation, players need to recognize one another. Mutualism emerges in ecosystems because organisms find ways to discriminate between their threats and partners. In social systems, we have mechanisms like reputations, brands, rituals, ethics, regulation, codes of conduct, and territories to perform such functions.

Green-eyed monster

People sometimes think about problems in business and policy as a zero-sum game. But this is often not the case. Mutual cooperation is less likely when people focus on relative results. Comparing yourself to others “risks the development of self-destructive envy”. Axelrod observes that when two players seek to grow their reputations, it’s not surprising to see them “spiral downward into a long series of mutual punishments”.

Think about two businesses that compete for market leadership at all costs. Some combination of envy, single-level thinking and transitory relationships may explain why they engage in self-destructive pricing, marketing or capital projects. To generalize Axelrod’s point, I think it’s about framing your problem, choices and payoffs in the right way.

“Well, envy and jealousy made, what, two out of the Ten Commandments? I’ve heard Warren say a half a dozen times, ‘It’s not greed that drives the world, but envy.’”

Charlie Munger

Three levels deep and small-world models

To foster mutual cooperation, Axelrod says we have to: (1) “enlarge the shadow of the future” (i.e., the durability, frequency, and recognizability of future interactions); (2) “change the payoffs”; and (3) “teach the players to care about each other”.

The author also recommends that “we go at least three levels deep” when analyzing games like the iterated prisoner’s dilemma. This includes: (1) the direct impact of our choice; (2) the potential response of other players; and (3) our response to their potential responses (and the cascades from there). If you analyze your choices as single-step problems, you’ll miss the consequences of echo effects and feedback loops altogether.

Finally, given the simplicity, clarity and applicability of Axelrod’s arguments, it’s tempting to generalize the iterated prisoner’s dilemma to every social problem. We have to remember, however, that this is still a small-world model with fixed assumptions. It’s not always clear either whether the issue we face is an iterated prisoner’s dilemma. The payoffs, choices, timeframes and discount parameters are often uncertain. Problem definition is critical here.

References

  • Robert Axelrod. 1984. The Evolution of Cooperation. Selected papers from Axelrod, available at: <http://www-personal.umich.edu/~axe/>