Prisoner’s Dilemma Simulator

Explore one of the most famous problems in Game Theory—the Prisoner's Dilemma. Use this interactive simulator to test different strategies, observe how cooperation and betrayal evolve over repeated interactions, and discover why seemingly rational decisions can lead to surprising outcomes.

Prisoner’s Dilemma

Decision-making shapes many aspects of our daily lives, from teamwork and competition to trust and negotiation. Have you ever wondered why individuals sometimes choose not to cooperate, even when working together could benefit everyone? This fascinating situation is explored through the Prisoner’s Dilemma, one of the most well-known concepts in Game Theory.
Imagine you and a friend are two prisoners, each interrogated independently. Facing the police, you have two choices: stay silent (cooperate) or betray your friend (defect). With our interactive simulator, you can test different strategies, analyze outcomes, and observe how choices influence rewards and consequences. Dive into the world of strategy, cooperation, and rational thinking, and experiment with the Prisoner’s Dilemma today!

Payoff rules for Prisoner’s Dilemma

Your Move	Opponent's Move	Your Score	Opponent's Score	Outcome
Cooperate	Cooperate	3	3	Both players benefit through cooperation.
Cooperate	Defect	0	5	You are exploited while the opponent gains the maximum reward.
Defect	Cooperate	5	0	You gain the maximum reward by exploiting the opponent.
Defect	Defect	1	1	Both players receive a small punishment payoff.

Legend:
P1 = Prisoner 1 | P2 = Prisoner 2
C = Cooperate | D = Defect

P1	P2	P1 Score	P2 Score
C	C	3	3
C	D	0	5
D	C	5	0
D	D	1	1

Simulator


Explore Iterated Prisoner's Dilemma

One Question at a time!

Q1. What is the Prisoner's Dilemma and where did it originate?

The Prisoner's Dilemma is a foundational thought experiment in game theory. The underlying game was devised by Merrill Flood and Melvin Dresher at the RAND Corporation in 1950, and Princeton mathematician Albert W. Tucker gave it its prisoner story and its name the same year. Two suspects are arrested and held separately, unable to communicate. Each must independently choose to stay silent (cooperate) or betray the other (defect). The dilemma arises because both players, acting purely rationally in their own self-interest, end up worse off than if they had both cooperated. It is one of the most studied models in the social sciences.

Q2. What is the payoff matrix of the Prisoner's Dilemma and what does it tell us?

The classic Prisoner's Dilemma is traditionally explained using prison sentences. The four possible outcomes are:

Both cooperate (C,C): each receives a light sentence — the best collective outcome.
Both defect (D,D): each receives a moderate sentence — worse for both players.
One defects while the other cooperates: the defector goes free while the cooperator receives the harshest sentence.

The preference order for each player is: DC > CC > DD > CD. This means exploiting a cooperative opponent is always the most attractive individual choice, while being exploited is the least desirable outcome. This tension between individual incentives and collective benefit is what makes the Prisoner's Dilemma so fascinating.

In this simulator, we replace prison sentences with a point-based scoring system. Higher scores are better, making it easier to compare strategies over hundreds of repeated rounds. The scoring matrix used is shown below:

Your Move	Opponent's Move	Your Score	Opponent's Score
Cooperate	Cooperate	3	3
Cooperate	Defect	0	5
Defect	Cooperate	5	0
Defect	Defect	1	1
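
As a rough sketch, the scoring matrix above can be written as a lookup table and checked against the preference order DC > CC > DD > CD. The names T, R, P and S (temptation, reward, punishment, sucker's payoff) are the conventional labels for these four values and are an addition here, not part of the simulator. The second check, 2R > T + S, is a condition commonly added for the repeated game so that steady mutual cooperation beats taking turns exploiting each other.

```python
# Simulator scoring matrix: (my_move, opponent_move) -> (my_score, opponent_score)
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# Conventional labels (an assumption of this sketch, not used by the simulator):
T = PAYOFFS[("D", "C")][0]  # temptation: I defect, opponent cooperates
R = PAYOFFS[("C", "C")][0]  # reward: both cooperate
P = PAYOFFS[("D", "D")][0]  # punishment: both defect
S = PAYOFFS[("C", "D")][0]  # sucker's payoff: I cooperate, opponent defects


def is_prisoners_dilemma(t, r, p, s):
    """True when the payoffs follow the order DC > CC > DD > CD."""
    return t > r > p > s


def rewards_steady_cooperation(t, r, s):
    """True when two rounds of mutual cooperation beat one round each of exploiting."""
    return 2 * r > t + s


print(is_prisoners_dilemma(T, R, P, S))     # True
print(rewards_steady_cooperation(T, R, S))  # True
```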

Q3. What is a dominant strategy and why does it lead to mutual defection in the Prisoner's Dilemma?

A dominant strategy is one that produces the best outcome for a player regardless of what the other player does. In the Prisoner's Dilemma, defection is dominant for both players. Take illustrative sentences of 0, 3, 5 and 10 years, where fewer years is better (these are separate from the simulator's points). If your partner stays silent, betraying them gets you 0 years instead of the 3 years you would get for staying silent. If your partner betrays you, defecting gets you 5 years instead of the 10 years you would get for staying silent. Defecting wins in both cases. Since both players reason identically, both defect and land in the (D,D) outcome of 5 years each, even though (C,C) at 3 years each was available.
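
The dominance argument can be checked mechanically. The sketch below uses the illustrative sentences from this answer (0, 3, 5 and 10 years, lower is better); the function names are hypothetical and chosen only for this example.

```python
# Illustrative sentences in years (lower is better).
# Keys are (my_move, partner_move); values are my sentence.
YEARS = {
    ("C", "C"): 3,
    ("C", "D"): 10,
    ("D", "C"): 0,
    ("D", "D"): 5,
}


def best_reply(partner_move):
    """Return the move that minimises my sentence against a fixed partner move."""
    return min(("C", "D"), key=lambda my_move: YEARS[(my_move, partner_move)])


def dominant_move():
    """Return the move that is the best reply to every partner move, or None."""
    replies = {best_reply(partner) for partner in ("C", "D")}
    return replies.pop() if len(replies) == 1 else None


print(best_reply("C"))  # D: 0 years instead of 3
print(best_reply("D"))  # D: 5 years instead of 10
print(dominant_move())  # D
```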

Q4. What is Nash Equilibrium and why is mutual defection a Nash Equilibrium in the Prisoner's Dilemma?

A Nash Equilibrium (named after mathematician John Nash) is a state where no player can improve their outcome by unilaterally changing their strategy, given what the other player is doing. In the Prisoner's Dilemma, (D,D) is a Nash Equilibrium because: if your opponent is defecting, switching to cooperation makes you worse off (you go from 5 years to 10 years). Neither player has any incentive to deviate. Crucially, this Nash Equilibrium is not Pareto-efficient — both players could be better off at (C,C), but rational self-interest traps them at (D,D).
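
Both claims can be verified by brute force over the four outcomes. This sketch uses the simulator's points (higher is better) rather than prison years; the helper names are hypothetical.

```python
from itertools import product

MOVES = ("C", "D")
# (move_1, move_2) -> (score_1, score_2), using the simulator's points.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def is_nash(m1, m2):
    """True when neither player gains by changing only their own move."""
    s1, s2 = PAYOFFS[(m1, m2)]
    p1_cannot_gain = all(PAYOFFS[(alt, m2)][0] <= s1 for alt in MOVES)
    p2_cannot_gain = all(PAYOFFS[(m1, alt)][1] <= s2 for alt in MOVES)
    return p1_cannot_gain and p2_cannot_gain


def is_pareto_efficient(m1, m2):
    """True when no other outcome helps one player without hurting the other."""
    s1, s2 = PAYOFFS[(m1, m2)]
    for o1, o2 in PAYOFFS.values():
        if o1 >= s1 and o2 >= s2 and (o1 > s1 or o2 > s2):
            return False
    return True


nash_outcomes = [o for o in product(MOVES, MOVES) if is_nash(*o)]
print(nash_outcomes)                  # [('D', 'D')]
print(is_pareto_efficient("D", "D"))  # False
print(is_pareto_efficient("C", "C"))  # True
```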

Q5. What is the Iterated Prisoner's Dilemma and how does it change the game fundamentally?

The Iterated Prisoner's Dilemma (IPD) is the same game played repeatedly between the same two players over multiple rounds. This single change transforms the logic entirely. Because future rounds exist, a player can reward cooperation and punish defection, which can make long-term cooperation individually rational. The Folk Theorem of repeated games states that any feasible outcome that gives each player more than they could guarantee on their own, including full mutual cooperation, can be sustained as a Nash Equilibrium in an infinitely repeated game, provided players are sufficiently patient. The IPD shows that cooperation does not have to rest on morality; the expectation of future interaction can be enough.
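
As one worked illustration of "sufficiently patient", assume (this is not part of the simulator) that both players follow a grim-trigger rule: cooperate until the opponent defects once, then defect forever. Assume also that a payoff one round ahead is worth a fraction δ of the same payoff now. With the simulator's points T = 5 (defecting against a cooperator), R = 3 (mutual cooperation) and P = 1 (mutual defection), cooperating forever is at least as good as defecting once when:

```latex
\frac{R}{1-\delta} \;\ge\; T + \frac{\delta P}{1-\delta}
\quad\Longleftrightarrow\quad
\delta \;\ge\; \frac{T-R}{T-P} \;=\; \frac{5-3}{5-1} \;=\; \frac{1}{2}
```

Under these assumptions, cooperation holds up whenever players weight the next round at least half as much as the current one.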

Q6. What strategies are used in the IPD and which have proven most effective?

IPD strategies range from simple to mathematically sophisticated:

Always Defect (AllD): exploits cooperators but triggers retaliation — earns low long-run scores.
Always Cooperate (AllC): easily exploited by defectors.
Tit-for-Tat (TFT): cooperate on round 1, then mirror the opponent's last move exactly — nice, retaliatory, forgiving, and transparent.
Generous TFT: like TFT but occasionally forgives a defection — more robust under noise and accidental errors.
Win-Stay Lose-Shift (Pavlov): repeat last move if rewarded, switch if punished; it can outperform TFT when moves are subject to noise.
Zero-Determinant (ZD) strategies (Press & Dyson, 2012): mathematically enforce a fixed linear relationship between both players' scores, regardless of what the opponent does.
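
The deterministic strategies in this list are short enough to sketch directly. The code below is a minimal illustration, not the simulator's own implementation: each strategy is a function of the two move histories, Pavlov is assumed to open with cooperation and to treat a score of 3 or 5 as "rewarded", and the randomised strategies (Generous TFT, ZD) are left out.

```python
# (my_move, opponent_move) -> (my_score, opponent_score)
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def always_defect(mine, theirs):
    return "D"


def always_cooperate(mine, theirs):
    return "C"


def tit_for_tat(mine, theirs):
    """Cooperate first, then copy the opponent's previous move."""
    return theirs[-1] if theirs else "C"


def pavlov(mine, theirs):
    """Win-Stay Lose-Shift: keep the last move after scoring 3 or 5, switch after 0 or 1."""
    if not mine:
        return "C"
    last_score = PAYOFFS[(mine[-1], theirs[-1])][0]
    if last_score >= 3:
        return mine[-1]
    return "D" if mine[-1] == "C" else "C"


def play_match(strat_a, strat_b, rounds=200):
    """Play repeated rounds and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print(play_match(tit_for_tat, always_defect, rounds=10))     # (9, 14)
print(play_match(tit_for_tat, always_cooperate, rounds=10))  # (30, 30)
print(play_match(pavlov, always_defect, rounds=10))          # (5, 30)
```

Against Always Defect, Tit-for-Tat loses only the first round, while Pavlov keeps switching back to cooperation and is exploited every other round.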

Q7. What did Axelrod's famous computer tournament reveal — and why did Tit-for-Tat win?

In 1980, political scientist Robert Axelrod invited game theorists, biologists, economists and computer scientists worldwide to submit IPD strategies for a round-robin tournament. In the first tournament each pair played 200 rounds, and scores were summed across all matchups. The winner in both the first tournament (14 entries) and the second (62 entries, where everyone knew TFT had won the first) was Tit-for-Tat, a 4-line FORTRAN program submitted by mathematician Anatol Rapoport. Axelrod identified four properties behind its success: be nice (never defect first), be retaliatory (punish defection immediately), be forgiving (resume cooperation quickly), and be clear (simple enough that opponents can predict your behaviour).
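
The round-robin format itself is easy to sketch. The example below is a toy version under stated assumptions: four hypothetical entrants instead of Axelrod's field, 200 rounds per match, each entrant also playing a copy of itself, and the simulator's points as payoffs. Pavlov is written in its equivalent short form (cooperate after both players made the same move).

```python
from itertools import combinations_with_replacement

# (my_move, opponent_move) -> (my_score, opponent_score)
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def tit_for_tat(mine, theirs):
    return theirs[-1] if theirs else "C"


def always_defect(mine, theirs):
    return "D"


def always_cooperate(mine, theirs):
    return "C"


def pavlov(mine, theirs):
    # Win-Stay Lose-Shift reduces to: cooperate after matching moves, else defect.
    if not mine:
        return "C"
    return "C" if mine[-1] == theirs[-1] else "D"


def play_match(strat_a, strat_b, rounds):
    """Play one match and return both total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a, move_b = strat_a(hist_a, hist_b), strat_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def round_robin(entrants, rounds=200):
    """Every entrant meets every entrant, including a copy of itself."""
    totals = {name: 0 for name in entrants}
    for name_a, name_b in combinations_with_replacement(entrants, 2):
        score_a, score_b = play_match(entrants[name_a], entrants[name_b], rounds)
        totals[name_a] += score_a
        if name_a != name_b:  # count a self-match only once
            totals[name_b] += score_b
    return totals


entrants = {
    "tit_for_tat": tit_for_tat,
    "always_defect": always_defect,
    "always_cooperate": always_cooperate,
    "pavlov": pavlov,
}
totals = round_robin(entrants)
for name, score in sorted(totals.items(), key=lambda item: -item[1]):
    print(name, score)
# always_defect 2004
# tit_for_tat 1999
# pavlov 1900
# always_cooperate 1800
```

In this tiny field, always_defect narrowly comes first because it can exploit the unconditional cooperator; removing always_cooperate puts tit_for_tat on top (1399 against 1300 for pavlov and 1004 for always_defect). Rankings depend on who enters, so this toy run should not be read as a re-creation of Axelrod's results.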

Q8. How does the IPD explain cooperation in biology, economics, and international relations?

The IPD has become a universal model across disciplines:

Biology: Axelrod, working with evolutionary biologist William Hamilton, showed that TFT-like strategies do well in evolutionary simulations: strategies that reciprocate cooperation tend to spread, while those that only exploit tend to decline. Axelrod also pointed to a human parallel: British and German soldiers in WWI trenches independently developed informal live-and-let-live truces, TFT-like behaviour emerging spontaneously under repeated interaction.
Economics: competing firms in an oligopoly tacitly keep prices high rather than triggering a price war — IPD logic sustains cooperation without any agreement.
International relations: arms control treaties and trade agreements can be modelled as IPD equilibria: nations cooperate as long as the cost of triggering retaliation exceeds the gain from cheating.

Q9. What are the key limitations of the one-shot and iterated versions of the Prisoner's Dilemma?

Both versions carry important limitations:

Backward induction collapse: in a finite IPD of known length, rational players defect on the final round, then the second-to-last, and so on — cooperation unravels entirely. Yet laboratory experiments show humans cooperate in finite games, contradicting the theory.
Noise fragility: TFT breaks down under errors. Between two TFT players, a single accidental defection sets off an endless echo of alternating retaliation. Generous TFT and Pavlov are more robust.
Two-player assumption: real social dilemmas involve many players. N-player versions (public goods games) produce qualitatively different dynamics.
Human irrationality: people consistently cooperate more than game theory predicts — emotions, fairness norms, and reputation override pure payoff maximisation.
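
The noise fragility point in the list above can be seen in a few lines. In this sketch, two Tit-for-Tat players face each other and one intended move of player A is flipped by mistake; the function name and the choice of error round are hypothetical.

```python
def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def play_with_one_error(rounds, error_round):
    """Two Tit-for-Tat players; player A's move is flipped once, at error_round."""
    hist_a, hist_b = [], []
    for round_index in range(rounds):
        move_a = tit_for_tat(hist_b)
        move_b = tit_for_tat(hist_a)
        if round_index == error_round:
            move_a = "D" if move_a == "C" else "C"  # the accidental defection
        hist_a.append(move_a)
        hist_b.append(move_b)
    return "".join(hist_a), "".join(hist_b)


moves_a, moves_b = play_with_one_error(rounds=8, error_round=2)
print(moves_a)  # CCDCDCDC
print(moves_b)  # CCCDCDCD
```

After the single slip in the third round, the two players never return to mutual cooperation: each keeps punishing the other's previous punishment.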

Q10. What is the single deepest insight the Prisoner's Dilemma gives us about rational behaviour and society?

The Prisoner's Dilemma reveals one of the most uncomfortable truths in social science: rational individuals, each acting in perfect self-interest, can produce outcomes that are bad for everyone. This is the logic behind arms races, overfishing, pollution, and price wars: every player defects, every player suffers. The Iterated version then delivers the hopeful counterpoint: when people expect to meet again, cooperation can emerge without altruism or law, sustained by incentives alone. Axelrod called this expectation "the shadow of the future". Many institutions, contracts, and norms can be read as attempts to make interactions feel iterated.