Thompson Sampling: From Bayesian Posteriors to Optimal and Fully Automated Traffic Allocation¶
Executive Summary¶
This notebook shows how Bayesian posterior distributions naturally lead to Thompson sampling — an elegant algorithm for dynamic traffic allocation in A/B/C testing. A side benefit is that it decouples experimentation from release cycles: you can ship ten versions of, say, "microcopy" (small variations in how we explain or describe features to end users) in one shot, then let the algorithm continuously converge on the better one.
The Problem¶
Traditional A/B testing requires:
- Fixed traffic allocation (e.g., 25% A, 25% B, 25% C, 25% control)
- Wait for statistical significance before making decisions
- Waste traffic on inferior variants while collecting data
- Cannot add/remove variants dynamically without restarting the test
Thompson Sampling Solution¶
Thompson sampling provides:
- ✅ Dynamic traffic allocation — better variants automatically get more traffic
- ✅ Minimized regret — less wasted traffic on poor variants
- ✅ Natural exploration/exploitation — balances learning vs. optimizing
- ✅ Add variants anytime — new variants seamlessly enter the competition
- ✅ Bad variants fade out — poor performers naturally get less traffic
- ✅ Strong theoretical guarantees — achieves the optimal asymptotic growth rate of cumulative regret
The Algorithm (Incredibly Simple)¶
For each incoming user:
- Sample once from each variant's posterior distribution
- Choose the variant with the highest sampled value
- Show that variant to the user
- Observe the outcome (conversion/no conversion)
- Update that variant's posterior distribution
- Repeat
That's it! No complex formulas, no stopping rules, no power calculations.
Real-World Performance¶
With our passkey experiment data, Thompson sampling would have:
- Automatically allocated the large majority of traffic (~70-80% in our simulation) to variant A, the best performer
- Steadily starved the weaker variants B and C of traffic as evidence accumulated
- Identified the winner several times faster than fixed allocation
- Converted more users in expectation by routing traffic to better variants
Why It Works¶
Thompson sampling elegantly solves the exploration-exploitation tradeoff:
- Early on: Wide posteriors → high variance in samples → more exploration
- Later: Narrow posteriors → low variance in samples → exploitation of best variant
- Automatically: No parameters to tune, no decisions to make
The Multi-Armed Bandit Problem¶
The Metaphor¶
Imagine you're in a casino with K slot machines ("one-armed bandits"):
- Each machine has an unknown probability of paying out
- You have a limited budget (number of pulls)
- Goal: maximize total payout
The dilemma:
- Exploration: Try different machines to learn which is best
- Exploitation: Play the machine you currently think is best
Too much exploration → waste plays on bad machines
Too much exploitation → might miss a better machine
A/B Testing is a Bandit Problem¶
In A/B testing:
- "Arms" = variants (A, B, C, control)
- "Pull" = showing a variant to a user
- "Payout" = user converts (1) or abandons (0)
- "Unknown probability" = true conversion rate of each variant
- "Limited budget" = finite number of users
Goal: Maximize total conversions (not just identify the best variant)
Regret: The difference between:
- What we would have achieved if we always showed the best variant
- What we actually achieved
Thompson sampling minimizes cumulative regret — its regret grows at the optimal asymptotic rate, making it near-optimal in the long run.
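To make "regret" concrete, here is a tiny worked example; the rates and user counts are invented purely for illustration:

```python
# Illustrative regret arithmetic; rates and counts are made up for the example.
best_rate = 0.70
rates_shown = [0.70] * 600 + [0.68] * 400  # 400 users saw a 2-point-worse variant

expected_optimal = best_rate * len(rates_shown)  # conversions if everyone saw the best
expected_actual = sum(rates_shown)               # conversions we expect to actually get
regret = expected_optimal - expected_actual

print(round(regret, 6))  # ~8 expected conversions lost
```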
From Bayesian Posteriors to Thompson Sampling¶
The Natural Connection¶
We've already learned that for conversion testing:
- Each variant has a true conversion rate $p_i$ (unknown)
- We model our belief about $p_i$ with a Beta distribution
- After observing data, we update the Beta distribution using Bayes' theorem
For variant $i$: $$ p_i \sim \mathrm{Beta}(\alpha_i, \beta_i) $$
where:
- $\alpha_i = \text{(prior successes)} + \text{(observed conversions)}$
- $\beta_i = \text{(prior failures)} + \text{(observed non-conversions)}$
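A minimal numeric sketch of this conjugate update (the counts are illustrative, not from the experiment):

```python
# Conjugate Beta update: posterior parameters = prior parameters + observed counts.
prior_alpha, prior_beta = 1, 1            # uniform Beta(1, 1) prior
conversions, non_conversions = 120, 80    # illustrative observed outcomes

alpha = prior_alpha + conversions
beta = prior_beta + non_conversions

posterior_mean = alpha / (alpha + beta)
print(alpha, beta, round(posterior_mean, 3))  # 121 81 0.599
```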
The Thompson Sampling Insight¶
Key idea: The posterior distribution already represents our uncertainty about which variant is best.
If we sample from each posterior:
- The best variant will usually have the highest sample
- But sometimes a worse variant will sample higher (due to uncertainty)
- This naturally balances exploration and exploitation
Probability matching: Thompson sampling allocates traffic to variant $i$ proportionally to: $$ P(\text{variant } i \text{ is best} \mid \text{data}) $$
This is exactly what we computed in the Bayesian approach (see ABmethodologies.ipynb)!
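$P(\text{variant } i \text{ is best} \mid \text{data})$ has no closed form for more than two Beta posteriors, but a Monte Carlo estimate takes a few lines; the posterior parameters below are illustrative, not the experiment's actual counts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative posteriors; not the passkey experiment's actual counts.
posteriors = {'A': (71, 31), 'B': (65, 37), 'C': (68, 34)}

# Sample jointly from each posterior and count how often each variant wins.
draws = np.column_stack([rng.beta(a, b, size=100_000) for a, b in posteriors.values()])
winners = draws.argmax(axis=1)
prob_best = {v: float((winners == i).mean()) for i, v in enumerate(posteriors)}

print(prob_best)
```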
Why Sampling Works¶
Consider two variants:
- Variant A: Beta(100, 50) → mean ≈ 0.67, narrow distribution (high certainty)
- Variant B: Beta(10, 5) → mean ≈ 0.67, wide distribution (low certainty)
If we sample from each:
- A's samples will cluster tightly around 0.67
- B's samples will vary widely around 0.67
- While the posterior means are equal, B still wins roughly half the head-to-head draws → exploration
- As B's extra traffic accumulates and (say) reveals a lower true rate, B's posterior mean drops and A wins nearly every draw → exploitation
The algorithm automatically reduces exploration as we gain confidence.
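A quick numeric check of this behavior, using the same Beta(100, 50) vs Beta(10, 5) example (the "disappointing data" counts in the last step are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

samples_a = rng.beta(100, 50, size=n)  # narrow posterior (lots of data)
samples_b = rng.beta(10, 5, size=n)    # wide posterior (little data), same mean

print(f"A std: {samples_a.std():.3f}, B std: {samples_b.std():.3f}")

# With equal means, the uncertain variant still wins about half the head-to-head
# draws -- that residual win rate is exactly the traffic it receives to explore.
print(f"B wins: {np.mean(samples_b > samples_a):.1%}")

# If B's extra traffic reveals a lower true rate (say ~55% over 100 more users),
# its posterior mean drops and its win rate collapses -> exploitation of A.
samples_b_later = rng.beta(10 + 55, 5 + 45, size=n)
print(f"B wins after disappointing data: {np.mean(samples_b_later > samples_a):.1%}")
```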
Thompson Sampling Algorithm¶
Initialization¶
For each variant $i \in \{A, B, C, \ldots\}$:
- Choose a prior Beta distribution:
- Non-informative: Beta(1, 1) — uniform prior
- Weakly informative: Beta($\alpha_0$, $\beta_0$) — centered on expected conversion rate
$$ p_i \sim \mathrm{Beta}(\alpha_i, \beta_i) $$
Initially: $\alpha_i = \alpha_0$, $\beta_i = \beta_0$
The Loop (for each user)¶
Step 1: Sample from each posterior
For each variant $i$: $$ \theta_i \sim \mathrm{Beta}(\alpha_i, \beta_i) $$
This gives us a random sample of what the conversion rate might be.
Step 2: Choose the best sample
$$ i^* = \arg\max_i \theta_i $$
Show variant $i^*$ to the user.
Step 3: Observe outcome
$$ r \in \{0, 1\} $$
where $r=1$ means conversion, $r=0$ means no conversion.
Step 4: Update posterior
For the chosen variant $i^*$:
$$ \begin{align} \alpha_{i^*} &\leftarrow \alpha_{i^*} + r \\ \beta_{i^*} &\leftarrow \beta_{i^*} + (1 - r) \end{align} $$
That's it! Repeat for the next user.
Pseudocode¶
# Initialize
for variant in variants:
    alpha[variant] = 1  # or use informative prior
    beta[variant] = 1

# For each user
while True:
    # Sample from each posterior
    samples = {}
    for variant in variants:
        samples[variant] = sample_beta(alpha[variant], beta[variant])

    # Choose best sample
    chosen = max(samples, key=samples.get)

    # Show variant, observe outcome
    conversion = show_variant_to_user(chosen)

    # Update posterior
    alpha[chosen] += conversion
    beta[chosen] += (1 - conversion)
5 lines of logic — simpler than any classical statistical test!
# Setup
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
Simulation: Thompson Sampling in Action¶
Let's simulate Thompson sampling with our passkey experiment's true conversion rates:
- Variant A: 70.2% conversion (best)
- Variant B: 68.2% conversion (worst)
- Variant C: 69.0% conversion (middle)
We'll compare:
- Fixed allocation: 33.3% traffic to each variant
- Thompson sampling: Dynamic allocation
And measure:
- How quickly each identifies the winner
- Total conversions achieved
- Traffic allocation over time
# True conversion rates (from real experiment)
true_rates = {
    'A': 3244 / 4625,  # 0.7014
    'B': 1433 / 2100,  # 0.6824
    'C': 1396 / 2022,  # 0.6904
}
print("True conversion rates (unknown to algorithm):")
for variant, rate in true_rates.items():
    print(f"  Variant {variant}: {rate:.4f} ({rate*100:.2f}%)")

# Best variant
best_variant = max(true_rates, key=true_rates.get)
best_rate = true_rates[best_variant]
print(f"\nBest variant: {best_variant} ({best_rate*100:.2f}%)")
True conversion rates (unknown to algorithm):
  Variant A: 0.7014 (70.14%)
  Variant B: 0.6824 (68.24%)
  Variant C: 0.6904 (69.04%)

Best variant: A (70.14%)
def thompson_sampling(true_rates, n_users, prior_alpha=1, prior_beta=1, verbose=False):
    """
    Simulate Thompson sampling for traffic allocation.

    Parameters:
    -----------
    true_rates : dict
        True conversion rates for each variant (unknown to algorithm)
    n_users : int
        Number of users to simulate
    prior_alpha : float
        Prior successes (Beta alpha parameter)
    prior_beta : float
        Prior failures (Beta beta parameter)
    verbose : bool
        Print progress every 500 users

    Returns:
    --------
    dict with simulation results
    """
    variants = list(true_rates.keys())

    # Initialize posteriors
    alpha = {v: prior_alpha for v in variants}
    beta = {v: prior_beta for v in variants}

    # Track metrics
    n_shown = {v: 0 for v in variants}
    n_converted = {v: 0 for v in variants}
    total_conversions = 0

    # Track history for visualization
    history = {
        'user': [],
        'variant_chosen': [],
        'converted': [],
        'prob_A_best': [],
        'prob_B_best': [],
        'prob_C_best': [],
    }

    # Simulate each user
    for user_id in range(n_users):
        # Step 1: Sample from each posterior
        samples = {}
        for v in variants:
            samples[v] = np.random.beta(alpha[v], beta[v])

        # Step 2: Choose variant with highest sample
        chosen = max(samples, key=samples.get)

        # Step 3: Simulate user outcome based on true rate
        converted = np.random.random() < true_rates[chosen]

        # Step 4: Update posterior
        alpha[chosen] += converted
        beta[chosen] += (1 - converted)

        # Track metrics
        n_shown[chosen] += 1
        n_converted[chosen] += converted
        total_conversions += converted

        # Compute P(each variant is best) via Monte Carlo
        if user_id % 50 == 0:  # Every 50 users
            mc_samples = 10000
            best_counts = {v: 0 for v in variants}
            for _ in range(mc_samples):
                mc_samples_dict = {v: np.random.beta(alpha[v], beta[v]) for v in variants}
                best = max(mc_samples_dict, key=mc_samples_dict.get)
                best_counts[best] += 1
            prob_best = {v: best_counts[v] / mc_samples for v in variants}

            history['user'].append(user_id)
            history['variant_chosen'].append(chosen)
            history['converted'].append(converted)
            history['prob_A_best'].append(prob_best.get('A', 0))
            history['prob_B_best'].append(prob_best.get('B', 0))
            history['prob_C_best'].append(prob_best.get('C', 0))

        if verbose and user_id % 500 == 0:
            print(f"User {user_id}: Chose {chosen}, Converted: {converted}")
            print(f"  Traffic allocation: ", end="")
            for v in variants:
                pct = 100 * n_shown[v] / (user_id + 1)
                print(f"{v}={pct:.1f}% ", end="")
            print()

    return {
        'n_shown': n_shown,
        'n_converted': n_converted,
        'total_conversions': total_conversions,
        'alpha': alpha,
        'beta': beta,
        'history': history
    }
print("Thompson sampling function defined.")
Thompson sampling function defined.
# Run Thompson sampling simulation
n_users = 5000

print(f"Simulating Thompson sampling with {n_users:,} users...\n")
results_ts = thompson_sampling(true_rates, n_users, prior_alpha=1, prior_beta=1, verbose=True)

print("\n" + "="*80)
print("THOMPSON SAMPLING RESULTS")
print("="*80)

for variant in ['A', 'B', 'C']:
    n = results_ts['n_shown'][variant]
    conv = results_ts['n_converted'][variant]
    rate = conv / n if n > 0 else 0
    traffic_pct = 100 * n / n_users
    print(f"\nVariant {variant}:")
    print(f"  Traffic allocation: {traffic_pct:.1f}% ({n:,} users)")
    print(f"  Conversions: {conv:,} ({rate*100:.2f}%)")
    print(f"  Posterior: Beta({results_ts['alpha'][variant]:.0f}, {results_ts['beta'][variant]:.0f})")

print(f"\nTotal conversions: {results_ts['total_conversions']:,} / {n_users:,}")
print(f"Overall conversion rate: {results_ts['total_conversions'] / n_users * 100:.2f}%")
Simulating Thompson sampling with 5,000 users...

User 0: Chose A, Converted: False
  Traffic allocation: A=100.0% B=0.0% C=0.0%
User 500: Chose A, Converted: True
  Traffic allocation: A=56.9% B=24.6% C=18.6%
User 1000: Chose A, Converted: False
  Traffic allocation: A=63.1% B=23.2% C=13.7%
User 1500: Chose B, Converted: True
  Traffic allocation: A=61.3% B=22.3% C=16.5%
User 2000: Chose A, Converted: True
  Traffic allocation: A=60.8% B=22.5% C=16.6%
User 2500: Chose B, Converted: True
  Traffic allocation: A=62.1% B=23.8% C=14.1%
User 3000: Chose A, Converted: False
  Traffic allocation: A=66.0% B=21.6% C=12.4%
User 3500: Chose A, Converted: False
  Traffic allocation: A=69.8% B=19.3% C=10.9%
User 4000: Chose A, Converted: True
  Traffic allocation: A=73.3% B=17.1% C=9.6%
User 4500: Chose A, Converted: True
  Traffic allocation: A=75.8% B=15.6% C=8.6%

================================================================================
THOMPSON SAMPLING RESULTS
================================================================================

Variant A:
  Traffic allocation: 77.8% (3,891 users)
  Conversions: 2,738 (70.37%)
  Posterior: Beta(2739, 1154)

Variant B:
  Traffic allocation: 14.4% (720 users)
  Conversions: 481 (66.81%)
  Posterior: Beta(482, 240)

Variant C:
  Traffic allocation: 7.8% (389 users)
  Conversions: 250 (64.27%)
  Posterior: Beta(251, 140)

Total conversions: 3,469 / 5,000
Overall conversion rate: 69.38%
# Compare with fixed allocation
def fixed_allocation(true_rates, n_users):
    """Simulate fixed equal traffic allocation."""
    variants = list(true_rates.keys())
    n_variants = len(variants)

    n_shown = {v: 0 for v in variants}
    n_converted = {v: 0 for v in variants}
    total_conversions = 0

    for user_id in range(n_users):
        # Equal allocation (round-robin)
        chosen = variants[user_id % n_variants]

        # Simulate outcome
        converted = np.random.random() < true_rates[chosen]

        n_shown[chosen] += 1
        n_converted[chosen] += converted
        total_conversions += converted

    return {
        'n_shown': n_shown,
        'n_converted': n_converted,
        'total_conversions': total_conversions
    }
print(f"\nSimulating fixed allocation with {n_users:,} users...\n")
results_fixed = fixed_allocation(true_rates, n_users)

print("="*80)
print("FIXED ALLOCATION RESULTS")
print("="*80)

for variant in ['A', 'B', 'C']:
    n = results_fixed['n_shown'][variant]
    conv = results_fixed['n_converted'][variant]
    rate = conv / n if n > 0 else 0
    traffic_pct = 100 * n / n_users
    print(f"\nVariant {variant}:")
    print(f"  Traffic allocation: {traffic_pct:.1f}% ({n:,} users)")
    print(f"  Conversions: {conv:,} ({rate*100:.2f}%)")

print(f"\nTotal conversions: {results_fixed['total_conversions']:,} / {n_users:,}")
print(f"Overall conversion rate: {results_fixed['total_conversions'] / n_users * 100:.2f}%")
Simulating fixed allocation with 5,000 users...

================================================================================
FIXED ALLOCATION RESULTS
================================================================================

Variant A:
  Traffic allocation: 33.3% (1,667 users)
  Conversions: 1,188 (71.27%)

Variant B:
  Traffic allocation: 33.3% (1,667 users)
  Conversions: 1,110 (66.59%)

Variant C:
  Traffic allocation: 33.3% (1,666 users)
  Conversions: 1,174 (70.47%)

Total conversions: 3,472 / 5,000
Overall conversion rate: 69.44%
# Compare performance
print("\n" + "="*80)
print("COMPARISON: THOMPSON SAMPLING vs FIXED ALLOCATION")
print("="*80)
# Compute optimal (always show best variant)
optimal_conversions = n_users * best_rate
# Compute regret
regret_ts = optimal_conversions - results_ts['total_conversions']
regret_fixed = optimal_conversions - results_fixed['total_conversions']
print(f"\nOptimal (always show {best_variant}):")
print(f" Total conversions: {optimal_conversions:.0f}")
print(f"\nThompson Sampling:")
print(f" Total conversions: {results_ts['total_conversions']:,}")
print(f" Regret: {regret_ts:.0f} conversions")
print(f" Efficiency: {results_ts['total_conversions'] / optimal_conversions * 100:.2f}%")
print(f"\nFixed Allocation:")
print(f" Total conversions: {results_fixed['total_conversions']:,}")
print(f" Regret: {regret_fixed:.0f} conversions")
print(f" Efficiency: {results_fixed['total_conversions'] / optimal_conversions * 100:.2f}%")
print(f"\n📊 Thompson Sampling Advantage:")
extra_conversions = results_ts['total_conversions'] - results_fixed['total_conversions']
print(f" Extra conversions: {extra_conversions:.0f}")
print(f" Improvement: {extra_conversions / results_fixed['total_conversions'] * 100:.2f}%")
print(f" Regret reduction: {(regret_fixed - regret_ts) / regret_fixed * 100:.1f}%")
================================================================================
COMPARISON: THOMPSON SAMPLING vs FIXED ALLOCATION
================================================================================

Optimal (always show A):
  Total conversions: 3507

Thompson Sampling:
  Total conversions: 3,469
  Regret: 38 conversions
  Efficiency: 98.92%

Fixed Allocation:
  Total conversions: 3,472
  Regret: 35 conversions
  Efficiency: 99.00%

📊 Thompson Sampling Advantage:
  Extra conversions: -3
  Improvement: -0.09%
  Regret reduction: -8.6%
# Visualize: Probability of being best over time
history = results_ts['history']
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(history['user'], history['prob_A_best'], label='P(A is best)', linewidth=2, color='#2ecc71')
ax.plot(history['user'], history['prob_B_best'], label='P(B is best)', linewidth=2, color='#e74c3c')
ax.plot(history['user'], history['prob_C_best'], label='P(C is best)', linewidth=2, color='#3498db')
ax.axhline(y=0.95, color='gray', linestyle='--', linewidth=1, alpha=0.5, label='95% threshold')
ax.set_xlabel('Number of Users', fontsize=12)
ax.set_ylabel('Probability of Being Best', fontsize=12)
ax.set_title('Thompson Sampling: Learning Which Variant is Best', fontsize=14, fontweight='bold')
ax.legend(loc='right', fontsize=10)
ax.grid(True, alpha=0.3)
ax.set_ylim(0, 1.05)
plt.tight_layout()
plt.show()
# Find when we reach 95% confidence
confidence_idx = None
for i, prob_a in enumerate(history['prob_A_best']):
    if prob_a >= 0.95:
        confidence_idx = i
        break

if confidence_idx is not None:
    users_to_95 = history['user'][confidence_idx]
    print(f"\n✓ Reached 95% confidence that A is best after ~{users_to_95:,} users")
else:
    print(f"\n⚠ Did not reach 95% confidence within {n_users:,} users")
✓ Reached 95% confidence that A is best after ~3,300 users
Key Insights from Simulation¶
1. Dynamic Traffic Allocation¶
In this run, Thompson sampling automatically allocated:
- ~78% of traffic to variant A (the best performer)
- ~14% of traffic to variant B and ~8% to variant C
- (B and C can swap order across runs: their true rates differ by less than a point, so the algorithm treats them as near-interchangeable also-rans)
Compare this to fixed allocation (33.3% each) — Thompson sampling minimized waste.
2. Faster Convergence¶
Thompson sampling reached 95% confidence that A is best after ~3,300 users in this run — typically sooner than a fixed-allocation NHST design of comparable size would permit a decision.
Why?
- More samples from better variants → faster learning about what's actually good
- Fewer samples from bad variants → less time wasted on unproductive exploration
3. Total Conversions and Regret¶
By routing more traffic to better variants, Thompson sampling converts more users in expectation. In this particular run the totals were essentially tied (3,469 vs 3,472): with true rates spanning only ~2 percentage points, the achievable gain over 5,000 users is smaller than simulation noise. The advantage grows with larger gaps between variants and longer horizons.
This is the essence of regret minimization:
- Traditional A/B testing: "Learn which is best"
- Thompson sampling: "Maximize total conversions while learning"
4. No Stopping Rule Needed¶
Unlike NHST:
- No need to pre-compute sample size
- No need to wait for significance
- Can check results anytime without "p-hacking"
- Algorithm keeps improving the longer it runs
Adding New Variants Dynamically¶
One of Thompson sampling's greatest advantages: new variants can enter at any time.
How It Works¶
- New variant arrives: Initialize with prior Beta(1, 1) (or weakly informative prior)
- Immediately participates: Competes in sampling with existing variants
- Gets explored: Wide posterior → sometimes samples high → gets traffic
- Proves itself or fades: Good variants get more traffic; bad ones get less
No need to:
- Stop the test
- Redistribute traffic manually
- Recalculate sample sizes
- Worry about multiple comparisons
Example: Adding Variant D Mid-Test¶
Suppose that after 2000 users, the product team creates variant D with a 72% conversion rate (better than all existing variants).
What happens:
- D starts with Beta(1, 1) — knows nothing
- D gets explored — wide posterior sometimes samples high
- D converts well — posterior narrows around 72%
- D wins most samples — traffic shifts to D
- A/B/C fade out — naturally get less traffic
The "Cold Start" Challenge:
When D enters at user 2000:
- Each incumbent has already accumulated hundreds of observations (in our run, A's posterior is Beta(415, 196))
- The incumbents' posteriors are fairly narrow and confident around 68-70%
- Variant D starts with Beta(1, 1) → completely uninformed
- Even though D is better (72% vs ~70%), it needs time to build evidence
Why this takes time:
- D's wide posterior → high-variance samples → sometimes wins, but not consistently
- The incumbents' narrow posteriors → low-variance samples → reliably near their observed rates
- D needs to accumulate enough data to narrow its posterior AND overcome the incumbents' head start
- Early on, D receives only the share of draws its uncertain posterior happens to win, not a guaranteed quarter of traffic
- It takes several thousand users for D to overtake A
Why we need 50,000 Phase 2 users:
- The graph shows cumulative traffic allocation from the experiment start
- The incumbents enter Phase 2 with a head start of hundreds of users each
- For D to "overtake" A on a cumulative graph, D must erase that head start in absolute user counts
- D must both: (1) prove it's better by building a strong posterior, AND (2) accumulate more total traffic than A's head start
- This requires patience — Thompson sampling is conservative (a good thing!)
This is actually a FEATURE, not a bug:
- Prevents algorithm from overreacting to early lucky streaks
- Requires substantial evidence before shifting major traffic
- Ensures statistical rigor even with dynamic variant addition
Let's simulate this:
def thompson_sampling_with_new_variant(true_rates, n_users_before, new_variant_rate, n_users_after):
    """
    Simulate Thompson sampling where a new variant is added mid-experiment.
    """
    variants = list(true_rates.keys())

    # Initialize posteriors
    alpha = {v: 1 for v in variants}
    beta = {v: 1 for v in variants}
    n_shown = {v: 0 for v in variants}
    n_converted = {v: 0 for v in variants}

    history = {'user': [], 'traffic_A': [], 'traffic_B': [], 'traffic_C': [], 'traffic_D': []}

    # Phase 1: Before new variant
    print(f"Phase 1: Running with variants {variants}...")
    for user_id in range(n_users_before):
        samples = {v: np.random.beta(alpha[v], beta[v]) for v in variants}
        chosen = max(samples, key=samples.get)
        converted = np.random.random() < true_rates[chosen]
        alpha[chosen] += converted
        beta[chosen] += (1 - converted)
        n_shown[chosen] += 1
        n_converted[chosen] += converted

    print(f"After {n_users_before} users:")
    for v in variants:
        pct = 100 * n_shown[v] / n_users_before
        print(f"  {v}: {pct:.1f}% traffic, Beta({alpha[v]:.0f}, {beta[v]:.0f})")

    # Phase 2: Add new variant D
    print(f"\n🆕 Adding new variant D with true rate {new_variant_rate*100:.1f}%...\n")
    true_rates['D'] = new_variant_rate
    variants.append('D')
    alpha['D'] = 1  # Start with uninformative prior
    beta['D'] = 1
    n_shown['D'] = 0
    n_converted['D'] = 0

    # Continue experiment
    total_users = n_users_before
    for user_id in range(n_users_after):
        samples = {v: np.random.beta(alpha[v], beta[v]) for v in variants}
        chosen = max(samples, key=samples.get)
        converted = np.random.random() < true_rates[chosen]
        alpha[chosen] += converted
        beta[chosen] += (1 - converted)
        n_shown[chosen] += 1
        n_converted[chosen] += converted
        total_users += 1

        # Track traffic allocation every 100 users
        if user_id % 100 == 0:
            history['user'].append(total_users)
            for v in ['A', 'B', 'C', 'D']:
                if v in n_shown:
                    history[f'traffic_{v}'].append(100 * n_shown[v] / total_users)
                else:
                    history[f'traffic_{v}'].append(0)

    print(f"\nAfter {total_users} total users:")
    for v in variants:
        pct = 100 * n_shown[v] / total_users
        print(f"  {v}: {pct:.1f}% traffic, Beta({alpha[v]:.0f}, {beta[v]:.0f})")

    return history, n_shown, total_users, alpha, beta
# Run simulation
true_rates_initial = {'A': 0.7014, 'B': 0.6824, 'C': 0.6904}

history, n_shown, total, alpha, beta = thompson_sampling_with_new_variant(
    true_rates_initial.copy(),
    n_users_before=2000,
    new_variant_rate=0.72,  # D is better than A!
    n_users_after=50000     # enough users for D to prove itself AND overtake A's cumulative allocation
)
Phase 1: Running with variants ['A', 'B', 'C']...
After 2000 users:
  A: 30.4% traffic, Beta(415, 196)
  B: 26.1% traffic, Beta(360, 165)
  C: 43.4% traffic, Beta(598, 272)

🆕 Adding new variant D with true rate 72.0%...

After 52000 total users:
  A: 25.9% traffic, Beta(9498, 3969)
  B: 6.3% traffic, Beta(2265, 1008)
  C: 3.5% traffic, Beta(1222, 577)
  D: 64.4% traffic, Beta(23962, 9507)
# Visualize traffic allocation over time
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(history['user'], history['traffic_A'], label='Variant A', linewidth=2, color='#2ecc71')
ax.plot(history['user'], history['traffic_B'], label='Variant B', linewidth=2, color='#e74c3c')
ax.plot(history['user'], history['traffic_C'], label='Variant C', linewidth=2, color='#3498db')
ax.plot(history['user'], history['traffic_D'], label='Variant D (new)', linewidth=2, color='#f39c12', linestyle='--')
ax.axvline(x=2000, color='gray', linestyle=':', linewidth=2, alpha=0.7, label='D added')
ax.set_xlabel('Number of Users', fontsize=12)
ax.set_ylabel('Traffic Allocation (%)', fontsize=12)
ax.set_title('Thompson Sampling: Dynamic Traffic Allocation with New Variant', fontsize=14, fontweight='bold')
ax.legend(loc='right', fontsize=10)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("\n📊 Key Observations:")
print("  1. Before user 2000: the leader among A/B/C gets the most traffic (here C, by luck of a noisy start)")
print("     → each incumbent has built up a few hundred observations by user 2000")
print("  2. At user 2000: D enters with uninformative prior Beta(1, 1)")
print("     → D starts with ZERO evidence vs the incumbents' accumulated data")
print("  3. Users 2000-10000: D gets explored due to its wide posterior")
print("     → but the incumbents still win many draws with narrow posteriors near 70%")
print("  4. Users 10000-30000: D accumulates evidence of 72% conversion")
print("     → D's posterior narrows, starts winning more samples")
print("  5. Users 30000+: D overtakes A in CUMULATIVE traffic allocation")
print(f"     → D reaches Beta({alpha['D']:.0f}, {beta['D']:.0f}), proves superiority")
print("  6. Final allocation: D dominates, A/B/C fade out")
print(f"     → D: {100*n_shown['D']/total:.1f}%, A: {100*n_shown['A']/total:.1f}%, B: {100*n_shown['B']/total:.1f}%, C: {100*n_shown['C']/total:.1f}%")

print("\n✅ Key Insight: Cumulative traffic percentages change slowly!")
print("  - Graph shows cumulative allocation from start of experiment")
print("  - The incumbents had a 2000-user head start (A alone had ~600 users allocated)")
print("  - For D to overtake A on a cumulative graph, D must erase that head start in absolute user counts")
print("  - D must both prove itself (build evidence) AND accumulate enough traffic")
print("  - This is why we need 50,000 Phase 2 users to see the crossover")

print("\n💡 Alternative visualization: Rolling window traffic allocation")
print("  - Instead of cumulative %, track recent allocation (e.g., last 5000 users)")
print("  - Would show D dominating much earlier (~user 10,000)")
print("  - Better reflects current system behavior vs historical average")
📊 Key Observations:
  1. Before user 2000: the leader among A/B/C gets the most traffic (here C, by luck of a noisy start)
     → each incumbent has built up a few hundred observations by user 2000
  2. At user 2000: D enters with uninformative prior Beta(1, 1)
     → D starts with ZERO evidence vs the incumbents' accumulated data
  3. Users 2000-10000: D gets explored due to its wide posterior
     → but the incumbents still win many draws with narrow posteriors near 70%
  4. Users 10000-30000: D accumulates evidence of 72% conversion
     → D's posterior narrows, starts winning more samples
  5. Users 30000+: D overtakes A in CUMULATIVE traffic allocation
     → D reaches Beta(23962, 9507), proves superiority
  6. Final allocation: D dominates, A/B/C fade out
     → D: 64.4%, A: 25.9%, B: 6.3%, C: 3.5%

✅ Key Insight: Cumulative traffic percentages change slowly!
  - Graph shows cumulative allocation from start of experiment
  - The incumbents had a 2000-user head start (A alone had ~600 users allocated)
  - For D to overtake A on a cumulative graph, D must erase that head start in absolute user counts
  - D must both prove itself (build evidence) AND accumulate enough traffic
  - This is why we need 50,000 Phase 2 users to see the crossover

💡 Alternative visualization: Rolling window traffic allocation
  - Instead of cumulative %, track recent allocation (e.g., last 5000 users)
  - Would show D dominating much earlier (~user 10,000)
  - Better reflects current system behavior vs historical average
Practical Implementation Considerations¶
1. Choosing Priors¶
Non-informative: Beta(1, 1)
- Use when you truly know nothing
- Allows maximum influence from data
- Good for fair comparison of new variants
Weakly informative: Beta($\alpha_0$, $\beta_0$) centered on control rate
- Use when variants should be "around" control performance
- Faster convergence
- Still allows data to dominate
Rule of thumb: $\alpha_0 + \beta_0 \approx 10-20$ for weak prior strength
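One way to turn that rule of thumb into parameters; this is a sketch, and `weak_prior` is a hypothetical helper, not an established API:

```python
def weak_prior(expected_rate, strength=15.0):
    """Beta prior centered on expected_rate, worth `strength` pseudo-observations.

    strength = alpha0 + beta0, per the 10-20 rule of thumb above.
    """
    alpha0 = expected_rate * strength
    beta0 = (1.0 - expected_rate) * strength
    return alpha0, beta0

a0, b0 = weak_prior(0.70)  # centered on a ~70% control conversion rate
print(a0, b0)
```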
2. When to Stop¶
Unlike NHST, Thompson sampling has no fixed stopping rule.
Options:
Business threshold: Stop when P(best variant is best) > 95%
if prob_best > 0.95:
deploy_winner()
Traffic concentration: Stop when winner gets >80% of traffic
if traffic_to_best / total_traffic > 0.80:
deploy_winner()
Time limit: Run for N days regardless (still gets benefits of dynamic allocation)
Never stop: Keep running indefinitely as a self-optimizing system
3. Implementation in Traffic Splitters¶
Centralized approach:
class ThompsonSamplingTrafficSplitter:
    def __init__(self, variants):
        self.variants = variants
        self.alpha = {v: 1 for v in variants}
        self.beta = {v: 1 for v in variants}

    def choose_variant(self):
        """Called for each user."""
        samples = {v: np.random.beta(self.alpha[v], self.beta[v])
                   for v in self.variants}
        return max(samples, key=samples.get)

    def update(self, variant, converted):
        """Called after user outcome is observed."""
        self.alpha[variant] += converted
        self.beta[variant] += (1 - converted)

    def add_variant(self, new_variant):
        """Add new variant dynamically."""
        self.variants.append(new_variant)
        self.alpha[new_variant] = 1
        self.beta[new_variant] = 1
Distributed approach (for high-scale systems):
- Store (α, β) parameters in distributed cache (Redis, etc.)
- Each server samples locally
- Batch updates to reduce contention
- Acceptable to be slightly out-of-sync (algorithm is robust)
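The additive update is what makes this safe: deltas can be accumulated locally and flushed later in any order. A toy sketch of the batching idea follows; the class and method names are hypothetical, and a real system would replace the plain dict with shared storage (e.g., Redis INCRBYFLOAT):

```python
from collections import defaultdict

class BatchedPosteriorStore:
    """Sketch of batched updates: because the Beta update is pure addition,
    deltas accumulate locally and flush to shared state later without loss.
    All names here are hypothetical, not a specific library's API."""

    def __init__(self):
        self.alpha = defaultdict(lambda: 1.0)  # shared state (stand-in for Redis)
        self.beta = defaultdict(lambda: 1.0)
        self._pending = defaultdict(lambda: [0, 0])  # local (conversions, misses)

    def record(self, variant, converted):
        # Cheap local accumulation; no network call per user.
        self._pending[variant][0] += int(converted)
        self._pending[variant][1] += int(not converted)

    def flush(self):
        # Periodic batch write; order and timing of flushes don't matter.
        for variant, (conv, miss) in self._pending.items():
            self.alpha[variant] += conv
            self.beta[variant] += miss
        self._pending.clear()

store = BatchedPosteriorStore()
for outcome in [1, 0, 1, 1]:
    store.record('A', outcome)
store.flush()
print(store.alpha['A'], store.beta['A'])  # 4.0 2.0
```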
4. Monitoring¶
Track:
- Current traffic allocation per variant
- Posterior means (estimated conversion rates)
- Credible intervals (uncertainty)
- P(variant is best) for each variant
- Total conversions and regret
Alert if:
- Traffic becomes too concentrated (>95% to one variant) before you're ready
- Posteriors stop updating (suggests implementation bug)
- Observed rates deviate significantly from posteriors (data quality issue)
5. A/A Testing¶
Before deploying Thompson sampling in production:
Run A/A test: Split traffic between two identical experiences
- Should allocate ~50/50 in the long run
- Should not confidently declare a winner
- Validates implementation correctness
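The A/A check can itself be simulated before touching production; a sketch with an arbitrary seed and user count (shares should hover near 50/50, though single runs drift):

```python
import numpy as np

rng = np.random.default_rng(7)

# A/A test: two identical arms with the same true conversion rate.
true_rate = 0.70
alpha = {'A1': 1.0, 'A2': 1.0}
beta = {'A1': 1.0, 'A2': 1.0}
shown = {'A1': 0, 'A2': 0}

for _ in range(20_000):
    samples = {v: rng.beta(alpha[v], beta[v]) for v in alpha}
    chosen = max(samples, key=samples.get)
    converted = rng.random() < true_rate
    alpha[chosen] += converted
    beta[chosen] += 1 - converted
    shown[chosen] += 1

split = shown['A1'] / sum(shown.values())
print(f"A1 share: {split:.1%}")  # expect roughly half, with run-to-run drift
```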
Summary: From Bayesian Posteriors to Optimal Traffic Allocation¶
The Journey¶
- Bayesian inference gives us posterior distributions for conversion rates
- Sampling from posteriors naturally encodes exploration vs. exploitation
- Thompson sampling turns this into a simple, optimal traffic allocation algorithm
- Dynamic reallocation minimizes regret and maximizes total conversions
- Continuous adaptation allows adding/removing variants without restart
Why Thompson Sampling is Superior¶
| Aspect | Traditional A/B | Thompson Sampling |
|---|---|---|
| Traffic allocation | Fixed (e.g., 33/33/33) | Dynamic (adapts to performance) |
| Total conversions | Suboptimal (wastes traffic) | Near-optimal (minimizes regret) |
| Time to decision | Wait for significance | Continuous improvement |
| Adding variants | Restart test | Add anytime |
| Removing variants | Manual rebalance | Automatic fade-out |
| Multiple comparisons | Need corrections | No problem |
| Stopping rule | Pre-determined | Flexible |
| Implementation | Complex statistics | 5 lines of code |
When to Use Thompson Sampling¶
✅ Perfect for:
- Product experimentation with multiple variants
- Continuous optimization (content, recommendations, ads)
- High-traffic scenarios (thousands of users per day)
- Dynamic environments (variants added/removed frequently)
- When you care about total conversions, not just identifying the winner
⚠️ Consider alternatives when:
- Very low traffic (<100 users per day) — may be too slow
- Regulatory requirements for fixed sample sizes (pharma, medical devices)
- Need explainable p-values for stakeholders (though Bayesian probabilities are more interpretable)
Implementation Checklist¶
✓ Choose appropriate priors (weakly informative recommended)
✓ Implement core algorithm (choose, observe, update)
✓ Add monitoring (traffic allocation, posteriors, probabilities)
✓ Run A/A test to validate implementation
✓ Define stopping criteria (business threshold, time limit, or continuous)
✓ Plan for adding/removing variants
✓ Document for stakeholders
The Bottom Line¶
Thompson sampling transforms Bayesian posteriors into a self-optimizing traffic splitter.
No complex statistics. No sample size calculations. No stopping rules. No multiple comparison corrections.
Just:
- Sample
- Choose
- Observe
- Update
- Repeat
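The loop above really is only a handful of lines. A minimal sketch; the variant names, priors, and seed are illustrative, and the serving plumbing (showing the variant, collecting the outcome) is yours to supply:

```python
import random

rng = random.Random(0)
variants = ['A', 'B', 'C']
alpha = {v: 1 for v in variants}  # Beta(1, 1) uniform priors
beta = {v: 1 for v in variants}

def choose_variant():
    # Sample once from each variant's posterior, pick the highest draw
    samples = {v: rng.betavariate(alpha[v], beta[v]) for v in variants}
    return max(samples, key=samples.get)

def update(variant, converted):
    # Conjugate Beta-Bernoulli update: one counter per outcome
    alpha[variant] += converted
    beta[variant] += 1 - converted
```

Per user: `chosen = choose_variant()`, show it, then `update(chosen, converted)` once the outcome is known.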
Mathematics meets elegance. Theory meets practice. Bayesian meets optimal.
Appendix: Real-World Implementation Considerations¶
Critical Assumptions to Revisit in Production¶
The simulation above makes several simplifying assumptions to keep the explanation short and easy to follow. In a real-world implementation, a few things need to be refined:
1. The "Immediate Feedback" Assumption¶
What the Simulation Assumes¶
The code above assumes instant feedback:
# Step 1: Choose variant
chosen = max(samples, key=samples.get)
# Step 2: Show variant to user
show_variant_to_user(chosen)
# Step 3: Observe outcome (IMMEDIATELY!)
converted = observe_outcome()
# Step 4: Update posterior (with fresh data)
alpha[chosen] += converted
beta[chosen] += (1 - converted)
This implies that we know whether the user converted before the next user arrives. In a real-world implementation, there is a timing decision to make when a new user arrives while we are still waiting on the outcome for the previous user. This matters in high-traffic scenarios, and whenever assessing conversion takes some time:
If we have 100 users/minute and 1-hour conversion delay:
- we'll serve 6,000 users before the first feedback arrives
- All 6,000 decisions made with stale priors (Beta(1, 1))
- Essentially random traffic allocation for the first hour
- Massive regret from uninformed early decisions
Example:
t=0:00 User 1 arrives → Sample from Beta(1,1) for all variants → Choose randomly
t=0:01 User 2 arrives → Sample from Beta(1,1) (no updates yet) → Choose randomly
...
t=0:59 User 6000 arrives → Sample from Beta(1,1) (still no feedback!)
t=1:00 User 1's conversion finally observed → First update to posteriors
t=1:01 Users 6001+ now make slightly more informed decisions
For the first hour, we would have no advantage over random allocation. Note that this was not an issue for the kind of A/B test we ran for passkeys, where feedback arrived within a few seconds, but if we want to build a robust, generic platform we can improve the model a bit:
One Classic Solution: Batched Updates¶
Real-world Thompson sampling implementations often use batched updates:
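A sketch of what batching could look like: decisions are served from the current posteriors, outcomes are buffered, and the posteriors are refreshed periodically. The class and method names here are illustrative, not a reference implementation:

```python
import random

class BatchedThompsonSampler:
    def __init__(self, variants, seed=0):
        self.rng = random.Random(seed)
        self.alpha = {v: 1 for v in variants}  # Beta(1, 1) priors
        self.beta = {v: 1 for v in variants}
        self.pending = []  # (variant, converted) pairs awaiting the next flush

    def choose(self):
        # Decisions use the posteriors as of the last flush
        samples = {v: self.rng.betavariate(self.alpha[v], self.beta[v])
                   for v in self.alpha}
        return max(samples, key=samples.get)

    def record(self, variant, converted):
        # Buffer the outcome; posteriors stay fixed until flush()
        self.pending.append((variant, converted))

    def flush(self):
        # Apply all buffered outcomes at once (e.g. every 10-60 minutes)
        for variant, converted in self.pending:
            self.alpha[variant] += converted
            self.beta[variant] += 1 - converted
        self.pending.clear()
```

The flush interval should roughly match the conversion delay: flushing much faster than feedback arrives buys nothing, and flushing much slower wastes information.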
Recommendation for Production¶
For high-traffic systems (>1000 users/hour) with a slow conversion feedback loop:
- Use batched updates (every 10-60 minutes depending on conversion delay)
- Start with weakly informative priors (reduces early regret)
- Track pending observations separately (don't update priors until resolved)
- Monitor "stuck" observations (conversions that never resolve → data quality issue)
For low-traffic systems (<100 users/hour) and/or a fast feedback loop:
- Sequential updates (i.e., the naive solution) are acceptable and can serve as the MVP, for instance for micro-content A/B tests (choice of copy) where feedback is immediate.
2. Non-Stationarity (The Fixed Conversion Rate Assumption)¶
What the Simulation Assumes¶
The simulation uses fixed true conversion rates:
true_rates = {
'A': 0.7016, # Fixed forever
'B': 0.6824, # Never changes
'C': 0.6903, # Constant
}
# Users arrive, conversions observed, posteriors updated
# But true_rates NEVER CHANGE
This assumes conversion rates are stationary (constant over time).
Why This May Be Unrealistic¶
Real-world conversion rates drift:
Day-of-week effects: weekday vs. weekend users behave differently
Seasonality:
Black Friday / holiday-season users might be different
External events:
This is usually more of an issue for e-commerce: large promotions like Amazon's can change traffic patterns and user segments. It is less of a problem for financial services.
Product evolution:
A new feature launch or a large site redesign may shift traffic patterns.
The Problem with Standard Bayesian Updates¶
Standard Thompson sampling remembers all history:
$$ \alpha_i = \alpha_0 + \sum_{\text{all time}} \text{conversions}_i $$
$$ \beta_i = \beta_0 + \sum_{\text{all time}} \text{non-conversions}_i $$
After many observations:
- $\alpha_i$ and $\beta_i$ grow to hundreds or thousands
- Posterior becomes very narrow (low variance)
- Algorithm becomes highly confident about old data
What happens when conversion rates drift?
Example: Variant A's true rate was 70%, now it's 65% (due to competitor)
Day 1-30: A converts at 70% → Posterior: Beta(2100, 900) → mean ≈ 70%
Day 31: Competitor launches, A now converts at 65%
Day 31-35: New data comes in at 65%, but...
Posterior: Beta(2100 + 50, 900 + 27) → mean ≈ 69.9%
Still thinks A is ~70% because old data dominates!
The posterior is sluggish — it takes a long time to react to the new reality because:
- 2100 old conversions "vote" for 70%
- 50 new conversions "vote" for 65%
- Old data wins due to sheer volume
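The sluggishness is plain arithmetic; checking the numbers from the example above (the Beta(1, 1) prior is negligible at this sample size and omitted):

```python
# Days 1-30: 3000 users converting at 70%
old_conv, old_nonconv = 2100, 900
# Days 31-35: fresh users converting at ~65%
new_conv, new_nonconv = 50, 27

alpha = old_conv + new_conv        # 2150
beta = old_nonconv + new_nonconv   # 927
posterior_mean = alpha / (alpha + beta)
# ~0.699: barely moved from 0.70, even though the true rate is now 0.65
```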
The Consequence: Stuck in the Past¶
Problem 1: Can't detect new winners
- Variant B's true rate improves from 68% to 72% (now better than A)
- But algorithm is "confident" A is best (based on old data)
- Takes thousands of users to shift belief
Problem 2: Wastes traffic on degraded variants
- Variant A's true rate drops from 70% to 60% (now worse than C)
- Algorithm still allocates 60% of traffic to A (based on historical performance)
- Massive regret while slowly updating belief
Problem 3: "Frozen" traffic allocation
- After 100,000 users, posteriors are extremely narrow
- Traffic allocation becomes nearly deterministic
- Little exploration → can't adapt to changes
The Fix: Discounting Old Data¶
Production Thompson sampling systems use forgetting mechanisms to stay agile:
Approach 1: Exponential decay (discount factor)
class ThompsonSamplingWithDecay:
    def __init__(self, variants, decay_rate=0.99):
        self.decay_rate = decay_rate  # 0.99 = keep 99% of the old evidence per update
        self.alpha = {v: 1 for v in variants}
        self.beta = {v: 1 for v in variants}

    def update(self, variant, converted):
        """Update with decay applied to old data."""
        # Shrink old evidence toward the Beta(1, 1) prior
        self.alpha[variant] = 1 + (self.alpha[variant] - 1) * self.decay_rate
        self.beta[variant] = 1 + (self.beta[variant] - 1) * self.decay_rate
        # Add the new observation
        self.alpha[variant] += converted
        self.beta[variant] += (1 - converted)
Effect:
- Recent data has more weight than old data
- The effective sample size is bounded, so posteriors stay informative but never become arbitrarily narrow
- Algorithm remains responsive to drift
Trade-off:
- ✅ Adapts to changing conversion rates
- ⚠️ Forgets valuable historical information
- ⚠️ Requires tuning decay_rate (0.99 = slow decay, 0.90 = fast decay)
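To make decay_rate tuning more concrete: with per-update exponential decay, the effective number of remembered observations is roughly 1 / (1 - decay_rate), and an observation's weight halves after ln(2) / (-ln(decay_rate)) updates. A small back-of-envelope helper (this framing is mine, not a standard API):

```python
import math

def effective_memory(decay_rate):
    """Approximate number of observations the decayed posterior 'remembers'."""
    return 1.0 / (1.0 - decay_rate)

def half_life(decay_rate):
    """Updates until an observation's weight drops to half."""
    return math.log(2) / -math.log(decay_rate)

# decay_rate=0.99 remembers ~100 observations (half-life ~69 updates);
# decay_rate=0.90 remembers only ~10, hence "fast decay".
```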
Approach 2: Sliding window
from collections import deque

class ThompsonSamplingWithWindow:
    def __init__(self, variants, window_size=10000):
        self.window_size = window_size
        self.observations = {v: deque(maxlen=window_size) for v in variants}
        self.alpha = {v: 1 for v in variants}
        self.beta = {v: 1 for v in variants}

    def update(self, variant, converted):
        """Only count the most recent observations."""
        # deque(maxlen=...) silently drops the oldest observation when full
        self.observations[variant].append(converted)
        # Recompute the posterior from the sliding window
        conversions = sum(self.observations[variant])
        non_conversions = len(self.observations[variant]) - conversions
        self.alpha[variant] = 1 + conversions
        self.beta[variant] = 1 + non_conversions
Effect:
- Only the last N observations influence the posterior
- Old data completely forgotten after N new users
- Algorithm stays agile
Trade-off:
- ✅ Clear semantics (only last N users matter)
- ✅ No tuning required (just choose window size)
- ⚠️ Higher memory overhead (store recent observations)
- ⚠️ Sudden "cliff" when old observation exits window
Approach 3: Time-based windowing
from datetime import timedelta

class ThompsonSamplingWithTimeWindow:
    def __init__(self, variants, window_days=30):
        self.window_days = window_days
        self.observations = {v: [] for v in variants}
        self.alpha = {v: 1 for v in variants}
        self.beta = {v: 1 for v in variants}

    def update(self, variant, converted, timestamp):
        """Only count observations from the last N days."""
        cutoff = timestamp - timedelta(days=self.window_days)
        # Store the observation with its timestamp
        self.observations[variant].append((timestamp, converted))
        # Drop observations that have fallen out of the time window
        self.observations[variant] = [(ts, conv)
                                      for ts, conv in self.observations[variant]
                                      if ts >= cutoff]
        # Recompute the posterior from the remaining observations
        conversions = sum(conv for ts, conv in self.observations[variant])
        non_conversions = len(self.observations[variant]) - conversions
        self.alpha[variant] = 1 + conversions
        self.beta[variant] = 1 + non_conversions
Effect:
- Only observations from last N days count
- Naturally handles varying traffic (weekday vs weekend)
- Algorithm adapts to seasonal patterns
Trade-off:
- ✅ Semantically clear (recent behavior matters)
- ✅ Handles non-uniform traffic patterns
- ⚠️ Requires timestamp tracking
- ⚠️ Window size needs domain knowledge (30 days? 7 days? 90 days?)
Recommendation for Production¶
For stationary environments (conversion rates don't change):
- Use standard Thompson sampling (accumulate all data)
- Simpler, no tuning needed
For non-stationary environments (rates drift over time):
- Use exponential decay (good default: decay_rate=0.99)
- Or use sliding window (e.g., last 10,000 users or 30 days)
- Monitor posterior variance — if it shrinks too much, add decay
For highly dynamic environments (A/B testing on news sites, ad campaigns):
- Use aggressive discounting (decay_rate=0.90) or short windows (7 days)
- Prioritize agility over long-term memory