Use Cases Content Delivery
ExaBGP enables intelligent traffic routing for Content Delivery Networks using anycast, geographic load balancing, and dynamic cache server advertisement.
- Overview
- Architecture Patterns
- Configuration Examples
- Health Checks
- Geographic Load Balancing
- Cache Server Advertisement
- Troubleshooting
- See Also
Content Delivery Networks face several challenges:
- Geographic distribution: Serving content from locations close to users
- Load balancing: Distributing traffic across multiple cache servers
- Failover: Handling cache server failures gracefully
- Capacity management: Adding/removing servers based on demand
- DDoS protection: Mitigating attacks while maintaining service
ExaBGP enables dynamic CDN traffic management by:
- Anycast IP advertisement: Multiple cache servers share the same IP address
- Health-based routing: Automatically withdraw routes for failed servers
- Geographic steering: Route users to nearest cache location
- Load-based balancing: Adjust route advertisement based on server load
- DDoS mitigation: Integrate with FlowSpec for attack filtering
Important: ExaBGP announces routes via BGP but does NOT manipulate the kernel routing table. Your API program must handle route installation if needed for local traffic.
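Since ExaBGP only speaks BGP, a host that terminates anycast traffic itself usually also needs the address present locally. The following is a minimal sketch of how an API program might pair the two actions. It assumes a Linux host with iproute2 and an anycast prefix bound to the loopback interface; the helper names `loopback_command`, `bgp_command`, and `apply` are hypothetical and not part of ExaBGP.

```python
#!/usr/bin/env python3
"""Sketch: pair an ExaBGP announcement with local address setup."""
import subprocess

ANYCAST_IP = "1.2.3.4/32"
NEXT_HOP = "10.0.0.1"


def loopback_command(prefix, present):
    """Build the iproute2 command that adds or removes the prefix on lo."""
    action = "add" if present else "del"
    return ["ip", "addr", action, prefix, "dev", "lo"]


def bgp_command(prefix, next_hop, present):
    """Build the ExaBGP text API command for the same prefix."""
    verb = "announce" if present else "withdraw"
    return f"{verb} route {prefix} next-hop {next_hop}"


def apply(prefix, next_hop, present, dry_run=True):
    """Return both commands; when dry_run is False, also execute them."""
    cmd = loopback_command(prefix, present)
    bgp = bgp_command(prefix, next_hop, present)
    if not dry_run:
        if present:
            # Bind the address before attracting traffic (requires root).
            subprocess.run(cmd, check=False)
            print(bgp, flush=True)
        else:
            # Stop attracting traffic before removing the address.
            print(bgp, flush=True)
            subprocess.run(cmd, check=False)
    return cmd, bgp
```

With `dry_run=True` (the default here) nothing is executed, which makes it easy to inspect the commands before letting the program touch the host.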
Multiple cache servers advertise the same IP address from different locations:
[Cache NYC]   [Cache LON]   [Cache SFO]   [Cache TKY]
     |             |             |             |
  ExaBGP        ExaBGP        ExaBGP        ExaBGP
     |             |             |             |
     +-------------+------+------+-------------+
                          |
                    [BGP Network]
                          |
                     [Internet]
                          |
                      [Users]
Benefits:
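- Users are automatically routed to the topologically nearest location (as chosen by BGP best-path selection, which usually but not always matches geography)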
- Automatic failover if location goes down
- Simple DNS configuration (single IP address)
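The failover behaviour can be illustrated with a toy model. This sketch is not a BGP implementation: it assumes each user network simply prefers the announcing site with the lowest path cost, and the user names and cost values are invented for illustration.

```python
"""Toy model of anycast: each user reaches the cheapest announcing site."""

# Hypothetical path costs (for example AS-path length) from users to sites.
PATH_COST = {
    "user-boston": {"NYC": 1, "LON": 3, "SFO": 2, "TKY": 4},
    "user-paris": {"NYC": 3, "LON": 1, "SFO": 4, "TKY": 5},
}


def best_site(user, announcing):
    """Return the cheapest site still announcing the prefix, or None."""
    candidates = {
        site: cost
        for site, cost in PATH_COST[user].items()
        if site in announcing
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


announcing = {"NYC", "LON", "SFO", "TKY"}
print(best_site("user-boston", announcing))  # NYC

announcing.discard("NYC")  # NYC withdraws its route
print(best_site("user-boston", announcing))  # SFO
```

Withdrawing the route at one site is all that is needed to move its users elsewhere; no DNS change is involved.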
Use BGP communities to control which regions receive routes:
[US Caches] --community:100:1--> [US ISPs]
[EU Caches] --community:100:2--> [EU ISPs]
[APAC Caches] --community:100:3--> [APAC ISPs]
Benefits:
- Fine-grained geographic control
- Comply with data sovereignty requirements
- Optimize for regional peering
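As a rough sketch of the tagging side, the helper below builds an ExaBGP announce command with a per-region community. The community values mirror the diagram above and the function name `announce_for_region` is hypothetical; real values depend on what your upstream routers are configured to match.

```python
"""Sketch: tag an anycast announcement with a per-region community."""

# Values taken from the diagram; adjust to your peers' policy.
REGION_COMMUNITY = {"us": "100:1", "eu": "100:2", "apac": "100:3"}


def announce_for_region(prefix, next_hop, region):
    """Build an announce command, adding a community for known regions."""
    command = f"announce route {prefix} next-hop {next_hop}"
    community = REGION_COMMUNITY.get(region)
    if community is not None:
        command += f" community [{community}]"
    return command


print(announce_for_region("1.2.3.4/32", "10.0.0.1", "eu"))
```

ExaBGP only attaches the community. Whether a route actually reaches a given region is decided by the export policy on the routers that receive it.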
Combine edge caches with origin servers:
[Edge Caches] --Anycast 1.2.3.4--> [Users]
|
[Origin Servers] --Anycast 10.0.0.1--> [Edge Caches]
Benefits:
- Two-tier architecture for scale
- Origin servers not directly exposed
- Edge cache misses served from origin
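One way to keep origin prefixes from leaking is a guard in the API program before it announces anything. The sketch below assumes origins use private address space and edges use public space, as in the diagram; the function name `allowed_to_announce` and the role strings are hypothetical.

```python
"""Sketch: refuse announcements that do not match the server's tier."""
import ipaddress


def allowed_to_announce(prefix, role):
    """Origins may announce only private prefixes, edges only public ones."""
    network = ipaddress.ip_network(prefix, strict=False)
    if role == "origin":
        return network.is_private
    if role == "edge":
        return not network.is_private
    # Unknown roles announce nothing.
    return False


print(allowed_to_announce("10.100.0.1/32", "origin"))  # True
print(allowed_to_announce("1.2.3.4/32", "origin"))     # False
print(allowed_to_announce("1.2.3.4/32", "edge"))       # True
```

This is only a local sanity check. Filtering on the routers that peer with the origin tier remains the actual protection.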
Configuration (/etc/exabgp/cdn-anycast.conf):
process cache-monitor {
    run python3 /etc/exabgp/cache-health.py;
    encoder json;
}

neighbor 192.0.2.1 {
    router-id 10.0.0.1;
    local-address 10.0.0.1;
    local-as 65001;
    peer-as 65000;

    family {
        ipv4 unicast;
    }

    api {
        processes [ cache-monitor ];
    }
}

neighbor 192.0.2.2 {
    router-id 10.0.0.1;
    local-address 10.0.0.1;
    local-as 65001;
    peer-as 65000;

    family {
        ipv4 unicast;
    }

    api {
        processes [ cache-monitor ];
    }
}

Configuration (/etc/exabgp/cdn-geo.conf):
process geo-controller {
    run python3 /etc/exabgp/geo-steering.py;
    encoder json;
}

neighbor 192.0.2.1 {
    router-id 10.0.0.1;
    local-address 10.0.0.1;
    local-as 65001;
    peer-as 65000;

    family {
        ipv4 unicast;
    }

    # Negotiate route refresh. Communities are ordinary path attributes
    # and need no capability to be attached.
    capability {
        route-refresh;
    }

    api {
        processes [ geo-controller ];
    }
}

API Program (/etc/exabgp/cache-health.py):
#!/usr/bin/env python3
import time

import requests

# CDN anycast IP
ANYCAST_IP = "1.2.3.4/32"
NEXT_HOP = "10.0.0.1"

# Health check configuration
HEALTH_URL = "http://127.0.0.1/health"
CHECK_INTERVAL = 10      # seconds between checks
FAILURE_THRESHOLD = 3    # consecutive failures before withdrawing


def check_cache_health():
    """Check if cache server is healthy"""
    try:
        response = requests.get(HEALTH_URL, timeout=5)
        return response.status_code == 200
    except requests.RequestException:
        # Connection errors and timeouts count as unhealthy
        return False


def announce_route():
    """Announce anycast route"""
    print(f"announce route {ANYCAST_IP} next-hop {NEXT_HOP}", flush=True)


def withdraw_route():
    """Withdraw anycast route"""
    print(f"withdraw route {ANYCAST_IP} next-hop {NEXT_HOP}", flush=True)


# Main loop
announced = False
failures = 0

while True:
    healthy = check_cache_health()
    if healthy:
        failures = 0
        if not announced:
            announce_route()
            announced = True
    else:
        failures += 1
        if failures >= FAILURE_THRESHOLD and announced:
            withdraw_route()
            announced = False
    time.sleep(CHECK_INTERVAL)

Withdraw route when server is overloaded:
#!/usr/bin/env python3
import time

import psutil

ANYCAST_IP = "1.2.3.4/32"
NEXT_HOP = "10.0.0.1"

# Thresholds
CPU_THRESHOLD = 80.0          # percent
MEMORY_THRESHOLD = 90.0       # percent
CONNECTION_THRESHOLD = 10000  # open sockets


def check_load():
    """Check server load"""
    cpu = psutil.cpu_percent(interval=1)
    memory = psutil.virtual_memory().percent
    # Counts every inet socket, not only established connections;
    # may need elevated privileges on some platforms.
    connections = len(psutil.net_connections())
    return {
        'healthy': cpu < CPU_THRESHOLD and
                   memory < MEMORY_THRESHOLD and
                   connections < CONNECTION_THRESHOLD,
        'cpu': cpu,
        'memory': memory,
        'connections': connections
    }


announced = False

while True:
    status = check_load()
    if status['healthy'] and not announced:
        print(f"announce route {ANYCAST_IP} next-hop {NEXT_HOP}", flush=True)
        announced = True
    elif not status['healthy'] and announced:
        print(f"withdraw route {ANYCAST_IP} next-hop {NEXT_HOP}", flush=True)
        announced = False
    time.sleep(30)

API Program (/etc/exabgp/geo-steering.py):
#!/usr/bin/env python3
import socket
import sys

ANYCAST_IP = "1.2.3.4/32"
NEXT_HOP = "10.0.0.1"

# Location detection (simplified)
hostname = socket.gethostname()

# Map locations to BGP communities
GEO_COMMUNITIES = {
    'us-east': '65000:1',
    'us-west': '65000:2',
    'eu-west': '65000:3',
    'eu-central': '65000:4',
    'apac': '65000:5'
}

# Detect location from hostname
location = None
for loc in GEO_COMMUNITIES:
    if loc in hostname:
        location = loc
        break

if location:
    community = GEO_COMMUNITIES[location]
    print(f"announce route {ANYCAST_IP} next-hop {NEXT_HOP} "
          f"community [{community}]", flush=True)
else:
    # Default: announce without community
    print(f"announce route {ANYCAST_IP} next-hop {NEXT_HOP}", flush=True)

# Keep running until ExaBGP closes stdin. Test the raw line so that
# only end-of-file (an empty string) ends the loop, not a blank line.
while True:
    line = sys.stdin.readline()
    if not line:
        break

Announce different prefixes per region:
#!/usr/bin/env python3
import os
import sys

# Regional anycast IPs
REGIONS = {
    'us': {
        'ip': '1.2.3.4/32',
        'next_hop': '10.1.0.1',
        'community': '65000:1'
    },
    'eu': {
        'ip': '1.2.3.5/32',
        'next_hop': '10.2.0.1',
        'community': '65000:2'
    },
    'apac': {
        'ip': '1.2.3.6/32',
        'next_hop': '10.3.0.1',
        'community': '65000:3'
    }
}

# Detect region (from config file, environment, etc.)
region = os.environ.get('CDN_REGION', 'us')

if region in REGIONS:
    config = REGIONS[region]
    print(f"announce route {config['ip']} "
          f"next-hop {config['next_hop']} "
          f"community [{config['community']}]", flush=True)

# Keep running until ExaBGP closes stdin (end-of-file returns '')
while True:
    line = sys.stdin.readline()
    if not line:
        break

Add/remove cache servers from pool dynamically:
#!/usr/bin/env python3
import time

import requests

ANYCAST_IP = "1.2.3.4/32"
NEXT_HOP = "10.0.0.1"

# Cache server pool
CACHE_SERVERS = [
    "http://10.0.1.1",
    "http://10.0.1.2",
    "http://10.0.1.3"
]

MIN_HEALTHY = 1  # Minimum healthy servers to announce route


def check_cache_pool():
    """Check health of all cache servers in pool"""
    healthy_count = 0
    for server in CACHE_SERVERS:
        try:
            response = requests.get(f"{server}/health", timeout=2)
            if response.status_code == 200:
                healthy_count += 1
        except requests.RequestException:
            # Unreachable servers are simply not counted as healthy
            pass
    return healthy_count


announced = False

while True:
    healthy = check_cache_pool()
    if healthy >= MIN_HEALTHY and not announced:
        print(f"announce route {ANYCAST_IP} next-hop {NEXT_HOP}", flush=True)
        announced = True
    elif healthy < MIN_HEALTHY and announced:
        print(f"withdraw route {ANYCAST_IP} next-hop {NEXT_HOP}", flush=True)
        announced = False
    time.sleep(15)

Protect origin servers with cache layer:
#!/usr/bin/env python3
import os
import time

# Edge cache announces customer-facing IP
EDGE_IP = "1.2.3.4/32"
EDGE_NEXT_HOP = "10.0.1.1"

# Origin announces private IP to edge caches only
ORIGIN_IP = "10.100.0.1/32"
ORIGIN_NEXT_HOP = "10.0.2.1"


def announce_edge():
    """Announce edge cache to internet"""
    print(f"announce route {EDGE_IP} next-hop {EDGE_NEXT_HOP}", flush=True)


def announce_origin():
    """Announce origin to edge caches (with community)"""
    print(f"announce route {ORIGIN_IP} next-hop {ORIGIN_NEXT_HOP} "
          f"community [65000:100]", flush=True)


# Determine server role
role = os.environ.get('CDN_ROLE', 'edge')

if role == 'edge':
    announce_edge()
elif role == 'origin':
    announce_origin()

# Stay alive so ExaBGP keeps the route announced
while True:
    time.sleep(60)

Problem: Return traffic not using optimal path
Solution:
# Ensure consistent next-hop across all announcements
# Use loopback interface as next-hop
NEXT_HOP = "10.0.0.1"  # Loopback IP, always available

# Add to kernel routing table (outside ExaBGP)
# ip route add 1.2.3.4/32 dev lo

Problem: Traffic continues to failed cache server
Debugging:
# List the routes ExaBGP is currently advertising
# (ExaBGP 4.x CLI syntax; command names differ between versions)
exabgpcli show adj-rib out

# Check route propagation
traceroute -n 1.2.3.4

Solution:
- Reduce health check interval
- Use BFD (Bidirectional Forwarding Detection) if supported
- Tune BGP timers on routers
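To see how much the health check interval matters, it helps to work out how long a failed server keeps its route. The sketch below models the loop used by cache-health.py on this page (one check, then a sleep, withdrawing after a number of consecutive failures). The function name `withdrawal_delay` is hypothetical, and BGP propagation time on the routers comes on top of these figures.

```python
"""Sketch: time from server failure to route withdrawal."""


def withdrawal_delay(interval, threshold, timeout):
    """Return (best_case, worst_case) seconds until the route is withdrawn."""
    # Best case: the server fails just before a check and every
    # failing check returns immediately (e.g. connection refused).
    best = (threshold - 1) * interval
    # Worst case: the server fails just after a successful check and
    # every failing check hangs for the full request timeout.
    worst = threshold * (interval + timeout)
    return best, worst


# Values from cache-health.py: 10 s interval, 3 failures, 5 s timeout
print(withdrawal_delay(interval=10, threshold=3, timeout=5))  # (20, 45)

# A tighter configuration
print(withdrawal_delay(interval=2, threshold=3, timeout=1))   # (4, 9)
```

Shortening the interval and the request timeout together shrinks both bounds; lowering the failure threshold also helps but makes the route more likely to flap on a single slow response.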
Problem: Users routed to wrong region
Debugging:
# Verify communities are sent
# (ExaBGP 4.x CLI syntax; command names differ between versions)
exabgpcli show adj-rib out extensive

# Check router community configuration
# show ip bgp community 65000:1

Solution:
- Verify router accepts and honors communities
- Check community format (standard vs extended)
- Add the no-export community if the route must not propagate beyond the neighboring AS
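A format mistake is easy to catch before the command ever reaches ExaBGP. The sketch below validates standard communities (two 16-bit numbers separated by a colon) and builds the `community [...]` clause. The helper names are hypothetical, and the set of accepted well-known names is deliberately limited to two as an assumption; check which names your ExaBGP version accepts.

```python
"""Sketch: validate standard BGP communities before announcing."""
import re

# Assumed subset of well-known community names.
WELL_KNOWN = {"no-export", "no-advertise"}
STANDARD = re.compile(r"^(\d{1,5}):(\d{1,5})$")


def is_standard_community(text):
    """True for 'ASN:value' with both halves in 0..65535, or a known name."""
    if text in WELL_KNOWN:
        return True
    match = STANDARD.match(text)
    if not match:
        return False
    return all(int(part) <= 65535 for part in match.groups())


def community_clause(communities):
    """Build the community clause, rejecting anything non-standard."""
    bad = [c for c in communities if not is_standard_community(c)]
    if bad:
        raise ValueError(f"not a standard community: {bad}")
    return "community [" + " ".join(communities) + "]"


print(community_clause(["65000:1", "no-export"]))
print(is_standard_community("target:65000:1"))  # extended style -> False
```

Extended communities such as route targets use a different attribute and syntax, so a value rejected here may still be valid in that form.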
Monitor CDN health with Prometheus:
#!/usr/bin/env python3
from prometheus_client import start_http_server, Gauge
import time
import requests

# Metrics
route_announced = Gauge('cdn_route_announced', 'Route announcement status')
cache_health = Gauge('cdn_cache_health', 'Cache server health')
cache_load = Gauge('cdn_cache_load', 'Cache server load')

# 9100 is also node_exporter's default port; pick another port
# if node_exporter runs on the same host.
start_http_server(9100)

while True:
    # Update metrics
    # ... health checks ...
    time.sleep(10)

Log route changes for troubleshooting:
import logging

logging.basicConfig(
    filename='/var/log/exabgp-cdn.log',
    level=logging.INFO,
    format='%(asctime)s - %(message)s'
)


def announce_route():
    logging.info(f"Announcing route {ANYCAST_IP}")
    print(f"announce route {ANYCAST_IP} next-hop {NEXT_HOP}", flush=True)


def withdraw_route():
    logging.info(f"Withdrawing route {ANYCAST_IP}")
    print(f"withdraw route {ANYCAST_IP} next-hop {NEXT_HOP}", flush=True)

- Anycast Management - Anycast architecture patterns
- Service High Availability - HA strategies
- Load Balancing - BGP-based load balancing
- DDoS Mitigation - DDoS protection with FlowSpec
Ghost written by Claude (Anthropic AI)