Common Pitfalls
This guide covers the most common mistakes and misunderstandings when using ExaBGP, along with their solutions. Reading this can save you hours of debugging time.
- Critical Misunderstandings
- Configuration Errors
- API Programming Mistakes
- BGP Protocol Issues
- Performance Problems
- Security Mistakes
- Deployment Issues
- Version-Specific Pitfalls
- See Also
❌ Wrong Assumption: "I announced a route via ExaBGP, so traffic will be forwarded."
✅ Reality: ExaBGP is a BGP protocol implementation, NOT a router. It does NOT:
- Install routes in the kernel routing table (RIB/FIB)
- Forward IP packets
- Handle ARP/NDP
- Create VRFs, VXLAN tunnels, or MPLS labels
What ExaBGP Actually Does:
- Sends/receives BGP UPDATE messages
- Provides API for applications to control BGP announcements
- Handles BGP session management
Solution:
- ExaBGP announces route via BGP → Peer router receives it
- Peer router installs route in its RIB/FIB (if best path)
- Peer router forwards traffic based on its routing table
- Your application must handle traffic locally (configure IPs on interfaces, run services, etc.)
Example:
# This announces 100.64.1.1/32 via BGP
print("announce route 100.64.1.1/32 next-hop self")
# But you MUST also:
# 1. Configure 100.64.1.1 on a local interface (e.g., loopback)
# 2. Run the actual service on that IP
# 3. Ensure next-hop is reachable

# Configure the service IP on loopback
ip addr add 100.64.1.1/32 dev lo
# Start your service
systemctl start myservice
# Then let ExaBGP announce it
❌ Wrong Code:
#!/usr/bin/env python3
import sys
import time
print("announce route 100.64.1.0/24 next-hop self")
# Missing sys.stdout.flush()!
time.sleep(60)
Problem: ExaBGP reads your process's STDOUT line by line. Python block-buffers STDOUT when it is connected to a pipe, so without flushing, commands sit in the buffer and may not reach ExaBGP immediately.
✅ Correct Code:
#!/usr/bin/env python3
import sys
import time
print("announce route 100.64.1.0/24 next-hop self")
sys.stdout.flush()  # Always flush!
time.sleep(60)
Impact: Routes are announced with significant delay or not at all.
Solution: ALWAYS call sys.stdout.flush() after every print() statement, or pass flush=True to print().
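One way to make the flush hard to forget is to route every command through a single helper. The sketch below assumes a hypothetical helper name, `send_command`, which is not part of ExaBGP; it only bundles the write and the flush together.

```python
import sys


def send_command(command, stream=None):
    """Write one API command plus a newline, then flush immediately.

    `send_command` is a hypothetical helper name, not an ExaBGP function.
    """
    # Look up sys.stdout at call time so a redirected stdout is honoured.
    out = sys.stdout if stream is None else stream
    out.write(command + "\n")
    out.flush()


send_command("announce route 100.64.1.0/24 next-hop self")
```

Because the stream is a parameter, the helper can be pointed at an in-memory buffer in unit tests without touching the real STDOUT.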
❌ Wrong:
# Next-hop is not a local IP
print("announce route 100.64.1.0/24 next-hop 203.0.113.1")
Problem: If 203.0.113.1 is not reachable from the peer router, the route won't be installed in the peer's FIB.
✅ Correct:
# Use 'self' (ExaBGP substitutes local-address)
print("announce route 100.64.1.0/24 next-hop self")
Or ensure next-hop is explicitly configured:
neighbor 192.0.2.1 {
    local-address 192.0.2.2;  # This becomes 'next-hop self'
    # ...
}
Rule: Next-hop must be reachable from the receiving router via its routing table.
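A quick sanity check before announcing is whether the next-hop sits on the subnet shared with the peer, since a directly connected next-hop can be resolved by that peer. The sketch below uses Python's standard `ipaddress` module; the function name `next_hop_on_peering_subnet` and the /30 peering subnet are assumptions made for illustration.

```python
import ipaddress


def next_hop_on_peering_subnet(next_hop, peering_subnet):
    """Return True if next_hop lies inside the subnet shared with the peer.

    Hypothetical helper: the peering subnet is supplied by the caller,
    it is not discovered from ExaBGP.
    """
    return ipaddress.ip_address(next_hop) in ipaddress.ip_network(peering_subnet)


# 192.0.2.2 is the local-address used in the examples above.
print(next_hop_on_peering_subnet("192.0.2.2", "192.0.2.0/30"))    # True
print(next_hop_on_peering_subnet("203.0.113.1", "192.0.2.0/30"))  # False
```

A next-hop outside the peering subnet may still be reachable through the peer's IGP, so a False result is a prompt to check the peer's routing table, not proof of failure.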
❌ Wrong:
neighbor 192.0.2.1 {
    router-id 192.0.2.2;
    local-address 192.0.2.2;
    local-as 65001;
    peer-as 65000;
    # Missing family configuration!
}
Problem: For non-default address families (EVPN, FlowSpec, VPNv4, etc.), you must explicitly enable them.
✅ Correct:
neighbor 192.0.2.1 {
    router-id 192.0.2.2;
    local-address 192.0.2.2;
    local-as 65001;
    peer-as 65000;
    family {
        ipv4 flow;  # FlowSpec
        evpn;       # EVPN
        ipv4 vpn;   # VPNv4
    }
}
Note: IPv4 unicast is enabled by default; others must be explicit.
❌ Wrong:
neighbor 192.0.2.1 {
router-id 192.0.2.2; # No indentation!
local-address 192.0.2.2;
}
Problem: ExaBGP's config parser is sensitive to indentation.
✅ Correct:
neighbor 192.0.2.1 {
    router-id 192.0.2.2;  # Consistent indentation
    local-address 192.0.2.2;
}
Solution: Use tabs or consistent spaces (4 spaces recommended). Don't mix.
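❌ Wrong:
neighbor 192.0.2.1 {
    local-as 65001.100;  # Dot notation not supported
}
✅ Correct:
neighbor 192.0.2.1 {
    local-as 65001;  # Plain integer
}
For 4-byte ASNs:
local-as 4200000000;  # Use integer form, not asdot
To convert an asdot value X.Y to its integer form, compute X × 65536 + Y; for example, 65001.100 becomes 4259905636.
❌ Wrong: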
while True:
    time.sleep(60)
    # Never checks for ExaBGP shutdown
Problem: When ExaBGP terminates, your process keeps running as an orphan.
✅ Correct:
while True:
    line = sys.stdin.readline()
    if not line:  # EOF - ExaBGP terminated
        break
    # Process messages...
Or for announcement-only scripts:
import signal
import sys
import time
def signal_handler(signum, frame):
    sys.exit(0)
signal.signal(signal.SIGTERM, signal_handler)
signal.signal(signal.SIGINT, signal_handler)
while True:
    time.sleep(60)
❌ Wrong:
while True:
    line = sys.stdin.readline()
    msg = json.loads(line)  # Will crash on invalid JSON
Problem: Invalid JSON crashes your script, taking down your BGP announcements.
✅ Correct:
import json
import sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    try:
        msg = json.loads(line)
        # Process message...
    except json.JSONDecodeError as e:
        print(f"JSON parse error: {e}", file=sys.stderr)
        continue  # Don't crash, just skip bad message
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        continue
❌ Wrong:
while True:
    if check_health():
        announce()
    else:
        withdraw()
    time.sleep(1)
Problem: Transient health check failures cause route flapping.
✅ Correct (with dampening):
rise_count = 0
fall_count = 0
announced = False
while True:
    if check_health():
        rise_count += 1
        fall_count = 0
        if rise_count >= 3 and not announced:  # 3 consecutive passes
            announce()
            announced = True
            rise_count = 0
    else:
        fall_count += 1
        rise_count = 0
        if fall_count >= 2 and announced:  # 2 consecutive failures
            withdraw()
            announced = False
            fall_count = 0
    time.sleep(5)
Why: Avoids BGP churn from momentary failures.
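The rise/fall logic is easier to test when it is separated from the polling loop. The following is a minimal sketch of that idea; the class name `HealthDampener` and its `update` method are hypothetical, and the thresholds default to the 3-pass/2-fail values used above.

```python
class HealthDampener:
    """Turn raw health check results into dampened announce/withdraw actions.

    Hypothetical helper class, not part of ExaBGP.
    """

    def __init__(self, rise=3, fall=2):
        self.rise = rise
        self.fall = fall
        self.rise_count = 0
        self.fall_count = 0
        self.announced = False

    def update(self, healthy):
        """Record one check result; return "announce", "withdraw" or None."""
        if healthy:
            self.rise_count += 1
            self.fall_count = 0
            if self.rise_count >= self.rise and not self.announced:
                self.announced = True
                return "announce"
        else:
            self.fall_count += 1
            self.rise_count = 0
            if self.fall_count >= self.fall and self.announced:
                self.announced = False
                return "withdraw"
        return None


dampener = HealthDampener()
# A single failure between passes does not trigger a withdraw.
for result in [True, True, True, False, True, False, False]:
    action = dampener.update(result)
    if action:
        print(action)
```

In a real script the returned action would select which command to print to ExaBGP, followed by a flush.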
❌ Wrong:
#!/usr/bin/env python3
# Hardcoded path won't work on other systems
import sys
sys.path.append('/home/alice/myproject')
✅ Correct:
#!/usr/bin/env python3
import sys
import os
# Use relative paths or environment variables
script_dir = os.path.dirname(os.path.realpath(__file__))
sys.path.append(os.path.join(script_dir, 'lib'))
❌ Wrong:
# ExaBGP config
neighbor 192.0.2.1 {
    local-as 65001;
    peer-as 65002;  # Says peer is AS 65002
}
# But peer router is actually configured as AS 65000!
Problem: BGP session won't establish. Logs show an OPEN message error (Bad Peer AS).
✅ Solution: Verify peer's ASN:
# Check peer's actual ASN
show bgp summary  # On router
Ensure peer-as in ExaBGP matches the peer's actual local AS.
❌ Wrong:
neighbor 192.0.2.1 {
    router-id 192.0.2.1;  # Same as the neighbor's address!
}
neighbor 192.0.2.2 {
    router-id 192.0.2.1;  # Still an ID that belongs to a peer
}
Problem: The router-id identifies the ExaBGP instance, not the neighbor. Here it is set to a peer's address, which is likely that peer's own router-id, and a peer can reject an OPEN whose BGP identifier matches its own.
✅ Correct:
# Use same router-id for all neighbors (but unique per ExaBGP instance)
neighbor 192.0.2.1 {
    router-id 192.0.2.100;  # ExaBGP's unique ID
}
neighbor 192.0.2.2 {
    router-id 192.0.2.100;  # Same router-id
}
Rule: One router-id per ExaBGP process, unique across your network.
❌ Wrong:
neighbor 192.0.2.1 {
    tcp {
        md5-password "secret123";
    }
}
# But peer router has "secret456"
Problem: TCP connection fails silently. No BGP session.
✅ Solution:
neighbor 192.0.2.1 {
    tcp {
        md5-password "secret456";  # Must match peer exactly
    }
}
Verification:
# On peer router
show bgp neighbors 192.0.2.2 | include password
# Check logs for TCP connection failures (a bad MD5 signature causes timeouts, not resets)
exabgp -d config.ini
❌ Issue:
# Announced route
print("announce route 100.64.1.0/24 next-hop self")
But peer router has:
# Cisco IOS-XR
router bgp 65000
 neighbor 192.0.2.2
  address-family ipv4 unicast
   route-policy BLOCK-ALL in  # Blocks everything!
Problem: Routes are announced, but the peer rejects them via its import policy.
✅ Solution: Verify peer's import filters:
# Check peer's import policy
show bgp neighbor 192.0.2.2 policy
# Or allow ExaBGP routes
route-policy ALLOW-EXABGP
  if as-path passes-through '65001' then
    pass
  endif
end-policy
❌ Wrong:
while True:
    check_health()  # Every 100ms!
    time.sleep(0.1)
Problem: Excessive CPU usage that doesn't improve convergence (BGP propagation takes seconds anyway).
✅ Correct:
while True:
    check_health()
    time.sleep(5)  # 5-10 seconds is reasonable
Why: BGP convergence typically takes 5-15 seconds. Checking every 100ms wastes resources.
❌ Wrong (Full Mesh iBGP):
100 ExaBGP instances
↓
100 × 99 / 2 = 4,950 BGP sessions!
✅ Correct (Route Reflector):
100 ExaBGP instances
↓
100 sessions to 2 Route Reflectors
= 100 sessions total
Solution: Use BGP Route Reflectors for large deployments (>10 speakers).
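The session counts above follow from simple arithmetic, sketched below. The function names are hypothetical, and the route reflector count covers only client-to-reflector sessions (sessions between the reflectors themselves are not counted). The `reflectors_per_client` parameter is an assumption added to show the redundant case where every instance peers with both reflectors.

```python
def full_mesh_sessions(speakers):
    """Number of iBGP sessions when every speaker peers with every other."""
    return speakers * (speakers - 1) // 2


def route_reflector_sessions(clients, reflectors_per_client=1):
    """Number of client-to-reflector sessions (reflector mesh not counted)."""
    return clients * reflectors_per_client


print(full_mesh_sessions(100))            # 4950
print(route_reflector_sessions(100))      # 100: each instance uses one reflector
print(route_reflector_sessions(100, 2))   # 200: each instance peers with both
```

Even in the redundant case the session count grows linearly with the number of instances, while the full mesh grows quadratically.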
❌ Wrong:
# Announcing /32 for every IP in /24
for i in range(1, 255):
    print(f"announce route 100.64.1.{i}/32 next-hop self")
Problem: Unnecessary churn, large BGP table, slow convergence.
✅ Correct:
# Announce aggregate
print("announce route 100.64.1.0/24 next-hop self")
Rule: Aggregate when possible. Only announce /32 for anycast or specific services.
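When the list of prefixes is generated by code, Python's standard `ipaddress.collapse_addresses` can do the aggregation before anything is announced. This is a minimal sketch; the helper name `aggregate` is hypothetical, and it assumes every address in the resulting aggregate is actually served locally.

```python
import ipaddress


def aggregate(prefixes):
    """Collapse a list of prefixes into the smallest equivalent set."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# All 256 addresses of the /24 collapse into a single prefix.
host_routes = [f"100.64.1.{i}/32" for i in range(256)]
for prefix in aggregate(host_routes):
    print(f"announce route {prefix} next-hop self")
```

Note that the loop in the wrong example covers only .1 through .254, which does not collapse to a single /24; announcing the aggregate is a deliberate decision to attract traffic for the whole block.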
❌ Wrong:
neighbor 192.0.2.1 {
    # No authentication!
}
Problem: Anyone who can reach your ExaBGP can inject routes.
✅ Correct:
neighbor 192.0.2.1 {
    tcp {
        md5-password "strong-random-password-here";
    }
}
Better (TCP-AO):
neighbor 192.0.2.1 {
    tcp {
        ao-keyid 1;
        ao-key "hex:deadbeef...";
    }
}
❌ Wrong:
process route-injector {
    run /root/inject.py;  # Runs as root!
}
Problem: If your script has vulnerabilities, an attacker gets root access.
✅ Correct:
process route-injector {
    run /opt/exabgp/inject.py;
    user exabgp;  # Run as unprivileged user
    env {
        exabgp.user = exabgp;
    }
}
# Create unprivileged user
useradd -r -s /bin/false exabgp
chown exabgp:exabgp /opt/exabgp/inject.py
❌ Wrong:
# ExaBGP listening on all interfaces
exabgp --bind 0.0.0.0:179 config.ini
Problem: Anyone on the network can connect to the BGP port.
✅ Correct:
# Bind to localhost or specific interface only
exabgp --bind 127.0.0.1:179 config.ini
# Or use firewall
iptables -A INPUT -p tcp --dport 179 -s 192.0.2.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 179 -j DROP
❌ Wrong:
#!/usr/bin/env python3
# No logging at all
if check_health():
    announce()
Problem: Impossible to troubleshoot when things go wrong.
✅ Correct:
#!/usr/bin/env python3
import logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s: %(message)s',
    filename='/var/log/exabgp-health.log'
)
if check_health():
    logging.info("Health check passed, announcing route")
    announce()
else:
    logging.warning("Health check failed, withdrawing route")
    withdraw()
❌ Wrong: "I announced routes, they must be working."
Problem: BGP session might be down, routes not actually advertised.
✅ Correct:
# Parse BGP state messages from ExaBGP
def handle_state(msg):
    state = msg.get('neighbor', {}).get('state')
    if state == 'down':
        logging.error("BGP session down!")
        # Alert ops team
    elif state == 'up':
        logging.info("BGP session established")
# In your receiver loop
if msg.get('type') == 'state':
    handle_state(msg)
Better: Use external monitoring (Prometheus, Grafana) to track BGP state.
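To act on session state instead of only logging it, a script can remember the most recent state it has seen. The sketch below reuses the `type` and `neighbor.state` keys from the handler above; the class name `SessionState` is hypothetical, and the sample messages are trimmed illustrations, not complete ExaBGP output.

```python
import json


class SessionState:
    """Remember the latest BGP session state reported on STDIN.

    Hypothetical helper class, not part of ExaBGP.
    """

    def __init__(self):
        self.state = "unknown"

    def feed(self, line):
        """Update the state from one JSON line; ignore anything else."""
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            return self.state  # Bad input must not crash the script
        if isinstance(msg, dict) and msg.get("type") == "state":
            self.state = msg.get("neighbor", {}).get("state", self.state)
        return self.state

    @property
    def is_up(self):
        return self.state == "up"


session = SessionState()
session.feed('{"type": "state", "neighbor": {"state": "up"}}')
print(session.is_up)
session.feed('{"type": "state", "neighbor": {"state": "down"}}')
print(session.is_up)
```

A health check loop could consult `is_up` before treating an announcement as effective, and raise an alert when the session stays down.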
❌ Wrong:
# Kill ExaBGP immediately
kill -9 $(pidof exabgp)
Problem: Routes withdrawn abruptly, traffic drops.
✅ Correct:
# Graceful shutdown
kill -TERM $(pidof exabgp)
# Or withdraw routes first
echo "withdraw route 100.64.1.0/24 next-hop self" | \
    socat - UNIX-CONNECT:/run/exabgp/exabgp.sock
sleep 30  # Wait for BGP convergence
systemctl stop exabgp
❌ Wrong (program hangs):
import sys
# Send command
print("announce route 100.64.1.0/24 next-hop self")
sys.stdout.flush()
# Program hangs here because ACK is enabled by default!
# ExaBGP sends "done\n" but we never read it
# This causes backpressure and eventually hangs
Problem: ACK is enabled by default in ExaBGP 4.x and 5.x. If you don't read responses, the pipe fills up and blocks.
✅ Solution 1 - Read ACK responses (recommended):
import json
import select
import sys
import time
def wait_for_ack(expected_count=1, timeout=30):
    """
    Wait for ACK responses with polling loop.
    Handles both text and JSON encoder formats.
    """
    received = 0
    start_time = time.time()
    while received < expected_count:
        if time.time() - start_time >= timeout:
            return False
        ready, _, _ = select.select([sys.stdin], [], [], 0.1)
        if ready:
            raw = sys.stdin.readline()
            if not raw:  # EOF - ExaBGP closed the pipe
                return False
            line = raw.strip()
            # Parse response (could be text or JSON)
            answer = None
            if line.startswith('{'):
                try:
                    data = json.loads(line)
                    answer = data.get('answer')
                except json.JSONDecodeError:
                    pass
            else:
                answer = line
            if answer == "done":
                received += 1
            elif answer == "error":
                return False
            elif answer == "shutdown":
                raise SystemExit(0)
        else:
            time.sleep(0.1)
    return True
# Send command
sys.stdout.write("announce route 100.64.1.0/24 next-hop self\n")
sys.stdout.flush()
# Wait for ACK (with polling loop)
if not wait_for_ack():
    sys.exit(1)  # Command failed
✅ Solution 2 - Disable ACK (simpler but no error feedback):
# Option A: Environment variable (4.x and 5.x)
# Use env(1): the shell's export rejects variable names that contain dots
env exabgp.api.ack=false exabgp /etc/exabgp/exabgp.conf
# Option B: Runtime command (5.x/main only)
# Send: disable-ack or silence-ack
See: ACK Feature Documentation for details.
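If ACK stays enabled, the parsing step of Solution 1 can be isolated into a small pure function, which makes the text and JSON cases easy to unit-test without a running ExaBGP. This is a sketch under the same assumptions as `wait_for_ack` (a plain-text answer, or a JSON object with an `answer` key); the name `parse_answer` is hypothetical.

```python
import json


def parse_answer(line):
    """Return the answer from a text or JSON ACK line, or None if unusable.

    Hypothetical helper, not part of ExaBGP.
    """
    text = line.strip()
    if text.startswith("{"):
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            return None  # Malformed JSON: treat as no answer
        return data.get("answer") if isinstance(data, dict) else None
    return text or None  # Empty lines carry no answer


print(parse_answer("done\n"))
print(parse_answer('{"answer": "error"}'))
```

The caller keeps the decision logic: count "done", stop on "error", and exit on "shutdown".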
- Debugging Guide - Troubleshooting techniques
- First BGP Session - Basic setup guide
- API Overview - API programming guide
- Production Best Practices - Production deployment
- GitHub Issues: https://github.com/Exa-Networks/exabgp/issues
- Slack: https://exabgp.slack.com/
- Mailing List: Archive at Google Groups
Session won't establish?
- Check ASNs match (local-as, peer-as)
- Check router-id is unique
- Check TCP MD5 password matches
- Verify network connectivity (ping, tcpdump)
Routes announced but not working?
- Verify peer accepts routes (show bgp neighbor received-routes)
- Check next-hop is reachable from peer
- Verify service IP is configured locally (ip addr show)
- Check peer's import filters
Health checks flapping?
- Add dampening (rise/fall counters)
- Increase health check interval
- Check health check logic (timeouts, retries)
👻 Ghost written by Claude (Anthropic AI)