
Common Pitfalls

Thomas Mangin edited this page Nov 15, 2025 · 6 revisions

This guide covers the most common mistakes and misunderstandings when using ExaBGP, along with their solutions. Reading this can save you hours of debugging time.

Critical Misunderstandings

Pitfall #1: Thinking ExaBGP is a Router

❌ Wrong Assumption: "I announced a route via ExaBGP, so traffic will be forwarded."

✅ Reality: ExaBGP is a BGP protocol implementation, NOT a router. It does NOT:

  • Install routes in the kernel routing table (RIB/FIB)
  • Forward IP packets
  • Handle ARP/NDP
  • Create VRFs, VXLAN tunnels, or MPLS labels

What ExaBGP Actually Does:

  • Sends/receives BGP UPDATE messages
  • Provides API for applications to control BGP announcements
  • Handles BGP session management

Solution:

  1. ExaBGP announces route via BGP → Peer router receives it
  2. Peer router installs route in its RIB/FIB (if best path)
  3. Peer router forwards traffic based on its routing table
  4. Your application must handle traffic locally (configure IPs on interfaces, run services, etc.)

Example:

# This announces 100.64.1.1/32 via BGP
print("announce route 100.64.1.1/32 next-hop self")

# But you MUST also:
# 1. Configure 100.64.1.1 on a local interface (e.g., loopback)
# 2. Run the actual service on that IP
# 3. Ensure next-hop is reachable

# Configure the service IP on loopback
ip addr add 100.64.1.1/32 dev lo

# Start your service
systemctl start myservice

# Then let ExaBGP announce it

Pitfall #2: Forgetting to Flush stdout

❌ Wrong Code:

#!/usr/bin/env python3
import sys
import time

print("announce route 100.64.1.0/24 next-hop self")
# Missing sys.stdout.flush()!

time.sleep(60)

Problem: ExaBGP reads your process's STDOUT line by line. When STDOUT is a pipe, Python buffers output in blocks, so without flushing, commands may sit in the buffer and not be sent immediately.

✅ Correct Code:

#!/usr/bin/env python3
import sys
import time

print("announce route 100.64.1.0/24 next-hop self")
sys.stdout.flush()  # Always flush!

time.sleep(60)

Impact: Routes are announced with significant delay or not at all.

Solution: ALWAYS call sys.stdout.flush() after every print() statement, or pass flush=True to print().
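One way to make the flush hard to forget is to send every command through a single helper. The sketch below is illustrative only: `send_command` is a hypothetical name, not part of ExaBGP, and it assumes commands are plain text lines written to the process's stdout.

```python
import sys


def send_command(command, stream=None):
    """Write one API command as a single line and flush it immediately.

    `send_command` is an illustrative helper, not an ExaBGP function.
    `stream` defaults to sys.stdout, which is what ExaBGP reads from.
    """
    if stream is None:
        stream = sys.stdout
    # Normalise the line ending so each command is exactly one line
    stream.write(command.rstrip("\n") + "\n")
    stream.flush()
```

A script would then call `send_command("announce route 100.64.1.0/24 next-hop self")` instead of a bare print(), so no call site can skip the flush.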


Pitfall #3: Incorrect Next-Hop

❌ Wrong:

# Next-hop is not a local IP
print("announce route 100.64.1.0/24 next-hop 203.0.113.1")

Problem: If 203.0.113.1 is not reachable from the peer router, the route won't be installed in the peer's FIB.

✅ Correct:

# Use 'self' (ExaBGP substitutes local-address)
print("announce route 100.64.1.0/24 next-hop self")

Or ensure next-hop is explicitly configured:

neighbor 192.0.2.1 {
    local-address 192.0.2.2;  # This becomes 'next-hop self'
    # ...
}

Rule: Next-hop must be reachable from the receiving router via its routing table.


Configuration Errors

Pitfall #4: Missing Family Declaration

❌ Wrong:

neighbor 192.0.2.1 {
    router-id 192.0.2.2;
    local-address 192.0.2.2;
    local-as 65001;
    peer-as 65000;
    # Missing family configuration!
}

Problem: For non-default address families (EVPN, FlowSpec, VPNv4, etc.), you must explicitly enable them.

✅ Correct:

neighbor 192.0.2.1 {
    router-id 192.0.2.2;
    local-address 192.0.2.2;
    local-as 65001;
    peer-as 65000;

    family {
        ipv4 flow;      # FlowSpec
        evpn;           # EVPN
        ipv4 vpn;       # VPNv4
    }
}

Note: IPv4 unicast is enabled by default; others must be explicit.


Pitfall #5: Incorrect Indentation

❌ Wrong:

neighbor 192.0.2.1 {
router-id 192.0.2.2;  # No indentation!
    local-address 192.0.2.2;
}

Problem: ExaBGP's config parser is sensitive to indentation.

✅ Correct:

neighbor 192.0.2.1 {
    router-id 192.0.2.2;      # Consistent indentation
    local-address 192.0.2.2;
}

Solution: Use tabs or consistent spaces (4 spaces recommended). Don't mix.


Pitfall #6: Wrong ASN Format

❌ Wrong:

neighbor 192.0.2.1 {
    local-as 65001.100;  # Dot notation not supported
}

✅ Correct:

neighbor 192.0.2.1 {
    local-as 65001;      # Plain integer
}

For 4-byte ASNs:

local-as 4200000000;  # Use integer form, not asdot

API Programming Mistakes

Pitfall #7: Not Handling stdin EOF

❌ Wrong:

while True:
    time.sleep(60)
    # Never checks for ExaBGP shutdown

Problem: When ExaBGP terminates, your process keeps running as an orphan.

✅ Correct:

import sys

while True:
    line = sys.stdin.readline()
    if not line:  # EOF - ExaBGP terminated
        break

    # Process messages...

Or for announcement-only scripts:

import signal
import sys
import time

def signal_handler(signum, frame):
    sys.exit(0)

signal.signal(signal.SIGTERM, signal_handler)
signal.signal(signal.SIGINT, signal_handler)

while True:
    time.sleep(60)
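An announcement-only script can also notice ExaBGP going away without relying on signals, by watching its stdin for EOF while it sleeps. The following is a minimal sketch under stated assumptions: `sleep_unless_eof` is a hypothetical helper, it assumes a POSIX system where select() works on pipes, and it discards anything ExaBGP sends in the meantime.

```python
import os
import select
import time


def sleep_unless_eof(fd, seconds):
    """Sleep for up to `seconds` while watching file descriptor `fd`.

    Returns True if the full time elapsed, False as soon as `fd` hits EOF
    (the writer, e.g. ExaBGP, closed its end of the pipe).
    Any data that arrives is read and thrown away.
    """
    deadline = time.monotonic() + seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return True
        ready, _, _ = select.select([fd], [], [], remaining)
        # An empty read on a readable descriptor means EOF
        if ready and not os.read(fd, 65536):
            return False
```

The main loop then becomes `while sleep_unless_eof(sys.stdin.fileno(), 60): ...`, which exits once ExaBGP closes the pipe. Because the helper reads the raw descriptor, it should not be mixed with sys.stdin.readline() in the same script.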

Pitfall #8: Ignoring JSON Parse Errors

❌ Wrong:

while True:
    line = sys.stdin.readline()
    msg = json.loads(line)  # Will crash on invalid JSON

Problem: Invalid JSON crashes your script, taking down your BGP announcements.

✅ Correct:

import json
import sys

while True:
    line = sys.stdin.readline()
    if not line:
        break

    try:
        msg = json.loads(line)
        # Process message...
    except json.JSONDecodeError as e:
        print(f"JSON parse error: {e}", file=sys.stderr)
        continue  # Don't crash, just skip bad message
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        continue

Pitfall #9: No Health Check Dampening

❌ Wrong:

while True:
    if check_health():
        announce()
    else:
        withdraw()
    time.sleep(1)

Problem: Transient health check failures cause route flapping.

✅ Correct (with dampening):

rise_count = 0
fall_count = 0
announced = False

while True:
    if check_health():
        rise_count += 1
        fall_count = 0
        if rise_count >= 3 and not announced:  # 3 consecutive passes
            announce()
            announced = True
            rise_count = 0
    else:
        fall_count += 1
        rise_count = 0
        if fall_count >= 2 and announced:  # 2 consecutive failures
            withdraw()
            announced = False
            fall_count = 0

    time.sleep(5)

Why: Avoids BGP churn from momentary failures.
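The same rise/fall logic can be pulled out of the loop so it can be unit-tested without a real health check. This is a sketch, not an ExaBGP feature: the `Dampener` class and its `update` method are hypothetical names, and the defaults mirror the thresholds used above.

```python
class Dampener:
    """Track consecutive health-check results with rise/fall thresholds."""

    def __init__(self, rise=3, fall=2):
        self.rise = rise          # consecutive passes needed to announce
        self.fall = fall          # consecutive failures needed to withdraw
        self.passes = 0
        self.failures = 0
        self.announced = False

    def update(self, healthy):
        """Record one check result; return 'announce', 'withdraw', or None."""
        if healthy:
            self.passes += 1
            self.failures = 0
            if self.passes >= self.rise and not self.announced:
                self.announced = True
                return "announce"
        else:
            self.failures += 1
            self.passes = 0
            if self.failures >= self.fall and self.announced:
                self.announced = False
                return "withdraw"
        return None
```

The health-check loop would call `update(check_health())` each interval and only send a command when the return value is not None, which keeps the decision logic separate from the I/O.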


Pitfall #10: Hardcoded Paths

❌ Wrong:

#!/usr/bin/env python3
# Hardcoded path won't work on other systems
import sys
sys.path.append('/home/alice/myproject')

✅ Correct:

#!/usr/bin/env python3
import sys
import os

# Use relative paths or environment variables
script_dir = os.path.dirname(os.path.realpath(__file__))
sys.path.append(os.path.join(script_dir, 'lib'))

BGP Protocol Issues

Pitfall #11: Mismatched AS Numbers

❌ Wrong:

# ExaBGP config
neighbor 192.0.2.1 {
    local-as 65001;
    peer-as 65002;  # Says peer is AS 65002
}

# But peer router is actually configured as AS 65000!

Problem: BGP session won't establish. The peer rejects the OPEN message with a NOTIFICATION (OPEN Message Error, Bad Peer AS), which shows up in the logs.

✅ Solution: Verify peer's ASN:

# Check peer's actual ASN
show bgp summary  # On router

Ensure peer-as in ExaBGP matches peer's actual local-as.


Pitfall #12: Incorrect Router ID

❌ Wrong:

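neighbor 192.0.2.1 {
    router-id 192.0.2.1;  # Same as the neighbor's address!
}

neighbor 192.0.2.2 {
    router-id 192.0.2.1;  # Still the first neighbor's address
}

Problem: The router-id identifies the ExaBGP instance itself. It must be unique per ExaBGP instance, not per neighbor, and should not reuse a value a peer is likely to use as its own identifier (here, the peer's address).

✅ Correct: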

# Use same router-id for all neighbors (but unique per ExaBGP instance)
neighbor 192.0.2.1 {
    router-id 192.0.2.100;  # ExaBGP's unique ID
}

neighbor 192.0.2.2 {
    router-id 192.0.2.100;  # Same router-id
}

Rule: One router-id per ExaBGP process, unique across your network.


Pitfall #13: TCP MD5 Password Mismatch

❌ Wrong:

neighbor 192.0.2.1 {
    tcp {
        md5-password "secret123";
    }
}

# But peer router has "secret456"

Problem: TCP connection fails silently. No BGP session.

✅ Solution:

neighbor 192.0.2.1 {
    tcp {
        md5-password "secret456";  # Must match peer exactly
    }
}

Verification:

# On peer router
show bgp neighbors 192.0.2.2 | include password

# Run ExaBGP in debug mode and check the logs for TCP connection failures or timeouts
exabgp -d config.ini

Pitfall #14: Route Filtering on Peer

❌ Issue:

# Announced route
print("announce route 100.64.1.0/24 next-hop self")

But peer router has:

# Cisco IOS-XR
router bgp 65000
 neighbor 192.0.2.2
  address-family ipv4 unicast
   route-policy BLOCK-ALL in  # Blocks everything!

Problem: Routes announced but peer rejects them via import policy.

✅ Solution: Verify peer's import filters:

# Check peer's import policy
show bgp neighbor 192.0.2.2 policy

# Or allow ExaBGP routes
route-policy ALLOW-EXABGP
  if as-path passes-through '65001' then
    pass
  endif
end-policy

Performance Problems

Pitfall #15: Excessive Health Check Frequency

❌ Wrong:

while True:
    check_health()  # Every 100ms!
    time.sleep(0.1)

Problem: Excessive CPU usage, doesn't improve convergence (BGP propagation takes seconds anyway).

✅ Correct:

while True:
    check_health()
    time.sleep(5)  # 5-10 seconds is reasonable

Why: BGP convergence typically takes 5-15 seconds. Checking every 100ms wastes resources.


Pitfall #16: Not Using Route Reflectors

❌ Wrong (Full Mesh iBGP):

100 ExaBGP instances
↓
100 × 99 / 2 = 4,950 BGP sessions!

✅ Correct (Route Reflector):

100 ExaBGP instances
↓
Each instance peers with 2 Route Reflectors
= 200 sessions total

Solution: Use BGP Route Reflectors for large deployments (>10 speakers).
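The session counts can be checked with a few lines of arithmetic. This sketch uses hypothetical function names and counts only client-to-reflector sessions; any sessions between the reflectors themselves are left out.

```python
def full_mesh_sessions(speakers):
    """Sessions needed when every iBGP speaker peers with every other one."""
    return speakers * (speakers - 1) // 2


def reflector_client_sessions(clients, reflectors):
    """Sessions needed when each client peers with every route reflector."""
    return clients * reflectors
```

For 100 instances, `full_mesh_sessions(100)` gives 4950, while `reflector_client_sessions(100, 2)` gives 200: each instance holds 2 sessions instead of 99.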


Pitfall #17: Announcing Too Many Routes

❌ Wrong:

# Announcing /32 for every IP in /24
for i in range(1, 255):
    print(f"announce route 100.64.1.{i}/32 next-hop self")

Problem: Unnecessary churn, large BGP table, slow convergence.

✅ Correct:

# Announce aggregate
print("announce route 100.64.1.0/24 next-hop self")

Rule: Aggregate when possible. Only announce /32 for anycast or specific services.
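If the list of prefixes is generated by a program, Python's standard `ipaddress` module can collapse it before anything is announced. The helper name `aggregate` below is an assumption for illustration; `ipaddress.collapse_addresses` is the standard-library call doing the work.

```python
import ipaddress


def aggregate(prefixes):
    """Collapse a list of prefix strings into the smallest covering set."""
    networks = (ipaddress.ip_network(p) for p in prefixes)
    return [str(net) for net in ipaddress.collapse_addresses(networks)]


# All 256 host routes of a /24 collapse into the single aggregate
hosts = [f"100.64.1.{i}/32" for i in range(256)]
announcements = [
    f"announce route {prefix} next-hop self" for prefix in aggregate(hosts)
]
```

Collapsing only merges prefixes that are fully covered, so a list that skips some addresses stays as several smaller prefixes. Announcing a wider aggregate by hand attracts traffic for every address in it, including ones you do not serve.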


Security Mistakes

Pitfall #18: No BGP Authentication

❌ Wrong:

neighbor 192.0.2.1 {
    # No authentication!
}

Problem: Anyone who can reach your ExaBGP can inject routes.

✅ Correct:

neighbor 192.0.2.1 {
    tcp {
        md5-password "strong-random-password-here";
    }
}

Better (TCP-AO):

neighbor 192.0.2.1 {
    tcp {
        ao-keyid 1;
        ao-key "hex:deadbeef...";
    }
}

Pitfall #19: Running API Process as Root

❌ Wrong:

process route-injector {
    run /root/inject.py;  # Runs as root!
}

Problem: If your script has vulnerabilities, attacker gets root access.

✅ Correct:

process route-injector {
    run /opt/exabgp/inject.py;
    user exabgp;  # Run as unprivileged user
    env {
        exabgp.user = exabgp;
    }
}

# Create unprivileged user
useradd -r -s /bin/false exabgp
chown exabgp:exabgp /opt/exabgp/inject.py

Pitfall #20: Exposing ExaBGP API

❌ Wrong:

# ExaBGP listening on all interfaces
exabgp --bind 0.0.0.0:179 config.ini

Problem: Anyone on network can connect to BGP port.

✅ Correct:

# Bind to localhost or specific interface only
exabgp --bind 127.0.0.1:179 config.ini

# Or use firewall
iptables -A INPUT -p tcp --dport 179 -s 192.0.2.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 179 -j DROP

Deployment Issues

Pitfall #21: No Logging

❌ Wrong:

#!/usr/bin/env python3
# No logging at all
if check_health():
    announce()

Problem: Impossible to troubleshoot when things go wrong.

✅ Correct:

#!/usr/bin/env python3
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s: %(message)s',
    filename='/var/log/exabgp-health.log'
)

if check_health():
    logging.info("Health check passed, announcing route")
    announce()
else:
    logging.warning("Health check failed, withdrawing route")
    withdraw()

Pitfall #22: Not Monitoring BGP Session State

❌ Wrong: "I announced routes, they must be working."

Problem: BGP session might be down, routes not actually advertised.

✅ Correct:

# Parse BGP state messages from ExaBGP
def handle_state(msg):
    state = msg.get('neighbor', {}).get('state')
    if state == 'down':
        logging.error("BGP session down!")
        # Alert ops team
    elif state == 'up':
        logging.info("BGP session established")

# In your receiver loop
if msg.get('type') == 'state':
    handle_state(msg)

Better: Use external monitoring (Prometheus, Grafana) to track BGP state.


Pitfall #23: No Graceful Shutdown

❌ Wrong:

# Kill ExaBGP immediately
kill -9 $(pidof exabgp)

Problem: Routes withdrawn abruptly, traffic drops.

✅ Correct:

# Graceful shutdown
kill -TERM $(pidof exabgp)

# Or withdraw routes first
echo "withdraw route 100.64.1.0/24 next-hop self" | \
    socat - UNIX-CONNECT:/run/exabgp/exabgp.sock

sleep 30  # Wait for BGP convergence
systemctl stop exabgp

Version-Specific Pitfalls

Pitfall #24: Not Reading ACK Responses (Hanging Programs)

❌ Wrong (program hangs):

import sys

# Send command
print("announce route 100.64.1.0/24 next-hop self")
sys.stdout.flush()

# Program hangs here because ACK is enabled by default!
# ExaBGP sends "done\n" but we never read it
# This causes backpressure and eventually hangs

Problem: ACK is enabled by default in ExaBGP 4.x and 5.x. If you don't read responses, the pipe fills up and blocks.

✅ Solution 1 - Read ACK responses (recommended):

import json
import select
import sys
import time

def wait_for_ack(expected_count=1, timeout=30):
    """
    Wait for ACK responses with polling loop.
    Handles both text and JSON encoder formats.
    """
    received = 0
    start_time = time.time()

    while received < expected_count:
        if time.time() - start_time >= timeout:
            return False

        ready, _, _ = select.select([sys.stdin], [], [], 0.1)
        if ready:
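            line = sys.stdin.readline()
            if not line:  # EOF - ExaBGP terminated
                return False
            line = line.strip()

            # Parse response (could be text or JSON)
            answer = None
            if line.startswith('{'):
                try:
                    data = json.loads(line)
                    answer = data.get('answer')
                except json.JSONDecodeError:
                    pass  # Not valid JSON; treat as no answer
            else:
                answer = line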

            if answer == "done":
                received += 1
            elif answer == "error":
                return False
            elif answer == "shutdown":
                raise SystemExit(0)
        else:
            time.sleep(0.1)

    return True

# Send command
sys.stdout.write("announce route 100.64.1.0/24 next-hop self\n")
sys.stdout.flush()

# Wait for ACK (with polling loop)
if not wait_for_ack():
    sys.exit(1)  # Command failed

✅ Solution 2 - Disable ACK (simpler but no error feedback):

# Option A: Environment variable (4.x and 5.x)
# Shell variable names cannot contain dots, so pass the setting through env
env exabgp.api.ack=false exabgp /etc/exabgp/exabgp.conf

# Option B: Runtime command (5.x/main only)
# Send: disable-ack or silence-ack

See: ACK Feature Documentation for details.


Quick Fixes

Session won't establish?

  1. Check ASNs match (local-as, peer-as)
  2. Check router-id is unique
  3. Check TCP MD5 password matches
  4. Verify network connectivity (ping, tcpdump)

Routes announced but not working?

  1. Verify peer accepts routes (show bgp neighbor received-routes)
  2. Check next-hop is reachable from peer
  3. Verify service IP is configured locally (ip addr show)
  4. Check peer's import filters

Health checks flapping?

  1. Add dampening (rise/fall counters)
  2. Increase health check interval
  3. Check health check logic (timeouts, retries)

👻 Ghost written by Claude (Anthropic AI)