Configuration Guide

This document describes all configuration options available for the Token API Scraper.

Environment Variables

The application can be configured via environment variables. Copy .env.example to .env and adjust the settings:

cp .env.example .env

Database Configuration

  • CLICKHOUSE_URL - ClickHouse database URL

    • Default: http://localhost:8123
    • Example: http://clickhouse.example.com:8123
  • CLICKHOUSE_USERNAME - ClickHouse username

    • Default: default
  • CLICKHOUSE_PASSWORD - ClickHouse password

    • Default: (empty)
  • CLICKHOUSE_DATABASE - ClickHouse database name

    • Default: default

RPC Configuration

  • NODE_URL - EVM RPC node URL (required)
    • Example: https://your-rpc-node.example.com

Token Metadata Overrides

The scraper can apply curated name and symbol values from an external tokens.json file (e.g. CoinGecko-sourced) to matching rows already stored in the metadata table. Overrides are applied once at service startup, before the normal metadata scrape loop begins.

  • TOKEN_OVERRIDES_URL - URL of a tokens.json file to fetch overrides from
    • Default: (not set — overrides disabled)
    • The file must be a JSON array of objects with network, contract, and at least one of name, symbol, or decimals
    • decimals is optional; when present and different from the stored value, the override also updates the stored decimals
    • For each matching (network, contract) already present in metadata, the scraper inserts a new replacement row with the curated override values
    • If an override token is not in metadata yet, the scraper inserts a startup row for it using the override values and defaults decimals to 18 when the override entry does not provide one
    • If the URL is unreachable at startup, the scraper leaves existing metadata unchanged and logs a warning
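
For reference, a minimal tokens.json matching the shape described above might look like this (all values are illustrative):

```json
[
  {
    "network": "mainnet",
    "contract": "0x0000000000000000000000000000000000000000",
    "name": "Example Token",
    "symbol": "EXT",
    "decimals": 18
  }
]
```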

Performance Settings

Concurrency

  • CONCURRENCY - Number of concurrent RPC requests
    • Default: 10
    • Recommended range: 5-20, depending on RPC node capacity and network conditions
    • Higher values: Faster processing but may hit rate limits
    • Lower values: Slower but more conservative on RPC resources

Example:

# Set concurrency to 5 for conservative processing
CONCURRENCY=5 npm run start

Retry Configuration

The retry mechanism uses exponential backoff with jitter to handle transient RPC failures gracefully:

  • MAX_RETRIES - Maximum number of retry attempts for failed RPC requests

    • Default: 3
    • Controls how many times to retry a failed request
  • BASE_DELAY_MS - Base delay in milliseconds for exponential backoff between retries

    • Default: 400
    • Starting delay between retries, which grows exponentially
  • JITTER_MIN - Minimum jitter multiplier for backoff delay

    • Default: 0.7 (70% of backoff)
    • Adds randomness to retry delays to prevent the thundering-herd effect
  • JITTER_MAX - Maximum jitter multiplier for backoff delay

    • Default: 1.3 (130% of backoff)
  • MAX_DELAY_MS - Maximum delay in milliseconds between retry attempts

    • Default: 30000 (30 seconds)
    • Cap on the maximum delay between retries
  • TIMEOUT_MS - Timeout in milliseconds for individual RPC requests

    • Default: 10000 (10 seconds)
    • How long to wait for a single RPC request before timing out

Example:

# More aggressive retry settings for unreliable networks
MAX_RETRIES=5 BASE_DELAY_MS=1000 MAX_DELAY_MS=60000 npm run start
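
To make the interaction of these settings concrete, here is a sketch of the backoff delay computation under the defaults above (the function name is illustrative; the scraper's actual implementation may differ):

```typescript
// Defaults documented above
const BASE_DELAY_MS = 400;
const JITTER_MIN = 0.7;
const JITTER_MAX = 1.3;
const MAX_DELAY_MS = 30000;

function backoffDelay(attempt: number): number {
  // Exponential growth from the base delay: 400ms, 800ms, 1600ms, ...
  const exponential = BASE_DELAY_MS * 2 ** attempt;
  // Random jitter in [JITTER_MIN, JITTER_MAX) spreads retries apart
  const jitter = JITTER_MIN + Math.random() * (JITTER_MAX - JITTER_MIN);
  // Cap the wait so no retry sleeps longer than MAX_DELAY_MS
  return Math.min(exponential * jitter, MAX_DELAY_MS);
}

for (let attempt = 0; attempt < 3; attempt++) {
  console.log(`attempt ${attempt}: ~${Math.round(backoffDelay(attempt))}ms`);
}
```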

Monitoring Configuration

Prometheus Metrics

Prometheus metrics are always enabled and provide real-time monitoring of service performance.

  • PROMETHEUS_PORT - Prometheus metrics HTTP port
    • Default: 9090
    • Port where metrics will be exposed

The Prometheus server starts automatically when a service runs and stays alive across service restarts for continuous monitoring.

Example:

# Prometheus metrics are always enabled on port 9090 by default
npm run start

# Specify a custom port
PROMETHEUS_PORT=8080 npm run start

Available Metrics:

  • scraper_total_tasks - Total number of tasks to process
  • scraper_completed_tasks_total - Total number of completed tasks (labeled by status: success/error)
  • scraper_error_tasks_total - Total number of failed tasks
  • scraper_requests_per_second - Current requests per second
  • scraper_progress_percentage - Current progress percentage
  • scraper_config_info - Configuration information (ClickHouse URL, database, node URL)

Access metrics at: http://localhost:9090/metrics (or your configured port)

Batch Insert Configuration

The batch insert mechanism improves ClickHouse insert performance by accumulating rows and inserting them in batches instead of one-by-one. This significantly reduces database overhead and improves throughput.

Batch inserts are always enabled to ensure optimal performance. You can configure the batching behavior with the following settings:

  • BATCH_INSERT_INTERVAL_MS - Flush interval in milliseconds

    • Default: 1000 (1 second)
    • How often to flush accumulated inserts to ClickHouse
    • Lower values: More frequent inserts, lower latency
    • Higher values: Larger batches, better throughput
  • BATCH_INSERT_MAX_SIZE - Maximum batch size before forcing a flush

    • Default: 10000 rows
    • Flush immediately when this many rows are accumulated
    • Prevents memory issues with large queues
    • Adjust based on available memory and row size

Example:

# Default batch settings (1 second, 10000 rows)
npm run start

# Custom batch settings for high-throughput scenarios
BATCH_INSERT_INTERVAL_MS=5000 BATCH_INSERT_MAX_SIZE=50000 npm run start

# Lower latency configuration (more frequent flushes)
BATCH_INSERT_INTERVAL_MS=500 BATCH_INSERT_MAX_SIZE=5000 npm run start
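
As a rough sketch of how the time-based and size-based flush triggers interact (hypothetical class and names; not the scraper's actual code):

```typescript
const BATCH_INSERT_INTERVAL_MS = 1000;
const BATCH_INSERT_MAX_SIZE = 10000;

type Row = Record<string, unknown>;

class BatchBuffer {
  private rows: Row[] = [];
  private timer: ReturnType<typeof setInterval>;

  constructor(private insert: (rows: Row[]) => void) {
    // Time-based flush: push whatever has accumulated every interval
    this.timer = setInterval(() => this.flush(), BATCH_INSERT_INTERVAL_MS);
  }

  add(row: Row): void {
    this.rows.push(row);
    // Size-based flush: never let the queue grow past the cap
    if (this.rows.length >= BATCH_INSERT_MAX_SIZE) this.flush();
  }

  flush(): void {
    if (this.rows.length === 0) return;
    const batch = this.rows;
    this.rows = [];
    this.insert(batch); // one INSERT for the whole batch
  }

  stop(): void {
    clearInterval(this.timer);
    this.flush(); // drain remaining rows on shutdown
  }
}
```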

Batch insert benefits:

  • Improved insert throughput for large volumes of data
  • Reduced database overhead and network calls
  • Better resource utilization in high-concurrency scenarios

Auto-restart Options

Services automatically restart after successful completion in a continuous loop. The service runs in the same process without exiting, preserving Prometheus metrics across runs.

  • AUTO_RESTART_DELAY - Delay in seconds before restarting the service
    • Default: 10
    • Minimum: 1 second
    • Time to wait before restarting the service after completion

Example:

# Use default 10 second delay between restarts
npm run cli run metadata-transfers

# Custom 30 second delay between restarts
AUTO_RESTART_DELAY=30 npm run cli run metadata-swaps

# Combine with other options
AUTO_RESTART_DELAY=60 CONCURRENCY=20 npm run cli run metadata-transfers

Benefits of continuous auto-restart:

  • Process stays alive, avoiding overhead of process restarts
  • Prometheus metrics are preserved and accumulated across runs
  • Better for long-running monitoring scenarios
  • Simplified deployment (no need for external process managers)

Logging Options

  • VERBOSE - Enable verbose logging output

    • Default: false
    • Set to true to enable detailed console output
    • When disabled, services use structured logging only
    • Prometheus metrics are still computed regardless of this setting
  • LOG_TYPE - Type of log output format

    • Default: pretty
    • Options: pretty, json, hidden
    • Controls the format of structured logs from tslog
  • LOG_LEVEL - Minimum log level to display

    • Default: info
    • Options: debug, info, warn, error
    • Controls which log messages are displayed

Example:

# Run with verbose logging
VERBOSE=true npm run cli run metadata-transfers

# Run silently with structured logs (default)
npm run cli run metadata-transfers

# Run with JSON logs for production
LOG_TYPE=json LOG_LEVEL=warn npm run cli run metadata-transfers

Configuration Priority

Configuration values are applied in the following order (later values override earlier ones):

  1. Default values (hardcoded in the application)
  2. Environment variables (from .env file or system environment)
  3. Command-line flags (highest priority)

Example showing all three:

# .env file
CONCURRENCY=10

# Command-line flag overrides .env
npm run cli run metadata --concurrency 20
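
The precedence above can be sketched as a simple resolver (illustrative names; the actual CLI parsing may differ):

```typescript
// CLI flag > environment variable > hardcoded default
function resolveConcurrency(
  flagValue?: string,
  envValue?: string,
  defaultValue = 10,
): number {
  if (flagValue !== undefined) return Number(flagValue); // highest priority
  if (envValue !== undefined) return Number(envValue);   // from .env / shell
  return defaultValue;                                   // hardcoded fallback
}
```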

Configuration Files

.env File

Create a .env file in the project root with your configuration:

# Database
CLICKHOUSE_URL=http://localhost:8123
CLICKHOUSE_USERNAME=default
CLICKHOUSE_PASSWORD=secret
CLICKHOUSE_DATABASE=evm_data

# RPC
NODE_URL=https://tron-evm-rpc.publicnode.com

# Performance
CONCURRENCY=15
MAX_RETRIES=5

# Monitoring
PROMETHEUS_PORT=9090

Example Configurations

Development Setup

CLICKHOUSE_URL=http://localhost:8123
CLICKHOUSE_USERNAME=default
CLICKHOUSE_PASSWORD=
CLICKHOUSE_DATABASE=default
NODE_URL=https://tron-evm-rpc.publicnode.com
CONCURRENCY=5

Production Setup

CLICKHOUSE_URL=http://clickhouse-cluster:8123
CLICKHOUSE_USERNAME=scraper_user
CLICKHOUSE_PASSWORD=secure_password_here
CLICKHOUSE_DATABASE=production_evm
NODE_URL=https://your-tron-node.example.com
CONCURRENCY=20
MAX_RETRIES=5
BASE_DELAY_MS=500
MAX_DELAY_MS=60000
PROMETHEUS_PORT=9090
BATCH_INSERT_INTERVAL_MS=1000
BATCH_INSERT_MAX_SIZE=10000

Performance Tuning

Optimizing for Speed

For maximum throughput when you have a reliable RPC node:

CONCURRENCY=20
MAX_RETRIES=3
TIMEOUT_MS=5000

Optimizing for Reliability

For unstable networks or rate-limited RPC nodes:

CONCURRENCY=5
MAX_RETRIES=10
BASE_DELAY_MS=1000
MAX_DELAY_MS=120000
TIMEOUT_MS=30000

Balanced Configuration

A good middle ground for most use cases:

CONCURRENCY=10
MAX_RETRIES=5
BASE_DELAY_MS=400
MAX_DELAY_MS=30000
TIMEOUT_MS=10000

Troubleshooting Configuration Issues

High Error Rates

If you see many RPC errors:

  • Reduce CONCURRENCY to be less aggressive
  • Increase MAX_RETRIES for more retry attempts
  • Increase TIMEOUT_MS if requests are timing out
  • Check your NODE_URL is correct and accessible

Slow Performance

If processing is slower than expected:

  • Increase CONCURRENCY (if RPC node can handle it)
  • Decrease TIMEOUT_MS to fail faster
  • Verify network connectivity to RPC node
  • Check database performance

Connection Issues

If you can't connect to ClickHouse:

  • Verify CLICKHOUSE_URL is correct
  • Check firewall rules allow connection
  • Verify credentials (CLICKHOUSE_USERNAME, CLICKHOUSE_PASSWORD)
  • Test connection: curl http://localhost:8123/ping