This directory contains all testing, monitoring, and development utilities for the crawler.
testing/
├── monitoring/ # Scripts for monitoring running system
├── setup/ # Setup and configuration utilities
├── data/ # Test data server and utilities
├── *.py # Main test scripts
└── run_k8s_test.sh # Main test runner
./testing/run_k8s_test.sh
./testing/data/start_test_data_server.sh

- check_status.py - Check crawler system status
- monitor_progress.py - Monitor crawling progress
- monitor_queue.py - Monitor queue status
- debug_worker.py - Debug worker processes

- test_dynamic_updates.py - Test dynamic file updates
- test_file_removal.py - Test file removal handling
- test_job_recovery.py - Test job recovery mechanisms
- launch_test.py - Launch test scenarios

- setup_azure_servicebus.sh - Set up Azure Service Bus
- setup_azure_servicebus_aad.sh - Set up Service Bus with AAD
- setup_env.sh - Configure environment variables
- install_pyodbc_mac.sh - Install ODBC drivers on Mac
- remove_fk_constraint.py - Database migration utility

- test_data_server.py - Python server for test data
- start_test_data_server.sh - Start the test data server

- local-setup.sh - Set up local testing environment
- run_k8s_test.sh - Run tests with Kubernetes-like setup

- start_api_server.py - Start API server for testing
- start_data_server.py - Start data server for testing
- start_worker.py - Start worker for testing
- run.py - Legacy launcher (deprecated)

# Ensure .env is configured with Azure credentials
./testing/run_k8s_test.sh

# Set up local environment
./testing/local-setup.sh
# Start test data server
./testing/data/start_test_data_server.sh
# Run specific test
python3 testing/test_dynamic_updates.py

# Check overall status
python3 testing/monitoring/check_status.py
# Monitor queue
python3 testing/monitoring/monitor_queue.py
# Watch progress
python3 testing/monitoring/monitor_progress.py

Test data is stored in the /data directory, with sample schema.org files for:
- backcountry_com
- hebbarskitchen_com
- imdb_com
- tripadvisor_com
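
Before starting the test data server, it can help to confirm that the sample data for each site is actually present. The following is a minimal sketch, not part of the repository: the helper name `missing_sample_sites` is hypothetical, and it assumes each site appears as a file or directory whose name starts with the site name directly under the data directory.

```python
from pathlib import Path

# Site names taken from the list above
SAMPLE_SITES = [
    "backcountry_com",
    "hebbarskitchen_com",
    "imdb_com",
    "tripadvisor_com",
]


def missing_sample_sites(data_dir):
    """Return the sample sites that have no matching entry under data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        # Nothing can be present if the directory itself is missing
        return list(SAMPLE_SITES)
    # Assumption: one file or directory per site, named after the site
    present = [entry.name for entry in root.iterdir()]
    return [
        site
        for site in SAMPLE_SITES
        if not any(name.startswith(site) for name in present)
    ]
```

Calling `missing_sample_sites(path_to_data_dir)` returns an empty list when all four sites are found. The actual layout of the data directory may differ, so adjust the matching rule to the real structure.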
Tests require environment variables configured in .env:
- Azure Service Bus credentials
- SQL Database connection
- Storage account details
- Azure AD credentials (if using AAD auth)
See .env.example for required variables.
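
One way to catch a missing variable before running the tests is to compare the names in .env.example against those in .env. The sketch below is illustrative only and is not part of the repository: the function names are hypothetical, and it assumes both files use plain `KEY=VALUE` lines with optional `#` comments and an optional leading `export`.

```python
from pathlib import Path


def env_keys(path):
    """Return the set of variable names defined in a KEY=VALUE file."""
    keys = set()
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        # Tolerate an optional leading "export "
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        name, sep, _value = line.partition("=")
        if sep and name.strip():
            keys.add(name.strip())
    return keys


def missing_env_keys(example_path=".env.example", env_path=".env"):
    """Return names listed in the example file but absent from the env file."""
    env_file = Path(env_path)
    # A missing .env is treated as defining no variables at all
    defined = env_keys(env_file) if env_file.exists() else set()
    return sorted(env_keys(example_path) - defined)
```

Running `missing_env_keys()` from the directory that holds both files returns a sorted list of the variable names still to be added. A variable present in .env with an empty value counts as defined here, so this checks names only, not whether the credentials are valid.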