API Capture Tool 🔍

A sophisticated, enterprise-grade tool for automatically capturing and categorizing backend API endpoints from web applications. Built with TypeScript and following SOLID principles for maximum maintainability and extensibility.

🎯 Project Objectives

Primary Goals

  • Automated API Discovery: Automatically navigate through web applications and capture all backend API calls
  • Intelligent Categorization: Organize captured endpoints based on application structure and modules
  • Structured Output: Generate hierarchical JSON files mirroring the application's URL structure
  • Enterprise Ready: Robust error handling, configuration management, and extensible architecture

Key Features

  • 🚀 Playwright-powered browser automation
  • 🏗️ SOLID principles architecture
  • 📁 Hierarchical file output matching URL structure
  • 🔐 Authentication support for secured applications
  • 🎯 Smart endpoint categorization with fallback inference
  • ⚡ Configurable timeouts and capture parameters
  • 🛡️ Comprehensive error handling and logging

🚀 Installation

Prerequisites

  • Node.js 16.0 or higher
  • npm or yarn package manager

Step-by-Step Setup

  1. Clone and Install Dependencies
git clone <repository-url>
cd api-capture-tool
npm install
  2. Install Playwright Browsers
npx playwright install
  3. Environment Configuration: create a .env file (optional):
API_CAPTURE_USERNAME=your_username
API_CAPTURE_PASSWORD=your_password
  4. Input File Setup: create the input directory:
mkdir -p Input

Place your microtec_erp_urls.json in the Input directory.

📖 Usage

Basic Execution

# Development mode
npm run dev

# Production build and run
npm run build
npm start

Configuration

The tool uses a hierarchical configuration system:

  1. Environment Variables (Highest priority)
  2. Configuration Service defaults
  3. Input JSON structure for URLs
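
The precedence above can be sketched as a small resolver. This is a minimal illustration, not the tool's actual ConfigurationService: the `CaptureConfig` shape, the `resolveConfig` and `intFromEnv` helpers, and the idea that `CAPTURE_TIMEOUT` and `NAVIGATION_TIMEOUT` can be overridden through the environment are all assumptions. Only the credential variable names and the two default timeout values come from this README.

```typescript
// Hypothetical config shape, for illustration only.
interface CaptureConfig {
  username: string;
  password: string;
  captureTimeoutMs: number;
  navigationTimeoutMs: number;
}

type Env = Record<string, string | undefined>;

// Timeout defaults as listed under "Timeout Settings"; empty credentials are an assumption.
const DEFAULT_CONFIG: CaptureConfig = {
  username: "",
  password: "",
  captureTimeoutMs: 15000,
  navigationTimeoutMs: 60000,
};

// Parse a positive integer from an environment value, or fall back to the default.
function intFromEnv(value: string | undefined, fallback: number): number {
  if (value === undefined) return fallback;
  const parsed = Number.parseInt(value, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Environment variables win; anything unset or invalid falls back to the defaults.
function resolveConfig(env: Env, defaults: CaptureConfig = DEFAULT_CONFIG): CaptureConfig {
  return {
    username: env["API_CAPTURE_USERNAME"] ?? defaults.username,
    password: env["API_CAPTURE_PASSWORD"] ?? defaults.password,
    captureTimeoutMs: intFromEnv(env["CAPTURE_TIMEOUT"], defaults.captureTimeoutMs),
    navigationTimeoutMs: intFromEnv(env["NAVIGATION_TIMEOUT"], defaults.navigationTimeoutMs),
  };
}
```

In a Node.js entry point this would typically be called as `resolveConfig(process.env)`.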

Input JSON Format

{
  "base_url": "https://your-app.com",
  "modules": {
    "Module_Name": {
      "Section_Name": ["/url/path/1", "/url/path/2"],
      "Nested_Section": {
        "SubSection": ["/nested/path"]
      }
    }
  }
}
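
As a rough guide to what counts as a valid input file, the format above can be described with a recursive type and a runtime check. The type and function names below are hypothetical, and the tool's own validation may be stricter or looser than this sketch.

```typescript
// A section is either a list of URL paths or a nested map of sub-sections.
type SectionNode = string[] | { [name: string]: SectionNode };

interface UrlInputFile {
  base_url: string;
  modules: { [moduleName: string]: { [sectionName: string]: SectionNode } };
}

// True when the value is a list of strings or a (possibly nested) map of such lists.
function isSectionNode(value: unknown): value is SectionNode {
  if (Array.isArray(value)) return value.every((item) => typeof item === "string");
  if (typeof value !== "object" || value === null) return false;
  return Object.values(value).every((child) => isSectionNode(child));
}

// True when the parsed JSON has a string base_url and a well-formed modules map.
function isUrlInputFile(value: unknown): value is UrlInputFile {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { base_url?: unknown; modules?: unknown };
  if (typeof candidate.base_url !== "string") return false;
  const modules = candidate.modules;
  if (typeof modules !== "object" || modules === null || Array.isArray(modules)) return false;
  // Each module must itself be a map of sections, not a bare list of URLs.
  return Object.values(modules).every(
    (mod) => typeof mod === "object" && mod !== null && !Array.isArray(mod) && isSectionNode(mod)
  );
}
```

A caller would run `isUrlInputFile(JSON.parse(text))` before handing the data to the rest of the pipeline.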

🏗️ Project Structure

src/
├── core/                   # Domain Layer (SOLID Principles)
│   ├── interfaces/         # Abstraction contracts
│   ├── entities/           # Business objects
│   └── exceptions/         # Custom error types
├── infrastructure/         # Technical Implementation
│   ├── browser/            # Playwright wrappers
│   ├── file-system/        # File operations
│   └── config/             # Configuration management
├── application/            # Use Cases & Services
│   ├── services/           # Business logic
│   ├── use-cases/          # Application workflows
│   └── dtos/               # Data transfer objects
└── main/                   # Composition & Entry Point
    └── composition-root.ts

🔄 Component Integration Flow

Architecture Overview

Input JSON → Composition Root → Use Case → Services → Output
    ↓               ↓              ↓          ↓         ↓
URL Structure  Dependency       Business   Browser    JSON Files
               Injection        Logic      Automation

Detailed Integration Flow

  1. Initialization Phase

    main.ts → CompositionRoot → BrowserFactory → ConfigurationService
    
  2. URL Loading Phase

    Use Case → UrlRepository → FileSystemService → JSON Parser → UrlStructure Entities
    
  3. Authentication Phase

    Use Case → AuthenticationService → Browser Page → Login Flow
    
  4. API Capture Phase

    Use Case → ApiCaptureService → Browser Events → ApiEndpoint Entities
    
  5. Categorization Phase

    Use Case → UrlCategorizationService → UrlStructure Matching → CategorizedEndpoint Entities
    
  6. Output Phase

    Use Case → ApiEndpointRepository → FileSystemService → OrganizedEndpoints → JSON Files
    

🎯 Key Functions Explained

Core Business Logic

1. CaptureApiEndpointsUseCase.execute()

Purpose: Orchestrates the entire API capture workflow

async execute(): Promise<void> {
  // 1. Load URLs from repository
  // 2. Authenticate with application
  // 3. Capture APIs from all URLs
  // 4. Categorize endpoints by module/section
  // 5. Save organized results to file system
}
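
The five steps can be sketched as a class that receives its collaborators through the constructor. This is an illustration under stated assumptions rather than the project's code: `WorkflowDeps`, `CaptureWorkflowSketch` and the three data shapes are hypothetical, and the separate services from the README are collapsed into one dependency object for brevity.

```typescript
interface NavUrl { module: string; section: string; url: string }
interface RawEndpoint { method: string; endpoint: string; sourcePage: string }
interface PlacedEndpoint extends RawEndpoint { module: string; section: string }

// Hypothetical collaborator contract. The real interfaces live in
// src/core/interfaces and may use different names and signatures.
interface WorkflowDeps {
  loadUrls(): Promise<NavUrl[]>;
  authenticate(): Promise<void>;
  captureApisFromUrls(urls: NavUrl[]): Promise<RawEndpoint[]>;
  categorizeEndpoint(endpoint: RawEndpoint, urls: NavUrl[]): PlacedEndpoint;
  save(endpoints: PlacedEndpoint[]): Promise<void>;
}

class CaptureWorkflowSketch {
  private readonly deps: WorkflowDeps;

  constructor(deps: WorkflowDeps) {
    this.deps = deps;
  }

  // Runs the five steps strictly in order; any rejection aborts the run.
  async execute(): Promise<void> {
    const urls = await this.deps.loadUrls(); // 1. load
    await this.deps.authenticate(); // 2. authenticate
    const captured = await this.deps.captureApisFromUrls(urls); // 3. capture
    const placed = captured.map((endpoint) =>
      this.deps.categorizeEndpoint(endpoint, urls) // 4. categorize
    );
    await this.deps.save(placed); // 5. save
  }
}
```

Passing the collaborators in from outside is what lets a composition root swap real services for fakes in tests.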

2. ApiCaptureService.captureApisFromUrls()

Purpose: Navigates through URLs and captures API requests

async captureApisFromUrls(urls: UrlStructure[]): Promise<ApiEndpoint[]> {
  for (const url of urls) {
    // - Navigate to URL using Playwright
    // - Wait for API calls with timeout
    // - Capture unique endpoints via request listeners
    // - Store in memory map to avoid duplicates
  }
  return Array.from(capturedEndpoints.values());
}
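
A minimal sketch of that loop follows. To keep it self-contained it does not import Playwright. `PageLike` and `RequestLike` are hypothetical stand-ins modelled on the `on("request")`, `goto`, `waitForTimeout`, `method()` and `url()` members of Playwright's page and request objects. Using a fixed wait per page and crediting each endpoint to the first page that triggered it are assumptions, not documented behaviour of the tool.

```typescript
// Hypothetical structural stand-ins for the parts of a browser page used here.
interface RequestLike {
  method(): string;
  url(): string;
}
interface PageLike {
  on(event: "request", listener: (request: RequestLike) => void): void;
  goto(url: string, options?: { timeout?: number }): Promise<unknown>;
  waitForTimeout(ms: number): Promise<void>;
}

interface CapturedCall { method: string; endpoint: string; sourcePage: string }

async function captureFromPages(
  page: PageLike,
  urls: string[],
  captureTimeoutMs: number
): Promise<CapturedCall[]> {
  // Keyed by "METHOD url" so a repeated call is stored only once.
  const captured = new Map<string, CapturedCall>();
  let currentPage = "";

  // One listener for the whole run; the first page to trigger a call owns it.
  page.on("request", (request) => {
    const key = `${request.method()} ${request.url()}`;
    if (!captured.has(key)) {
      captured.set(key, { method: request.method(), endpoint: request.url(), sourcePage: currentPage });
    }
  });

  for (const url of urls) {
    currentPage = url;
    await page.goto(url);
    await page.waitForTimeout(captureTimeoutMs); // fixed wait for late calls (assumption)
  }
  return Array.from(captured.values());
}
```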

3. UrlCategorizationService.categorizeEndpoint()

Purpose: Intelligently categorizes endpoints based on source URL

categorizeEndpoint(endpoint: ApiEndpoint, urls: UrlStructure[]): CategorizedEndpoint {
  // 1. Find exact URL match in navigation structure
  // 2. If no match, infer from URL path segments
  // 3. Apply normalization rules (masterdata → Master_data)
  // 4. Handle special API patterns (SideMenu, CurrentUserInfo)
  // 5. Return categorized endpoint with module/section/subsection
}
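
Steps 1 to 3 can be sketched as follows. The special-pattern handling of step 4 is left out because this README does not say how those patterns are treated. Every name here is hypothetical, as are the capitalisation rule for inferred segments and the `Uncategorized` and `General` fallbacks; only the `masterdata → Master_data` mapping comes from the text.

```typescript
interface NavEntry { module: string; section: string; subsection?: string; url: string }
interface Captured { method: string; endpoint: string; sourcePage: string }
interface Categorized extends Captured {
  module: string;
  section: string;
  subsection: string | undefined;
}

// Hypothetical alias table; only this one entry is given in the README.
const SEGMENT_ALIASES: Record<string, string> = { masterdata: "Master_data" };

// Map a raw path segment to a display name: alias first, else capitalise it.
function normalizeSegment(segment: string): string {
  const alias = SEGMENT_ALIASES[segment.toLowerCase()];
  if (alias !== undefined) return alias;
  return segment.charAt(0).toUpperCase() + segment.slice(1);
}

function categorize(endpoint: Captured, nav: NavEntry[]): Categorized {
  // 1. Exact match of the source page against the navigation structure.
  const match = nav.find((entry) => entry.url === endpoint.sourcePage);
  if (match) {
    return { ...endpoint, module: match.module, section: match.section, subsection: match.subsection };
  }
  // 2 and 3. Otherwise infer from the path segments, normalising each one.
  const segments = endpoint.sourcePage
    .split("/")
    .filter((segment) => segment.length > 0)
    .map(normalizeSegment);
  return {
    ...endpoint,
    module: segments[0] ?? "Uncategorized",
    section: segments[1] ?? "General",
    subsection: segments[2],
  };
}
```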

Infrastructure Services

4. BrowserFactory.createBrowser()

Purpose: Initializes Playwright browser instance with proper configuration

async createBrowser(): Promise<IBrowserService> {
  // - Launch Chromium in non-headless mode with devtools
  // - Create new browser context
  // - Return wrapped browser service for abstraction
}

5. UrlRepository.loadUrls()

Purpose: Parses input JSON and creates structured URL hierarchy

async loadUrls(): Promise<UrlStructure[]> {
  // - Read and validate JSON file
  // - Recursively parse module structure
  // - Create UrlStructure entities
  // - Sort by URL length for specificity matching
}
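
The recursive parse and the sort can be sketched like this, starting from an already parsed `modules` object rather than a file on disk. `UrlTree`, `FlatUrl`, `collectUrls` and `flattenModules` are hypothetical names. Sorting longest-first is an assumption about what "specificity matching" needs, since the README does not state the direction.

```typescript
type UrlTree = string[] | { [name: string]: UrlTree };

interface FlatUrl {
  module: string;
  path: string[]; // chain of section names above the URL
  url: string;
}

// Walk one module's tree, recording the section names leading to each URL.
function collectUrls(moduleName: string, node: UrlTree, trail: string[], out: FlatUrl[]): void {
  if (Array.isArray(node)) {
    for (const url of node) out.push({ module: moduleName, path: trail, url });
    return;
  }
  for (const [name, child] of Object.entries(node)) {
    collectUrls(moduleName, child, [...trail, name], out);
  }
}

function flattenModules(modules: { [name: string]: UrlTree }): FlatUrl[] {
  const out: FlatUrl[] = [];
  for (const [moduleName, tree] of Object.entries(modules)) {
    collectUrls(moduleName, tree, [], out);
  }
  // Longest URL first, so a more specific path is tried before its prefix (assumption).
  return out.sort((a, b) => b.url.length - a.url.length);
}
```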

Entity Models

6. OrganizedEndpoints.toJSON()

Purpose: Transforms internal data structure to serializable JSON

toJSON(): any {
  // - Convert Map-based structure to plain objects
  // - Transform entities to DTOs
  // - Maintain hierarchical module/section/subsection structure
  // - Ensure proper JSON serialization
}
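
A sketch of that conversion, assuming the internal structure is three nested Maps keyed by module, section and subsection. The Map layout, the `EndpointRecord` entity with a `Date` field, and all names below are assumptions. The DTO fields mirror the output JSON format documented in this README.

```typescript
interface EndpointDto { method: string; endpoint: string; sourcePage: string; timestamp: string }
interface EndpointRecord { method: string; endpoint: string; sourcePage: string; capturedAt: Date }

type SubsectionMap = Map<string, EndpointRecord[]>;
type SectionMap = Map<string, SubsectionMap>;
type ModuleMap = Map<string, SectionMap>;

type SubsectionJson = { [subsection: string]: EndpointDto[] };
type SectionJson = { [section: string]: SubsectionJson };
type OutputJson = { [module: string]: SectionJson };

// Entity to DTO: the Date becomes an ISO 8601 string so it serializes predictably.
function toDto(record: EndpointRecord): EndpointDto {
  return {
    method: record.method,
    endpoint: record.endpoint,
    sourcePage: record.sourcePage,
    timestamp: record.capturedAt.toISOString(),
  };
}

// Maps serialize to "{}" under JSON.stringify, so each level is copied to a plain object.
function organizedToJson(modules: ModuleMap): OutputJson {
  const result: OutputJson = {};
  modules.forEach((sections, moduleName) => {
    const sectionJson: SectionJson = {};
    sections.forEach((subsections, sectionName) => {
      const subsectionJson: SubsectionJson = {};
      subsections.forEach((records, subsectionName) => {
        subsectionJson[subsectionName] = records.map(toDto);
      });
      sectionJson[sectionName] = subsectionJson;
    });
    result[moduleName] = sectionJson;
  });
  return result;
}
```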

📊 Output Structure

The tool generates a hierarchical file structure:

08new_api_endpoints_output/
├── all_endpoints.json                    # Complete endpoint catalog
├── General_Settings/                     # Module directory
│   ├── General_Settings_endpoints.json   # Module-level endpoints
│   ├── Dashboard/                        # Section directory
│   │   ├── Dashboard_endpoints.json      # Section-level endpoints
│   │   └── SideMenu/                     # Subsection directory
│   └── Master_data/
├── Accounting/
└── ... (other modules)
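
The naming pattern visible in the tree, a directory per level containing a `<Name>_endpoints.json` file, can be sketched as a path builder. `Placement` and `endpointFilePath` are hypothetical names. Whether subsection directories also hold a `<Name>_endpoints.json` file is not shown above, so applying the same pattern at that level is an assumption.

```typescript
interface Placement {
  module: string;
  section?: string;
  subsection?: string;
}

// Build the relative path of the endpoints file for one level of the hierarchy.
// Uses "/" as the separator; a real implementation would likely use path.join.
function endpointFilePath(outputRoot: string, placement: Placement): string {
  const parts = [placement.module, placement.section, placement.subsection].filter(
    (part): part is string => part !== undefined && part.length > 0
  );
  const leaf = parts[parts.length - 1];
  return [outputRoot, ...parts, `${leaf}_endpoints.json`].join("/");
}
```

For example, `{ module: "General_Settings", section: "Dashboard" }` under the root `08new_api_endpoints_output` yields `08new_api_endpoints_output/General_Settings/Dashboard/Dashboard_endpoints.json`.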

Output JSON Format

{
  "General_Settings": {
    "Dashboard": {
      "SideMenu": [
        {
          "method": "GET",
          "endpoint": "/api/menu",
          "sourcePage": "/erp/dashboard",
          "timestamp": "2024-01-15T10:30:00.000Z"
        }
      ]
    }
  }
}

🔧 Advanced Configuration

Timeout Settings

// In ConfigurationService (values in milliseconds)
CAPTURE_TIMEOUT: 15000,      // How long to wait for API calls on each page
NAVIGATION_TIMEOUT: 60000,   // Page load timeout

API Filtering

// Only capture requests matching all of:
//   - URL starts with TARGET_API_PREFIX
//   - Resource type is "xhr" or "fetch"
//   - Method-URL combination has not been captured already
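
The three criteria can be combined into a single stateful predicate. This is a sketch under assumptions: `ObservedRequest` and `makeApiFilter` are hypothetical, and the prefix is passed in as a plain string because the README does not show where `TARGET_API_PREFIX` is defined. The `"xhr"` and `"fetch"` values match what Playwright reports from `request.resourceType()`.

```typescript
interface ObservedRequest {
  method: string;
  url: string;
  resourceType: string;
}

// Returns a predicate that accepts each matching method-URL pair exactly once.
function makeApiFilter(targetApiPrefix: string): (request: ObservedRequest) => boolean {
  const seen = new Set<string>();
  return (request) => {
    if (!request.url.startsWith(targetApiPrefix)) return false;
    if (request.resourceType !== "xhr" && request.resourceType !== "fetch") return false;
    const key = `${request.method.toUpperCase()} ${request.url}`;
    if (seen.has(key)) return false; // duplicate of an earlier capture
    seen.add(key);
    return true;
  };
}
```

Note that the same URL with a different HTTP method counts as a separate endpoint.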

🐛 Troubleshooting

Common Issues

  1. Login Failures

    • Verify credentials in configuration
    • Check network connectivity to target application
    • Update CSS selectors if login form changes
  2. No APIs Captured

    • Verify TARGET_API_PREFIX matches backend domain
    • Check if APIs are triggered on page load
    • Increase CAPTURE_TIMEOUT for slower applications
  3. File System Errors

    • Ensure write permissions in output directory
    • Verify input JSON file exists and is valid

Debug Mode

Enable verbose logging by setting environment variable:

DEBUG_API_CAPTURE=true npm run dev

🚀 Performance Optimization

Memory Management

  • Uses Map for O(1) endpoint lookups
  • Automatic browser resource cleanup
  • Streamed file writing for large datasets

Capture Efficiency

  • Parallelizable URL processing
  • Smart deduplication of endpoints
  • Configurable timeouts per environment
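
The list above describes URL processing as parallelizable, not as already running in parallel. One way it could be bounded is a small worker pool like the sketch below. `mapWithConcurrency` is a hypothetical helper, not part of the tool. In practice each concurrent lane would presumably need its own browser page so that captured requests can still be attributed to the right source URL.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight, keeping result order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

If any worker rejects, the returned promise rejects with that error, while other lanes keep running until their current item finishes.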

🔮 Extension Points

The architecture supports easy extensions:

  1. New Browser Support: Implement IBrowserFactory
  2. Additional Output Formats: Implement IApiEndpointRepository
  3. Custom Categorization: Extend IUrlCategorizationService
  4. Alternative Authentication: Implement IAuthenticationService
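
As an example of the second extension point, an alternative output format could look like the CSV writer below. The contract shown for `IApiEndpointRepository` is an assumption, since this README does not list the interface's real methods. `EndpointRow`, `TextWriter`, `csvCell` and `CsvEndpointRepository` are likewise hypothetical.

```typescript
interface EndpointRow { method: string; endpoint: string; sourcePage: string; timestamp: string }

// Assumed contract; the real IApiEndpointRepository may differ.
interface IApiEndpointRepository {
  save(endpoints: EndpointRow[]): Promise<void>;
}

// Injected so the sketch does not depend on a particular file-system API.
type TextWriter = (path: string, contents: string) => Promise<void>;

// Quote a cell when it contains a comma, quote or newline; double embedded quotes.
function csvCell(value: string): string {
  return /[",\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

class CsvEndpointRepository implements IApiEndpointRepository {
  private readonly outputPath: string;
  private readonly write: TextWriter;

  constructor(outputPath: string, write: TextWriter) {
    this.outputPath = outputPath;
    this.write = write;
  }

  async save(endpoints: EndpointRow[]): Promise<void> {
    const header = "method,endpoint,sourcePage,timestamp";
    const lines = endpoints.map((e) =>
      [e.method, e.endpoint, e.sourcePage, e.timestamp].map(csvCell).join(",")
    );
    await this.write(this.outputPath, [header, ...lines].join("\n") + "\n");
  }
}
```

Because the use case depends only on the interface, swapping this class in would be a change to the composition root alone.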

📄 License

This project is intended for educational and legitimate testing purposes. Make sure you have proper authorization before running it against any application.


Built with ❤️ following SOLID principles for enterprise-grade reliability and maintainability.