Skip to content

Conversation

@aitestino
Copy link
Member

@aitestino aitestino commented Nov 8, 2025

📝 Description

This PR adds comprehensive velocity metrics support to the github-ops-manager tool, enabling detailed analysis of user activity and commit statistics across repositories. The implementation includes retry mechanisms for GitHub API rate limiting and a new user repository discovery system using the GitHub Search API.

🔗 Related Issue(s)

  • N/A - New feature development

✔️ Type of Change

  • 🚀 New feature (non-breaking change which adds functionality)
  • 🛠️ Refactoring/Technical debt (internal code improvements with no direct user-facing changes)
  • ⚙️ Chore (build process, CI, tooling, dependencies, etc.)

🎯 Key Changes & Implementation Details

New Features

  • Velocity Metrics Support: Added get_commit_stats() method to GitHubKitAdapter for detailed commit statistics retrieval
  • User Repository Discovery: New UserRepositoryDiscoverer class that uses GitHub Search API to find all repositories where a user has activity (PRs, issues, reviews, commits)
  • Retry Decorator: Implemented @retry_on_rate_limit() decorator with exponential backoff for graceful GitHub API rate limit handling
  • Branch Management: Added list_branches() method to GitHubKitAdapter for comprehensive branch listing

Infrastructure Improvements

  • Search Rate Limiting: Custom SearchRateLimiter class to handle Search API's stricter 30 req/min limit
  • Error Handling: New UserNotFoundError exception for better handling of non-existent GitHub users (common with LDAP-only users)
  • Enhanced Logging: Comprehensive structured logging throughout velocity metrics components

Code Quality

  • Added PR template tailored to this repository
  • Added CLAUDE.md with Python development guidelines
  • Fixed all linting issues (exception naming, type annotations, exception chaining)
  • All 197 unit tests passing
  • All pre-commit checks passing

🔧 GitHub Operations Impact

  • Changes to GitHub API authentication (App/PAT) - Enhanced rate limit handling
  • N/A - No breaking changes to existing GitHub operations

🧪 Testing Done

  • All existing tests pass (uv run pytest tests/unit)
  • All pre-commit checks pass (ruff linting, formatting, yaml validation)
  • 197 unit tests passing in 2.08s
  • Tested with GitHub App authentication
  • Manual E2E testing performed (describe scenarios):
    • Scenario 1: Pending - User repository discovery across date ranges
    • Scenario 2: Pending - Commit statistics retrieval for velocity metrics
    • Scenario 3: Pending - Rate limit retry behavior validation

📋 Configuration Changes

  • Changes to pyproject.toml (dependencies, project metadata, tool config)
  • N/A - No configuration changes required from users

✅ Checklist

  • My code follows the style guidelines of this project (e.g., ran pre-commit run -a).
  • I have added comprehensive Google-style docstrings to all new/modified functions, classes, and methods.
  • I have added appropriate type hints to all function parameters and return values.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation (README, docstrings, etc.).
  • My changes generate no new warnings from ruff check . or mypy ..
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have checked that my changes don't introduce security vulnerabilities (API key exposure, injection attacks, etc.).
  • I have verified imports are properly organized (standard library, third-party, local application).
  • I have verified that no sensitive data is logged (credentials, tokens, PII).

🔄 Backward Compatibility

  • This change is fully backward compatible

All new functionality is additive and does not modify existing APIs or workflows.

➡️ Next Steps (if applicable)

  1. Add comprehensive unit tests for velocity metrics components
  2. Add integration tests for user repository discovery
  3. Update README.md with velocity metrics usage examples
  4. Add CLI commands for velocity metrics queries (if needed)

❓ Questions & Open Items (if applicable)

  • Should we expose velocity metrics as CLI commands or keep as programmatic API only?
  • Rate limit defaults seem appropriate (30 req/min for Search API) - adjust if needed based on usage patterns

aitestino and others added 16 commits August 28, 2025 16:41
…methods

- Add @retry_on_rate_limit decorators to all GitHub API methods for robust rate limit handling
- Add list_organization_repositories() method for org-wide repository fetching
- Add list_commits() method with filtering support for commit statistics
- Add list_pull_request_reviews() method for PR review analysis
- Export retry_on_rate_limit decorator in utils package

These enhancements support the AutomationVelocityMetrics project requirements.
- Add get_commit_stats() method to GitHubKitAdapter for retrieving commit line counts
- Returns additions, deletions, total changes, and list of changed files
- Includes comprehensive logging and error handling with retry decorators
- Supports AutomationVelocityMetrics Phase 6.5.1.1 requirement for commit statistics
Added enhanced PR file statistics method with consistent formatting
and guaranteed statistics for all files in pull requests.
Add .claude/ and logs/ directories to .gitignore to exclude
development tools and log files from version control.
Add comprehensive user activity discovery functionality that identifies
all repositories where a user has been active. Includes search for
authored PRs, reviewed PRs, created issues, and commits with proper
rate limiting for the Search API (30 requests/minute).

- UserRepositoryDiscoverer: Main discovery orchestrator
- SearchRateLimiter: Handles stricter Search API rate limits
- Comprehensive activity detection across multiple GitHub actions
- Proper error handling and retry logic integration
- Add comprehensive logging for authored PR searches
- Add debug logging for PR item structure analysis
- Fix retry decorator calls to use proper syntax
- Remove incorrect async/await from synchronous API calls
- Add response logging for search API results
Change 422 error handling from error to warning level when users
are not found on GitHub, which is normal behavior for LDAP users
who may not have GitHub accounts.
Add list_branches method with the following capabilities:
- Retrieves all branches for a repository with pagination support
- Optional filtering for protected vs unprotected branches
- Returns structured branch data including name, commit SHA, and protection status
- Includes retry logic for rate limiting via @retry_on_rate_limit decorator
- Comprehensive logging with owner, repo, and branch count details
- Handles edge cases where branch.commit or branch.protected may not exist

Method signature supports:
- protected: Optional boolean to filter by protection status
- per_page: Configurable page size (max 100, default 100)
- **kwargs: Additional API parameters for flexibility

Returns list of dictionaries with standardized branch information format.
Enhance commit search and repository discovery logging with:

Commit Search Improvements:
- Add INFO level logging before attempting commit search for better visibility
- Include discovered repository names (first 20) in commit search results
- Add error_type to exception logging for better error classification
- Preserve existing username context in all log messages

Repository Discovery Summary:
- Add comprehensive final summary with sorted list of all discovered repositories
- Maintain existing date range logging for audit trails
- Improve debugging visibility without impacting performance

These changes provide better observability for troubleshooting user discovery
issues and understanding the scope of repository detection across different
search methods (profile, organizations, commits).
- Add UserNotFoundException class for 422 errors when users don't exist
- Export UserNotFoundException in search module __init__.py
- Improve error handling in user discovery to properly handle non-existent users
- Raise specific exception instead of generic warnings for missing users
• Fix string quote consistency (single to double quotes)
• Remove unnecessary spacing and improve whitespace consistency
• Consolidate parameter formatting in method calls
• Clean up logging statement formatting
• Remove redundant debugging blocks for commit counting issue
• Reorder imports to follow alphabetical convention
• Update SearchRateLimiter, UserNotFoundException, UserRepositoryDiscoverer order
• Update import ordering and remove unused Optional import
• Fix string quote consistency (single to double quotes)
• Improve method parameter formatting and spacing
• Consolidate multi-line logging statements for better readability
• Update type hints to use modern union syntax (str | None)
• Clean up exception handling and method call formatting
• Remove unused Union import from typing
• Improve method parameter and condition formatting
• Consolidate multi-line conditional expressions for better readability
• Fix trailing whitespace and improve overall code consistency
• Simplify error message formatting in sync wrapper function
…ator

Add explicit handling for PrimaryRateLimitExceeded and SecondaryRateLimitExceeded exceptions from GitHubKit. These exceptions are now caught before the generic RequestFailed handler and processed to extract retry_after timedelta values for proper backoff timing.

- Import PrimaryRateLimitExceeded and SecondaryRateLimitExceeded from githubkit.exception
- Add exception handler for rate limit exceptions before RequestFailed catch block
- Extract retry_after timedelta from exceptions and use for wait time calculation
- Maintain exponential backoff behavior with max_delay constraints
- Add structured logging with rate limit type and wait time details
- Prevents Python tracebacks from appearing when GitHub rate limits are hit
@aitestino aitestino added the enhancement New feature or request label Nov 8, 2025
- Add comprehensive pull request template at .github/pull_request_template.md with sections for GitHub operations impact, testing checklist, and configuration changes
- Fix exception naming convention by renaming UserNotFoundException to UserNotFoundError in user_discovery.py
- Add return type annotations (-> None) to __init__ methods in user_discovery.py
- Improve exception handling by adding proper exception chaining with 'from None'
- Update search module exports to reflect renamed exception class
- Add project-specific CLAUDE.md guidelines file

All pre-commit checks pass. All 197 unit tests pass.
@aitestino aitestino changed the title Feat/velocity metrics support feat: add velocity metrics support with user repository discovery and rate limit handling Nov 8, 2025
@aitestino aitestino added the documentation Improvements or additions to documentation label Nov 8, 2025
- Changed default max_retries parameter to 100
- Updated docstring to reflect new default value
- Enables more persistent retry behavior for large batch processing
- Still honors all rate limit headers and backoff requirements
Enables mypy and other type checkers to recognize and validate type hints
in the github_ops_manager package when used as a dependency.
Include search subpackage and all Python files in the build to fix
missing module issues when installed as a dependency.
Move include directive to proper hatch.build section to ensure
py.typed marker is packaged with the distribution.
…erations

Add comprehensive fallback strategies for commit stats and PR file fetching to maximize compatibility across GitHub instances and handle edge cases gracefully.

Changes:

Commit Stats Multi-Path Resilience:
- Add _extract_stats_from_commit_data() helper method to extract and validate stats from API responses following DRY principle
- Add _get_commit_stats_method_1_rest_api() using standard REST API as primary method
- Add _get_commit_stats_method_2_compare_api() using parent commit comparison as fallback
- Add _get_commit_stats_method_3_single_commit_compare() using ~1 notation as final fallback
- Refactor get_commit_stats() to orchestrate all three methods with graceful degradation

PR Files Multi-Path Resilience:
- Add _normalize_pr_file_data() helper method to normalize file data from various API sources following DRY principle
- Add _get_pr_files_method_1_standard_api() using standard PR files API as primary method
- Add _get_pr_files_method_2_from_commits() aggregating files from PR commits as fallback
- Add _get_pr_files_method_3_from_compare() using compare API between base and head as final fallback
- Refactor get_pull_request_files_with_stats() to orchestrate all three methods

Design Principles:
- DRY: Extract common logic into dedicated helper methods
- SRP: Each method has single, well-defined responsibility
- Graceful degradation: Try multiple methods in sequence, return safe empty values if all fail
- API-agnostic: Handle differences between GitHub.com and GitHub Enterprise Server
- Comprehensive logging: Log each method attempt, success, and failure for troubleshooting
Resolved TypeError in _get_pr_files_method_2_from_commits caused by githubkit library bug.

Problem:
- Method 2 (commit aggregation) was failing with Pydantic validation error
- The githubkit library returns commit data that includes a verification object
- This verification object is missing the required 'verified_at' field
- Using parsed_data triggers Pydantic validation which fails on this incomplete data

Solution:
- Changed from commits_response.parsed_data to commits_response.json()
- This bypasses Pydantic validation and works directly with raw JSON
- Updated SHA extraction to handle dict objects with safe get() method
- Added isinstance check for defensive dict access

Impact:
- Method 2 can now successfully aggregate files from PR commits
- Provides reliable fallback when Method 1 (list_files endpoint) fails
- Works around upstream githubkit validation bug until library is fixed
Resolved TypeError in _get_pr_files_method_3_from_compare caused by incorrect parameter names.

Problem:
- Method 3 (compare API) was failing with TypeError: unexpected keyword argument 'base'
- The code was passing separate base= and head= parameters
- GitHub's compare_commits endpoint requires a single basehead parameter
- The basehead parameter must be formatted as "BASE...HEAD" (three dots)

Solution:
- Removed separate base and head parameters
- Created single basehead parameter with format: f"{base_sha}...{head_sha}"
- This matches GitHub API specification: /repos/{owner}/{repo}/compare/{basehead}
- Added explanatory comment documenting the correct format

Technical Details:
- The three-dot notation (BASE...HEAD) compares the common ancestor
- This is the standard format for GitHub's comparison endpoint
- The githubkit library expects this as a single string parameter

Impact:
- Method 3 can now successfully fetch PR files via compare API
- Provides final fallback when Methods 1 and 2 fail
- Completes the multi-path resilient fetching strategy
@aitestino aitestino self-assigned this Dec 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants