
@tduhamel42

Overview

This PR migrates FuzzForge from SARIF-based findings to a native findings format, with significant enhancements to CLI display and HTML export functionality.

Key Changes

🏗️ Core Architecture Changes

  • Native Findings Format - Introduced new native finding schema replacing SARIF dependency

    • New FindingSchema data model with required/optional fields
    • Simplified finding structure optimized for FuzzForge workflows
    • Maintains compatibility with SARIF export format
  • Database Refactoring - Updated SQLite schema for native format

    • New finding storage structure
    • Improved query performance
    • Better support for finding history tracking
  • Reporter Module Rewrite - Converted sarif_reporter.py to native_reporter.py

    • Native format as primary representation
    • SARIF export as output format option
    • All 10 analysis modules updated to use new create_finding signature

🎨 CLI Enhancements

  • Enhanced Findings Display

    • Rich table formatting with improved readability
    • Color-coded severity levels
    • Cleaner table columns (removed Rule and Confidence columns)
    • Better finding detail views
  • Modernized HTML Export

    • Interactive charts and filters using Chart.js
    • FuzzForge styling and color scheme
    • Responsive design with improved UX
    • Better finding organization and navigation
    • Enhanced finding page with complete metadata display

🐛 Critical Bug Fixes

  • File Handle Leaks (sdk/client.py:397, 484)

    • Fixed resource leaks in sync/async upload methods
    • Using context managers for proper cleanup
  • IndexError Fixes

    • OSS-Fuzz stats parsing (workers/ossfuzz/activities.py:372-376)
    • URL parsing in exception handling (sdk/exceptions.py:419-426)
    • Android workflow SARIF parsing (android_static_analysis/workflow.py:270)
  • CLI Command Issues

    • Fixed OptionInfo parameter handling in findings commands
    • Corrected command suggestions throughout CLI
    • Fixed --fail-on help text descriptions

Affected Components

  • Backend: Finding models, all analyzer modules, reporter module
  • CLI: Findings commands, database layer, HTML export
  • SDK: Client upload methods, exception handling
  • Workers: Android workflow, OSS-Fuzz activities
  • Workflows: All 10 workflows updated for new finding format

Testing

✅ All bug fixes tested and verified
✅ CLI commands working with new format
✅ HTML export renders correctly with new styling
✅ Finding creation and retrieval working across all modules

Breaking Changes

⚠️ Database schema change - Requires database migration or fresh .fuzzforge/ directory
⚠️ API changes - All create_finding() calls must now pass the new required rule_id and found_by parameters

Migration Notes

Existing users should:

  1. Backup existing .fuzzforge/findings.db if needed
  2. Run ff init --force to reinitialize with new schema
  3. Re-run workflows to populate with native format findings

Related Issues

Addresses multiple issues identified in a comprehensive code audit, including critical resource leaks and error handling bugs.


Note: This PR supersedes the original feature/native-findings-format branch which had unrelated git history after commit rewriting. All changes have been cleanly cherry-picked onto current dev.

tduhamel42 and others added 16 commits November 14, 2025 10:51
Priority 1 implementation:
- Created native FuzzForge findings format schema with full support for:
  - 5-level severity (critical/high/medium/low/info)
  - Confidence levels
  - CWE and OWASP categorization
  - found_by attribution (module, tool, type)
  - LLM context tracking (model, prompt, temperature)

- Updated ModuleFinding model with new fields:
  - Added rule_id for pattern identification
  - Added found_by for detection attribution
  - Added llm_context for LLM-detected findings
  - Added confidence, cwe, owasp, references
  - Added column_start/end for precise location
  - Updated create_finding() helper with new required fields
  - Enhanced _generate_summary() with confidence and source tracking
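The new schema can be pictured roughly as follows. This is a minimal, dependency-free sketch using stdlib dataclasses rather than the Pydantic models the PR actually uses; field names such as `title`, `file_path`, `tool_name`, and `tool_version` are assumptions, since the exact FindingSchema fields are not listed here. It shows the two ideas the commit describes: a 5-level severity scale and a `create_finding()` helper whose `rule_id` and `found_by` arguments are required.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional

# The five native severity levels, most to least severe.
SEVERITIES = ("critical", "high", "medium", "low", "info")
CONFIDENCES = ("high", "medium", "low")


@dataclass
class FoundBy:
    """Which module and tool detected the finding (field names are illustrative)."""
    module: str
    tool_name: str
    tool_version: str
    type: str  # e.g. "llm", "tool", "fuzzer", "manual"


@dataclass
class LLMContext:
    """Only populated for LLM-detected findings."""
    model: str
    prompt: str
    temperature: float


@dataclass
class Finding:
    id: str
    rule_id: str
    title: str
    severity: str
    found_by: FoundBy
    confidence: str = "medium"
    file_path: Optional[str] = None
    line_start: Optional[int] = None
    column_start: Optional[int] = None
    column_end: Optional[int] = None
    cwe: Optional[str] = None
    owasp: Optional[str] = None
    references: List[str] = field(default_factory=list)
    llm_context: Optional[LLMContext] = None


def create_finding(*, rule_id: str, found_by: FoundBy, title: str,
                   severity: str, **optional) -> Finding:
    """Hypothetical helper: keyword-only, no defaults for rule_id/found_by."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if optional.get("confidence", "medium") not in CONFIDENCES:
        raise ValueError(f"unknown confidence: {optional['confidence']!r}")
    # Each finding gets its own UUID, independent of its rule_id.
    return Finding(id=str(uuid.uuid4()), rule_id=rule_id, title=title,
                   severity=severity, found_by=found_by, **optional)
```

Making `rule_id` and `found_by` keyword-only with no defaults turns any un-migrated call site into an immediate `TypeError` rather than a finding with missing attribution.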

- Fixed critical ID bug in CLI:
  - Changed 'ff finding show' to use --id (unique) instead of --rule
  - Added new show_findings_by_rule() function to show ALL findings matching a rule
  - Updated display_finding_detail() to support both native and SARIF formats
  - Now properly handles multiple findings with same rule_id
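Why `--rule` was the wrong lookup key can be shown with two small helpers. `find_by_id` and `find_by_rule` are hypothetical names, not the CLI's actual `show_finding()` / `show_findings_by_rule()` functions, and findings are assumed to be plain dicts with `id` and `rule_id` keys.

```python
from typing import Dict, List, Optional


def find_by_id(findings: List[Dict], finding_id: str) -> Optional[Dict]:
    # IDs are unique, so at most one finding can match.
    return next((f for f in findings if f.get("id") == finding_id), None)


def find_by_rule(findings: List[Dict], rule_id: str) -> List[Dict]:
    # Many findings can share one rule_id, so return every match.
    return [f for f in findings if f.get("rule_id") == rule_id]
```

A rule-based lookup that returned only the first match would silently hide every other finding produced by the same rule.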

Breaking changes:
- create_finding() now requires rule_id and found_by parameters
- show_finding() now uses --id instead of --rule flag
- Renamed FindingRecord.sarif_data to findings_data
- Updated database schema: sarif_data column -> findings_data column
- Updated all database methods to work with native format:
  - save_findings()
  - get_findings()
  - list_findings()
  - get_all_findings()
  - get_aggregated_stats()

- Updated SQL queries to use native format JSON paths:
  - Changed from SARIF paths ($.runs[0].results) to native paths ($.findings)
  - Updated severity filtering from SARIF levels (error/warning/note) to native (critical/high/medium/low/info)
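A sketch of a native-format query, assuming a hypothetical `findings` table with `run_id` and `findings_data` columns (the real schema is not shown in this PR). SQLite's `json_each(findings_data, '$.findings')` expands the native array into one row per finding, where the SARIF version would have walked `$.runs[0].results` and filtered on error/warning/note. This relies on SQLite's JSON functions, which most current SQLite builds include.

```python
import json
import sqlite3
from typing import Dict


def count_by_severity(conn: sqlite3.Connection, run_id: str) -> Dict[str, int]:
    # json_each turns the $.findings array into one row per finding;
    # json_extract then pulls the native severity out of each element.
    rows = conn.execute(
        """
        SELECT json_extract(f.value, '$.severity') AS severity, COUNT(*)
        FROM findings, json_each(findings.findings_data, '$.findings') AS f
        WHERE findings.run_id = ?
        GROUP BY severity
        """,
        (run_id,),
    ).fetchall()
    return dict(rows)


# In-memory demo database with one run stored in native format.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE findings (run_id TEXT PRIMARY KEY, findings_data TEXT)")
native = {"findings": [
    {"id": "1", "severity": "critical"},
    {"id": "2", "severity": "high"},
    {"id": "3", "severity": "high"},
]}
conn.execute("INSERT INTO findings VALUES (?, ?)", ("run-1", json.dumps(native)))
```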

- Updated CLI commands to support both formats during transition:
  - get_findings command now extracts summary from both native and SARIF formats
  - show_finding and show_findings_by_rule updated to use findings_data field
  - Format detection to handle data from API (still SARIF) and database (native)
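Format detection during the transition might look like the following sketch: native data is recognized by a top-level `findings` list and SARIF by `runs`. The function name and the SARIF-level-to-severity mapping are assumptions for illustration; the PR does not state how levels are mapped.

```python
from typing import Any, Dict, List

# Hypothetical mapping from SARIF result levels to native severities.
SARIF_LEVEL_TO_SEVERITY = {"error": "high", "warning": "medium", "note": "low", "none": "info"}


def extract_findings(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return native-style finding dicts from either native or SARIF data."""
    if isinstance(data.get("findings"), list):  # native format (database)
        return data["findings"]
    runs = data.get("runs") or []  # SARIF format (API)
    if not runs:
        return []
    results = runs[0].get("results") or []
    return [
        {
            "rule_id": r.get("ruleId"),
            # Absent any rule configuration, SARIF treats a missing level as "warning".
            "severity": SARIF_LEVEL_TO_SEVERITY.get(r.get("level", "warning"), "medium"),
            "message": (r.get("message") or {}).get("text", ""),
        }
        for r in results
    ]
```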

Breaking changes:
- Database schema changed - existing databases will need recreation
- FindingRecord.sarif_data renamed to findings_data
- Renamed sarif_reporter.py to native_reporter.py to reflect new functionality

- Updated WorkflowFindings model to use native format
  - Field name 'sarif' kept for API compatibility but now contains native format
  - Updated docstring to reflect native format usage

- Converted SARIFReporter to Native Reporter:
  - Module name changed from sarif_reporter to native_reporter (v2.0.0)
  - Updated metadata and input/output schemas
  - Removed SARIF-specific config (tool_name, include_code_flows)
  - Added native format config (workflow_name, run_id)

- Implemented native report generation:
  - Added _generate_native_report() method
  - Generates native FuzzForge format with full field support:
    - Unique finding IDs
    - found_by attribution (module, tool, type)
    - LLM context when applicable
    - Full severity scale (critical/high/medium/low/info)
    - Confidence levels
    - CWE and OWASP mappings
    - Enhanced location info (columns, snippets)
    - References and metadata

  - Added _create_native_summary() for aggregated stats
  - Summary includes counts by severity, confidence, category, source, and type
  - Tracks affected files count
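A rough sketch of the kind of aggregation `_create_native_summary()` performs, using `collections.Counter`. `summarize_findings`, its output keys, and the fallback values (`"medium"` for missing confidence, `"unknown"` for missing attribution) are hypothetical; only the grouping dimensions come from the commit message.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_findings(findings: List[Dict[str, Any]]) -> Dict[str, Any]:
    def found_by(f: Dict[str, Any]) -> Dict[str, Any]:
        # found_by may be absent or None on older data.
        return f.get("found_by") or {}

    return {
        "total_findings": len(findings),
        "by_severity": dict(Counter(f.get("severity", "info") for f in findings)),
        "by_confidence": dict(Counter(f.get("confidence", "medium") for f in findings)),
        "by_category": dict(Counter(f.get("category", "uncategorized") for f in findings)),
        "by_source": dict(Counter(found_by(f).get("module", "unknown") for f in findings)),
        "by_type": dict(Counter(found_by(f).get("type", "unknown") for f in findings)),
        # Count distinct files, ignoring findings without a location.
        "affected_files": len({f["file_path"] for f in findings if f.get("file_path")}),
    }
```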

- Kept old SARIF generation methods for reference
  - Will be moved to separate SARIF exporter module

Breaking changes:
- Reporter now outputs native format instead of SARIF
- Existing workflows using sarif_reporter will need updates
- Config parameters changed (tool_name -> workflow_name, etc.)

Priority 3 implementation:

**Improved Table Display:**
- Removed hardcoded 50-result limit
- Added pagination with --limit and --offset parameters
- New table columns: Finding ID (8 chars), Confidence (H/M/L), Found By
- Supports both native and SARIF formats with auto-detection
- Proper severity ordering (critical > high > medium > low > info)
- Pagination footer showing "Showing X-Y of Z results"
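The pagination and ordering rules above can be sketched as one pure function. `paginate` is a hypothetical helper that returns table rows and the footer text instead of rendering a Rich table; findings are assumed to be dicts with at least an `id` key.

```python
from typing import Dict, List, Optional, Tuple

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


def paginate(findings: List[Dict], limit: Optional[int] = None,
             offset: int = 0) -> Tuple[List[Tuple[str, str]], str]:
    # sorted() is stable, so findings of equal severity keep their original order;
    # unknown severities sort last.
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER.get(f.get("severity"), len(SEVERITY_ORDER)))
    total = len(ordered)
    end = total if limit is None else min(offset + limit, total)
    page = ordered[offset:end]
    # Only the first 8 characters of each finding ID are shown in the table.
    rows = [(f["id"][:8], f.get("severity", "info")) for f in page]
    if page:
        footer = f"Showing {offset + 1}-{offset + len(page)} of {total} results"
    else:
        footer = f"Showing 0 of {total} results"
    return rows, footer
```

Here `limit=None` means "show everything", which mirrors the removal of the hardcoded 50-result cap.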

**Syntax Highlighting:**
- Added syntax-highlighted code snippets using Rich's Syntax
- Auto-detects language from file extension (20+ languages supported)
- Line numbers with correct start line from finding location
- Monokai theme for better readability

**Enhanced Detail View:**
- Confidence indicators with emoji (🟢 High, 🟡 Medium, 🔴 Low)
- Type-specific badges (🤖 LLM, 🔧 Tool, 🎯 Fuzzer, 👤 Manual)
- LLM context display with model name and prompt preview
- Better formatted found_by info with module and type
- Added suggestion to view all findings with same rule
- Cleaner recommendation display with 💡 icon

**New Commands:**
- Added 'ff findings by-rule' command to show all findings matching a rule
- Registered as @app.command("by-rule")

**Updated Related Commands:**
- all_findings: Updated to use 5-level severity (critical/high/medium/low/info)
- Table columns changed from Error/Warning/Note to Critical/High/Medium/Low
- Summary panel updated with proper severity mapping
- Support for both native and SARIF format findings_data

**Breaking Changes:**
- Severity display changed from 3-level (error/warning/note) to 5-level
- Table structure modified with new columns
- Old SARIF-only views deprecated in favor of format-agnostic displays

Fixed broken import after renaming sarif_reporter.py to native_reporter.py

Updated 10 modules to use the new create_finding() signature with required rule_id and found_by parameters:

- llm_analyzer.py: Added FoundBy and LLMContext for AI-detected findings
- bandit_analyzer.py: Added tool attribution and moved CWE/confidence to proper fields
- security_analyzer.py: Updated all three finding types (secrets, SQL injection, dangerous functions)
- mypy_analyzer.py: Added tool attribution and moved column info to column_start
- mobsf_scanner.py: Updated all 6 finding types (permissions, manifest, code analysis, behavior) with proper line number handling
- opengrep_android.py: Added tool attribution, proper CWE/OWASP formatting, and confidence mapping
- dependency_scanner.py: Added pip-audit attribution for CVE findings
- file_scanner.py: Updated both sensitive file and enumeration findings
- cargo_fuzzer.py: Added fuzzer type attribution for crash findings
- atheris_fuzzer.py: Added fuzzer type attribution for Python crash findings

All modules now properly track:
- Finding source (module, tool name, version, type)
- Confidence levels (high/medium/low)
- CWE and OWASP mappings where applicable
- LLM context for AI-detected issues

Aligns main.py with the updated findings.py command that changed from --rule to --id for finding lookups by unique UUID.

Removes the Confidence column from the findings table display to eliminate
confusion with the Severity column (both used High/Medium/Low terminology).

Changes:
- Removed 'Conf' column from table structure
- Removed confidence extraction logic for both native and SARIF formats
- Removed confidence badge creation and styling
- Table now shows: ID | Severity | Rule | Message | Found By | Location

Confidence data is still available in detailed finding view (ff finding show).

Removes the Rule column from findings table to simplify the view and reduce
redundancy with the Message column. Rule ID is still available in:
- Detailed finding view (ff finding show <run-id> --id <finding-id>)
- By-rule grouping command (ff findings by-rule <run-id> --rule <rule-id>)

Changes:
- Removed 'Rule' column from table structure
- Removed rule_text extraction and styling logic
- Expanded Message column from 35 to 50 chars (more space available)
- Expanded Location column from 18 to 20 chars
- Table now shows: ID | Severity | Message | Found By | Location

Benefits:
- Cleaner, more scannable table
- Message column has more room to show details
- Less visual clutter while maintaining all functionality

- Rewrite export_to_html with Bootstrap 5 styling
- Add Chart.js visualizations (severity, type, category, source)
- Add executive summary dashboard with stat cards
- Add interactive filtering by severity, type, and search
- Add sortable table columns
- Add expandable row details with full finding information
- Add Prism.js syntax highlighting for code snippets
- Display LLM context, confidence, CWE/OWASP, recommendations
- Make responsive with print-friendly CSS
- Update extract_simplified_findings to handle native format
- Update export_to_csv to handle native format with more fields
- Fix export functions to use findings_data instead of sarif_data
- Add safe_escape helper to handle None values
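A minimal sketch of what a `safe_escape` helper could look like; the name comes from the commit, but the body is an assumption. `html.escape(None)` raises `AttributeError` because it calls string methods on its argument, so `None` is mapped to an empty string first and other values are converted with `str()`.

```python
import html
from typing import Any


def safe_escape(value: Any) -> str:
    # Missing optional fields (e.g. no CWE) render as an empty string.
    if value is None:
        return ""
    # quote=True also escapes " and ', which matters inside HTML attributes.
    return html.escape(str(value), quote=True)
```

A table cell would then be built as `f"<td>{safe_escape(finding.get('cwe'))}</td>"`, which renders an empty cell when the finding has no CWE.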
- Fix OptionInfo bug causing 'ff finding <run_id>' to crash
  - Add explicit limit=None, offset=0 parameters in main.py calls
  - Prevents OptionInfo objects from being used in arithmetic operations

- Fix command suggestions after workflow completion
  - Change 'fuzzforge findings' to 'ff finding' (correct syntax)
  - Add missing 'View findings' suggestion after submission

- Fix --fail-on help text
  - Change from 'severity' to 'SARIF level' (error,warning,note,info)
  - Matches actual implementation

- Update CLI documentation
  - Fix 'ff finding show' parameter from --rule to --id
  - Mark unimplemented AI commands as 'Coming Soon'
  - Correct 'ff ingest' documentation to match actual implementation
  - Remove fake subcommands, document actual options

Fixed multiple critical bugs identified during a comprehensive code audit:

**Critical Fixes:**
- Fix file handle leaks in SDK client upload methods (sync and async)
  - Use context managers to ensure file handles are properly closed
  - Affects: sdk/src/fuzzforge_sdk/client.py lines 397, 484
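The leak pattern and its fix, reduced to a sketch. `upload_file` and the `send` callback are hypothetical stand-ins for the SDK's upload methods and their HTTP call; the point is only that the `with` block closes the handle whether `send` returns or raises.

```python
from pathlib import Path
from typing import BinaryIO, Callable

# Leaky pattern: if send() raises, the handle is never explicitly closed.
#     fh = open(path, "rb")
#     return send(fh)


def upload_file(path: Path, send: Callable[[BinaryIO], int]) -> int:
    # The context manager closes the file on both normal return and exceptions.
    with open(path, "rb") as fh:
        return send(fh)
```

In an async variant the same rule applies: the `await` on the request has to happen inside the `with` block, otherwise the file may already be closed when the upload reads it.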

**High Priority Fixes:**
- Fix IndexError in OSS-Fuzz stats parsing when accessing array elements
  - Add bounds checking before accessing parts[i+1]
  - Affects: workers/ossfuzz/activities.py lines 372-376
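A sketch of the bounds-checked lookup, assuming the activity scans whitespace-separated libFuzzer status lines such as `#1024 NEW cov: 212 ft: 310 ...` for a label followed by its value. The function name and the set of labels are assumptions; the real parsing code is not shown in this PR.

```python
from typing import Dict

# Labels whose value is the next whitespace-separated token (hypothetical selection).
STAT_LABELS = ("cov:", "ft:", "exec/s:")


def parse_libfuzzer_stats(line: str) -> Dict[str, int]:
    parts = line.split()
    stats: Dict[str, int] = {}
    for i, token in enumerate(parts):
        # Bounds check: a truncated line can end with a label and no value,
        # in which case parts[i + 1] would raise IndexError.
        if token in STAT_LABELS and i + 1 < len(parts):
            value = parts[i + 1]
            if value.isdigit():
                stats[token.rstrip(":")] = int(value)
    return stats
```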

- Fix IndexError in exception handling URL parsing
  - Add empty string validation before splitting URL segments
  - Prevents crash when parsing malformed URLs
  - Affects: sdk/src/fuzzforge_sdk/exceptions.py lines 419-426
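A defensive version of URL segment extraction, as a hypothetical example: `segment_after` returns the path segment that follows a marker such as `runs`, and returns `None` instead of raising when the URL is empty, malformed, or ends early. The actual helper in exceptions.py and the URL shapes it handles are not shown here.

```python
from typing import Optional
from urllib.parse import urlparse


def segment_after(url: str, marker: str) -> Optional[str]:
    if not url:
        return None
    # Drop empty segments produced by leading, trailing, or doubled slashes.
    segments = [s for s in urlparse(url).path.split("/") if s]
    try:
        idx = segments.index(marker)
    except ValueError:
        return None
    # Guard against a URL that ends right at the marker.
    return segments[idx + 1] if idx + 1 < len(segments) else None
```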

**Medium Priority Fixes:**
- Fix IndexError in Android workflow SARIF report parsing
  - Check if runs list is empty before accessing first element
  - Affects: backend/toolbox/workflows/android_static_analysis/workflow.py line 270
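The guarded access amounts to a few lines. `first_run_results` is a hypothetical name; the sketch treats a missing or empty `runs` list as "no results" instead of indexing `runs[0]` unconditionally.

```python
from typing import Any, Dict, List


def first_run_results(report: Dict[str, Any]) -> List[Dict[str, Any]]:
    # report["runs"][0] raises IndexError when a tool emits "runs": [],
    # so check the list before indexing.
    runs = report.get("runs") or []
    if not runs:
        return []
    return runs[0].get("results") or []
```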

All fixes follow defensive programming practices with proper bounds checking
and resource management using context managers.

Fixed failing unit tests that were using the old create_finding() signature.
The native findings format refactoring added two new required parameters:
- rule_id: Identifier for the rule/pattern that detected the finding
- found_by: FoundBy object with module, tool, and detection type info

Updated tests:
- test_cargo_fuzzer.py::test_create_finding_from_crash
- test_atheris_fuzzer.py::test_create_crash_finding

Both tests now properly instantiate FoundBy objects with appropriate
fuzzer metadata (module name, tool name, version, and type="fuzzer").

Fixed Pydantic validation error by importing FoundBy from modules.base
instead of models.finding_schema. The ModuleFinding class in base.py
uses its own FoundBy definition, and Pydantic validates a nested model
field against the exact class declared on it, so a FoundBy instance from
a different module fails validation even though it has the same fields.

This resolves the validation errors:
- test_cargo_fuzzer.py::test_create_finding_from_crash
- test_atheris_fuzzer.py::test_create_crash_finding