feat: Migrate to native findings format with enhanced CLI and HTML export #36

tduhamel42 · 2025-11-14T09:53:11Z

Overview

This PR migrates FuzzForge from SARIF-based findings to a native findings format, with significant enhancements to CLI display and HTML export functionality.

Key Changes

🏗️ Core Architecture Changes

Native Findings Format - Introduced new native finding schema replacing SARIF dependency
- New FindingSchema data model with required/optional fields
- Simplified finding structure optimized for FuzzForge workflows
- Maintains compatibility with SARIF export format
Database Refactoring - Updated SQLite schema for native format
- New finding storage structure
- Improved query performance
- Better support for finding history tracking
Reporter Module Rewrite - Converted sarif_reporter.py → native_reporter.py
- Native format as primary representation
- SARIF export as output format option
- All 10 analysis modules updated to use new create_finding signature

🎨 CLI Enhancements

Enhanced Findings Display
- Rich table formatting with improved readability
- Color-coded severity levels
- Cleaner table columns (removed Rule and Confidence columns)
- Better finding detail views
Modernized HTML Export
- Interactive charts and filters using Chart.js
- FuzzForge styling and color scheme
- Responsive design with improved UX
- Better finding organization and navigation
- Enhanced finding page with complete metadata display

🐛 Critical Bug Fixes

File Handle Leaks (sdk/client.py:397, 484)
- Fixed resource leaks in sync/async upload methods
- Using context managers for proper cleanup
IndexError Fixes
- OSS-Fuzz stats parsing (workers/ossfuzz/activities.py:372-376)
- URL parsing in exception handling (sdk/exceptions.py:419-426)
- Android workflow SARIF parsing (android_static_analysis/workflow.py:270)
CLI Command Issues
- Fixed OptionInfo parameter handling in findings commands
- Corrected command suggestions throughout CLI
- Fixed --fail-on help text descriptions
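
The file handle fix can be illustrated with a minimal sketch. The `upload_file` function and the `send` callable are hypothetical stand-ins, not the SDK's actual upload API; the sketch only shows that a `with` block closes the handle whether the upload succeeds or raises.

```python
from pathlib import Path
from typing import BinaryIO, Callable


def upload_file(path: Path, send: Callable[[BinaryIO], int]) -> int:
    """Open `path` and hand the file object to `send`.

    `send` is a hypothetical transport callable (for example, a wrapper
    around an HTTP POST). The context manager closes the handle when the
    block exits, including when `send` raises.
    """
    with path.open("rb") as handle:
        return send(handle)
```

The leak-prone alternative is passing `open(path, "rb")` directly into the request call, where nothing closes the handle if the request fails.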

Affected Components

Backend: Finding models, all analyzer modules, reporter module
CLI: Findings commands, database layer, HTML export
SDK: Client upload methods, exception handling
Workers: Android workflow, OSS-Fuzz activities
Workflows: All 10 workflows updated for new finding format

Testing

✅ All bug fixes tested and verified
✅ CLI commands working with new format
✅ HTML export renders correctly with new styling
✅ Finding creation and retrieval working across all modules

Breaking Changes

⚠️ Database schema change - Requires database migration or fresh .fuzzforge/ directory
⚠️ API changes - All create_finding() calls now use new signature

Migration Notes

Existing users should:

1. Back up the existing .fuzzforge/findings.db if needed
2. Run ff init --force to reinitialize with the new schema
3. Re-run workflows to populate the database with native format findings

Related Issues

Addresses multiple issues identified in comprehensive code audit including critical resource leaks and error handling bugs.

Note: This PR supersedes the original feature/native-findings-format branch which had unrelated git history after commit rewriting. All changes have been cleanly cherry-picked onto current dev.

Priority 1 implementation:

- Created native FuzzForge findings format schema with full support for:
  - 5-level severity (critical/high/medium/low/info)
  - Confidence levels
  - CWE and OWASP categorization
  - found_by attribution (module, tool, type)
  - LLM context tracking (model, prompt, temperature)
- Updated ModuleFinding model with new fields:
  - Added rule_id for pattern identification
  - Added found_by for detection attribution
  - Added llm_context for LLM-detected findings
  - Added confidence, cwe, owasp, references
  - Added column_start/end for precise location
- Updated create_finding() helper with new required fields
- Enhanced _generate_summary() with confidence and source tracking
- Fixed critical ID bug in CLI:
  - Changed 'ff finding show' to use --id (unique) instead of --rule
  - Added new show_findings_by_rule() function to show ALL findings matching a rule
  - Updated display_finding_detail() to support both native and SARIF formats
  - Now properly handles multiple findings with same rule_id

Breaking changes:

- create_finding() now requires rule_id and found_by parameters
- show_finding() now uses --id instead of --rule flag
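
As a rough illustration of the fields listed in this commit, the sketch below models a finding with standard-library dataclasses. It is not the project's actual ModuleFinding or FindingSchema definition: the class layout, the `title` field, the defaults, and the exact `create_finding()` signature are assumptions. Only the field names (rule_id, found_by, llm_context, confidence, cwe, owasp, references, column_start/end) and the five severity levels come from the commit message.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SEVERITIES = ("critical", "high", "medium", "low", "info")
CONFIDENCES = ("high", "medium", "low")


@dataclass
class FoundBy:
    module: str
    tool: str
    type: str  # e.g. "llm", "tool", "fuzzer", "manual"
    version: Optional[str] = None


@dataclass
class LLMContext:
    model: str
    prompt: str
    temperature: float


@dataclass
class Finding:
    rule_id: str
    found_by: FoundBy
    title: str  # hypothetical field, used here only for illustration
    severity: str = "info"
    confidence: str = "medium"
    cwe: Optional[str] = None
    owasp: Optional[str] = None
    references: List[str] = field(default_factory=list)
    column_start: Optional[int] = None
    column_end: Optional[int] = None
    llm_context: Optional[LLMContext] = None

    def __post_init__(self) -> None:
        # Reject values outside the documented scales early.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if self.confidence not in CONFIDENCES:
            raise ValueError(f"unknown confidence: {self.confidence!r}")


def create_finding(title: str, severity: str, *, rule_id: str,
                   found_by: FoundBy, **extra) -> Finding:
    """Hypothetical helper: rule_id and found_by are required keyword arguments."""
    return Finding(rule_id=rule_id, found_by=found_by, title=title,
                   severity=severity, **extra)
```

Making `rule_id` and `found_by` keyword-only means a call written against the old signature fails immediately with a `TypeError` instead of silently producing a finding without attribution.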

- Renamed FindingRecord.sarif_data to findings_data
- Updated database schema: sarif_data column -> findings_data column
- Updated all database methods to work with native format:
  - save_findings()
  - get_findings()
  - list_findings()
  - get_all_findings()
  - get_aggregated_stats()
- Updated SQL queries to use native format JSON paths:
  - Changed from SARIF paths ($.runs[0].results) to native paths ($.findings)
  - Updated severity filtering from SARIF levels (error/warning/note) to native (critical/high/medium/low/info)
- Updated CLI commands to support both formats during transition:
  - get_findings command now extracts summary from both native and SARIF formats
  - show_finding and show_findings_by_rule updated to use findings_data field
  - Format detection to handle data from API (still SARIF) and database (native)

Breaking changes:

- Database schema changed - existing databases will need recreation
- FindingRecord.sarif_data renamed to findings_data
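
The format detection mentioned above could look roughly like the following. This is a sketch under assumptions: the function names are hypothetical, and the only details taken from the commit are the two JSON paths, `$.findings` for native data and `$.runs[0].results` for SARIF.

```python
from typing import Any, Dict, List


def detect_format(data: Dict[str, Any]) -> str:
    """Return "native", "sarif", or "unknown" based on top-level keys."""
    if isinstance(data.get("findings"), list):
        return "native"
    if isinstance(data.get("runs"), list):
        return "sarif"
    return "unknown"


def extract_findings(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the list of findings/results regardless of input format."""
    fmt = detect_format(data)
    if fmt == "native":
        return data["findings"]
    if fmt == "sarif":
        runs = data["runs"]
        if not runs:  # an empty runs list must not raise IndexError
            return []
        return runs[0].get("results", [])
    return []
```

Checking for `findings` first means a payload that carries both keys is treated as native, which matches the direction of the migration.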

- Renamed sarif_reporter.py to native_reporter.py to reflect new functionality
- Updated WorkflowFindings model to use native format
  - Field name 'sarif' kept for API compatibility but now contains native format
  - Updated docstring to reflect native format usage
- Converted SARIFReporter to Native Reporter:
  - Module name changed from sarif_reporter to native_reporter (v2.0.0)
  - Updated metadata and input/output schemas
  - Removed SARIF-specific config (tool_name, include_code_flows)
  - Added native format config (workflow_name, run_id)
- Implemented native report generation:
  - Added _generate_native_report() method
  - Generates native FuzzForge format with full field support:
    - Unique finding IDs
    - found_by attribution (module, tool, type)
    - LLM context when applicable
    - Full severity scale (critical/high/medium/low/info)
    - Confidence levels
    - CWE and OWASP mappings
    - Enhanced location info (columns, snippets)
    - References and metadata
  - Added _create_native_summary() for aggregated stats
  - Summary includes counts by severity, confidence, category, source, and type
  - Tracks affected files count
- Kept old SARIF generation methods for reference
  - Will be moved to separate SARIF exporter module

Breaking changes:

- Reporter now outputs native format instead of SARIF
- Existing workflows using sarif_reporter will need updates
- Config parameters changed (tool_name -> workflow_name, etc.)

Priority 3 implementation:

**Improved Table Display:**

- Removed hardcoded 50-result limit
- Added pagination with --limit and --offset parameters
- New table columns: Finding ID (8 chars), Confidence (H/M/L), Found By
- Supports both native and SARIF formats with auto-detection
- Proper severity ordering (critical > high > medium > low > info)
- Pagination footer showing "Showing X-Y of Z results"

**Syntax Highlighting:**

- Added syntax-highlighted code snippets using Rich's Syntax
- Auto-detects language from file extension (20+ languages supported)
- Line numbers with correct start line from finding location
- Monokai theme for better readability

**Enhanced Detail View:**

- Confidence indicators with emoji (🟢 High, 🟡 Medium, 🔴 Low)
- Type-specific badges (🤖 LLM, 🔧 Tool, 🎯 Fuzzer, 👤 Manual)
- LLM context display with model name and prompt preview
- Better formatted found_by info with module and type
- Added suggestion to view all findings with same rule
- Cleaner recommendation display with 💡 icon

**New Commands:**

- Added 'ff findings by-rule' command to show all findings matching a rule
- Registered as @app.command("by-rule")

**Updated Related Commands:**

- all_findings: Updated to use 5-level severity (critical/high/medium/low/info)
- Table columns changed from Error/Warning/Note to Critical/High/Medium/Low
- Summary panel updated with proper severity mapping
- Support for both native and SARIF format findings_data

**Breaking Changes:**

- Severity display changed from 3-level (error/warning/note) to 5-level
- Table structure modified with new columns
- Old SARIF-only views deprecated in favor of format-agnostic displays
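
The severity ordering and the pagination footer can be sketched as follows. The `paginate` function and its dict-based findings are assumptions made for illustration; only the ordering (critical > high > medium > low > info), the --limit/--offset semantics, and the footer wording come from the commit message.

```python
from typing import Any, Dict, List, Optional, Tuple

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


def paginate(findings: List[Dict[str, Any]], limit: Optional[int] = None,
             offset: int = 0) -> Tuple[List[Dict[str, Any]], str]:
    """Sort findings by severity, slice one page, and build the footer text."""
    if offset < 0 or (limit is not None and limit < 0):
        raise ValueError("limit and offset must be non-negative")
    # Unknown or missing severities sort after "info".
    ordered = sorted(
        findings,
        key=lambda f: SEVERITY_ORDER.get(f.get("severity"), len(SEVERITY_ORDER)),
    )
    total = len(ordered)
    end = total if limit is None else min(offset + limit, total)
    page = ordered[offset:end]
    if page:
        footer = f"Showing {offset + 1}-{offset + len(page)} of {total} results"
    else:
        footer = f"Showing 0 of {total} results"
    return page, footer
```

With `limit=None` the whole result set is returned, which corresponds to removing the hardcoded 50-result limit.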

Fixed broken import after renaming sarif_reporter.py to native_reporter.py

Updated 10 modules to use the new create_finding() signature with required rule_id and found_by parameters:

- llm_analyzer.py: Added FoundBy and LLMContext for AI-detected findings
- bandit_analyzer.py: Added tool attribution and moved CWE/confidence to proper fields
- security_analyzer.py: Updated all three finding types (secrets, SQL injection, dangerous functions)
- mypy_analyzer.py: Added tool attribution and moved column info to column_start
- mobsf_scanner.py: Updated all 6 finding types (permissions, manifest, code analysis, behavior) with proper line number handling
- opengrep_android.py: Added tool attribution, proper CWE/OWASP formatting, and confidence mapping
- dependency_scanner.py: Added pip-audit attribution for CVE findings
- file_scanner.py: Updated both sensitive file and enumeration findings
- cargo_fuzzer.py: Added fuzzer type attribution for crash findings
- atheris_fuzzer.py: Added fuzzer type attribution for Python crash findings

All modules now properly track:

- Finding source (module, tool name, version, type)
- Confidence levels (high/medium/low)
- CWE and OWASP mappings where applicable
- LLM context for AI-detected issues

Aligns main.py with the updated findings.py command that changed from --rule to --id for finding lookups by unique UUID.

Removes the Confidence column from the findings table display to eliminate confusion with the Severity column (both used High/Medium/Low terminology).

Changes:

- Removed 'Conf' column from table structure
- Removed confidence extraction logic for both native and SARIF formats
- Removed confidence badge creation and styling
- Table now shows: ID | Severity | Rule | Message | Found By | Location

Confidence data is still available in detailed finding view (ff finding show).

Removes the Rule column from findings table to simplify the view and reduce redundancy with the Message column.

Rule ID is still available in:

- Detailed finding view (ff finding show <run-id> --id <finding-id>)
- By-rule grouping command (ff findings by-rule <run-id> --rule <rule-id>)

Changes:

- Removed 'Rule' column from table structure
- Removed rule_text extraction and styling logic
- Expanded Message column from 35 to 50 chars (more space available)
- Expanded Location column from 18 to 20 chars
- Table now shows: ID | Severity | Message | Found By | Location

Benefits:

- Cleaner, more scannable table
- Message column has more room to show details
- Less visual clutter while maintaining all functionality

- Rewrite export_to_html with Bootstrap 5 styling
- Add Chart.js visualizations (severity, type, category, source)
- Add executive summary dashboard with stat cards
- Add interactive filtering by severity, type, and search
- Add sortable table columns
- Add expandable row details with full finding information
- Add Prism.js syntax highlighting for code snippets
- Display LLM context, confidence, CWE/OWASP, recommendations
- Make responsive with print-friendly CSS
- Update extract_simplified_findings to handle native format
- Update export_to_csv to handle native format with more fields
- Fix export functions to use findings_data instead of sarif_data
- Add safe_escape helper to handle None values
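
A helper like safe_escape is small enough to sketch in full. The body below is an assumption about how such a helper might be written with the standard library's `html.escape`; the commit states only that it exists and handles None values.

```python
import html
from typing import Any


def safe_escape(value: Any) -> str:
    """HTML-escape a value for template output, treating None as empty text."""
    if value is None:
        return ""
    # str() lets numeric fields such as line numbers pass through unchanged.
    return html.escape(str(value), quote=True)
```

Without the None check, optional fields such as a missing CWE would either raise inside `html.escape` or render as the literal text "None" in the exported report.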

- Fix OptionInfo bug causing 'ff finding <run_id>' to crash
  - Add explicit limit=None, offset=0 parameters in main.py calls
  - Prevents OptionInfo objects from being used in arithmetic operations
- Fix command suggestions after workflow completion
  - Change 'fuzzforge findings' to 'ff finding' (correct syntax)
  - Add missing 'View findings' suggestion after submission
- Fix --fail-on help text
  - Change from 'severity' to 'SARIF level' (error,warning,note,info)
  - Matches actual implementation
- Update CLI documentation
  - Fix 'ff finding show' parameter from --rule to --id
  - Mark unimplemented AI commands as 'Coming Soon'
  - Correct 'ff ingest' documentation to match actual implementation
  - Remove fake subcommands, document actual options

Fixed multiple critical bugs identified during comprehensive code audit:

**Critical Fixes:**

- Fix file handle leaks in SDK client upload methods (sync and async)
  - Use context managers to ensure file handles are properly closed
  - Affects: sdk/src/fuzzforge_sdk/client.py lines 397, 484

**High Priority Fixes:**

- Fix IndexError in OSS-Fuzz stats parsing when accessing array elements
  - Add bounds checking before accessing parts[i+1]
  - Affects: workers/ossfuzz/activities.py lines 372-376
- Fix IndexError in exception handling URL parsing
  - Add empty string validation before splitting URL segments
  - Prevents crash when parsing malformed URLs
  - Affects: sdk/src/fuzzforge_sdk/exceptions.py lines 419-426

**Medium Priority Fixes:**

- Fix IndexError in Android workflow SARIF report parsing
  - Check if runs list is empty before accessing first element
  - Affects: backend/toolbox/workflows/android_static_analysis/workflow.py line 270

All fixes follow defensive programming practices with proper bounds checking and resource management using context managers.
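
The bounds check before `parts[i+1]` follows a common pattern, sketched below. The `parse_stats` function, the key names, and the sample line format are hypothetical and are not taken from workers/ossfuzz/activities.py; the sketch only shows how a truncated line is skipped instead of raising IndexError.

```python
from typing import Dict, Iterable


def parse_stats(line: str,
                keys: Iterable[str] = ("cov:", "ft:", "exec/s:")) -> Dict[str, int]:
    """Collect integer values that follow known 'key:' tokens in a stats line."""
    wanted = set(keys)
    parts = line.split()
    stats: Dict[str, int] = {}
    for i, token in enumerate(parts):
        # Only read parts[i + 1] when it exists; a key at the end of a
        # truncated line is ignored.
        if token in wanted and i + 1 < len(parts):
            value = parts[i + 1]
            if value.isdigit():
                stats[token.rstrip(":")] = int(value)
    return stats
```

Fuzzer output is often cut off mid-line when a process is killed, so a key token with no value after it is a realistic input rather than an edge case.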

Fixed failing unit tests that were using the old create_finding() signature.

The native findings format refactoring added two new required parameters:

- rule_id: Identifier for the rule/pattern that detected the finding
- found_by: FoundBy object with module, tool, and detection type info

Updated tests:

- test_cargo_fuzzer.py::test_create_finding_from_crash
- test_atheris_fuzzer.py::test_create_crash_finding

Both tests now properly instantiate FoundBy objects with appropriate fuzzer metadata (module name, tool name, version, and type="fuzzer").

Fixed Pydantic validation error by importing FoundBy from modules.base instead of models.finding_schema. The ModuleFinding class in base.py uses its own FoundBy definition, and Pydantic only accepts instances of that exact class, so a FoundBy object built from the class in models.finding_schema fails validation.

This resolves the validation errors in:

- test_cargo_fuzzer.py::test_create_finding_from_crash
- test_atheris_fuzzer.py::test_create_crash_finding

tduhamel42 and others added 16 commits November 14, 2025 10:51

fix: Update reporter __init__.py to import from native_reporter

99cee28

Fixed broken import after renaming sarif_reporter.py to native_reporter.py

fix: Update main.py to use finding_id instead of rule_id parameter

ef8384c

Aligns main.py with the updated findings.py command that changed from --rule to --id for finding lookups by unique UUID.

feat: make finding reports having the fuzzforge styling and colors

d29fc7e

feat: finding page improvements

f771c77

