Voice Manipulation Detection Pipeline

╔════════════════════════════════════════════════════════════════════════════╗
║                                                                            ║
║    ██╗   ██╗ ██████╗ ██╗ ██████╗███████╗    ███╗   ███╗ ██████╗ ██████╗    ║
║    ██║   ██║██╔═══██╗██║██╔════╝██╔════╝    ████╗ ████║██╔═══██╗██╔══██╗   ║
║    ██║   ██║██║   ██║██║██║     █████╗      ██╔████╔██║██║   ██║██║  ██║   ║
║    ╚██╗ ██╔╝██║   ██║██║██║     ██╔══╝      ██║╚██╔╝██║██║   ██║██║  ██║   ║
║     ╚████╔╝ ╚██████╔╝██║╚██████╗███████╗    ██║ ╚═╝ ██║╚██████╔╝██████╔╝   ║
║      ╚═══╝   ╚═════╝ ╚═╝ ╚═════╝╚══════╝    ╚═╝     ╚═╝ ╚═════╝ ╚═════╝    ║
║                                                                            ║
║                FORENSIC AUDIO MANIPULATION DETECTION SYSTEM                ║
║                   Tactical Implementation Specification                    ║
║                                                                            ║
╚════════════════════════════════════════════════════════════════════════════╝

A comprehensive system for voice manipulation detection AND real-time voice modification with forensic-grade analysis.

Python 3.10+ | License: MIT | Code style: black

Features • Voice Modification • Installation • Quick Start • Documentation • Examples


🎯 Overview

This system provides comprehensive voice manipulation capabilities with both detection and modification features:

πŸ” Detection: 5-Phase Forensic Analysis

Detects voice manipulation and AI-generated voices, including:

  • Pitch-shifting (male ↔ female voice conversion)
  • Time-stretching (speed manipulation)
  • Phase vocoder artifacts (deepfake/alteration signatures)
  • Combined manipulations (multi-vector attacks)
  • AI-generated voices (TTS, voice cloning, deepfakes)
  • Neural vocoder detection (WaveNet, WaveGlow, HiFi-GAN)

Uses multiple independent detection methods to provide high-confidence results with cryptographically verifiable outputs.

🎭 Modification: Real-Time Voice Transformation (NEW)

Transform voices in real-time for legitimate purposes:

  • Gender transformation (male ↔ female voice conversion)
  • Character voices (robot, alien, demon, chipmunk, giant)
  • Anonymization (privacy protection presets)
  • Utility effects (whisper, megaphone, telephone, cave)
  • Custom parameters (pitch, formant, time stretch, reverb, echo)
  • Live audio I/O with low latency (~43ms at 48kHz)

Perfect for privacy protection, content creation, research, and testing detection systems.

🔬 How It Works

Voice manipulators typically alter pitch (F0) to change perceived gender, but they cannot easily change formants (physical vocal tract resonances). This creates a detectable pitch-formant incoherence that serves as forensic evidence.

CLEAN AUDIO:     F0 = 120 Hz (Male) ✓ + Formants = Male ✓ → COHERENT
MANIPULATED:     F0 = 220 Hz (Female) ✗ + Formants = Male ✓ → INCOHERENT ⚠
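
This check can be reproduced with off-the-shelf tools. Below is a minimal sketch, assuming librosa and praat-parselmouth are installed; the thresholds are illustrative rather than the pipeline's calibrated values, and the F0 estimate is a simplified version of the piptrack-based method described under Technical Details:

import librosa
import numpy as np
import parselmouth

path = 'sample.wav'
y, sr = librosa.load(path, sr=None)

# Rough F0 estimate: strongest pitch candidate per frame, then the median.
pitches, magnitudes = librosa.piptrack(y=y, sr=sr)
frame_f0 = pitches[magnitudes.argmax(axis=0), np.arange(pitches.shape[1])]
f0 = np.median(frame_f0[frame_f0 > 0])

# First formant (F1) via Praat's Burg method - a physical vocal-tract resonance.
snd = parselmouth.Sound(path)
formant = snd.to_formant_burg()
f1 = formant.get_value_at_time(1, snd.duration / 2)

# Illustrative coherence check: a 'female' F0 paired with 'male' formants
# is the incoherence signature sketched above.
presented_female = f0 > 165   # illustrative cutoff, not the pipeline's value
formants_male = f1 < 550      # illustrative cutoff, not the pipeline's value
print(f'F0 = {f0:.1f} Hz, F1 = {f1:.1f} Hz')
print('INCOHERENT' if presented_female and formants_male else 'COHERENT')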

✨ Features

πŸ” Multi-Phase Detection

  • PHASE 1: Baseline F0 Analysis - Isolates presented pitch
  • PHASE 2: Vocal Tract Analysis - Extracts physical formant characteristics
  • PHASE 3: Manipulation Artifact Detection - Three independent methods:
    • 🎵 Pitch-Formant Incoherence Detection
    • 📊 Mel Spectrogram Artifact Analysis
    • ⚡ Phase Decoherence / Transient Smearing Detection
  • PHASE 4: AI Voice Detection - Advanced detection using a pre-trained Wav2Vec2 model.
  • PHASE 5: Report Synthesis - Generates verified, tamper-evident reports

🌐 Web GUI (NEW)

Modern web-based interface with:

  • πŸ–±οΈ Drag-and-drop file upload
  • πŸ“Š Real-time visualization updates
  • πŸ“₯ One-click report downloads (JSON, Markdown, CSV)
  • πŸ“ Batch processing with progress bars
  • 🎨 Dark theme with responsive layout
  • 🌍 Shareable links for demos

🖥️ Interactive TUI

Beautiful terminal interface with:

  • Real-time progress tracking
  • Color-coded results
  • Interactive menus
  • Batch processing support

🔒 Verifiable Outputs

Every analysis includes:

  • SHA-256 checksums of audio files
  • Cryptographic signatures for tamper detection
  • Chain of custody metadata
  • Multiple output formats (JSON, Markdown, visualizations)
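
The file-hash portion can be verified independently with the Python standard library. A minimal sketch (the report's signing scheme itself is internal to the pipeline):

import hashlib

def sha256_of_file(path: str) -> str:
    # Stream the file in chunks so large recordings don't need to fit in memory.
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(8192), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the 'file_hash_sha256' field in a report's verification block.
print(sha256_of_file('sample.wav'))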

📊 Comprehensive Visualizations

Generates 4 plots per analysis:

  • Overview dashboard
  • Mel spectrogram with artifact annotations
  • Phase coherence analysis
  • Pitch-formant comparison chart

🎭 Voice Modification System (NEW)

In addition to detection capabilities, this system now includes real-time voice modification/obfuscation for legitimate purposes such as privacy protection, content creation, and testing detection systems.

🔊 Real-Time Voice Transformation

The voice modification system provides low-latency, real-time audio processing with:

  • Live Audio I/O - Real-time microphone input and speaker output
  • Multiple Effect Types - Pitch shifting, formant shifting, time stretching, reverb, echo
  • Preset Library - 15+ pre-configured voice transformations
  • Custom Controls - Fine-tune all parameters in real-time
  • Low Latency - ~43ms processing latency at 48kHz
  • Professional Quality - Broadcast-ready audio processing

🎨 Available Presets

Gender Transformation

  • male_to_female - Transform male voice to female voice
  • female_to_male - Transform female voice to male voice
  • male_to_female_subtle - Subtle male to female transformation
  • female_to_male_subtle - Subtle female to male transformation

Character Voices

  • chipmunk - High-pitched cartoon voice
  • giant - Deep, slow voice
  • robot - Robotic/synthetic voice
  • demon - Deep, reverberant voice
  • alien - Otherworldly voice

Utility Effects

  • whisper - Quiet, breathy voice
  • megaphone - Loud, compressed voice
  • telephone - Phone line quality
  • cave - Large reverberant space

Anonymization

  • anonymous_1 - Voice anonymization (subtle)
  • anonymous_2 - Voice anonymization (moderate)
  • anonymous_3 - Voice anonymization (heavy)
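
To enumerate the presets programmatically, here is a minimal sketch, assuming PRESET_LIBRARY (exported by the voicemod package, as in the Python API example below) behaves like a mapping keyed by preset name:

from audioanalysisx1.voicemod import PRESET_LIBRARY

# Print every available preset name (assumes a dict-like object keyed by name).
for name in sorted(PRESET_LIBRARY):
    print(name)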

🚀 Using Voice Modification

Option 1: Web GUI (Recommended)

python run_voice_modifier_gui.py

# Custom port
python run_voice_modifier_gui.py --port 7861

# Public share link
python run_voice_modifier_gui.py --share

Opens a web interface at http://localhost:7861 with:

  • 🎚️ Real-time controls for all parameters
  • 🎭 Preset selector with all voice transformations
  • 📊 Live level meters for input/output monitoring
  • 🔊 Device selection for audio input/output
  • ⚡ Instant preview of voice modifications

Option 2: Command Line

# List available audio devices
python run_voice_modifier.py --list-devices

# List available presets
python run_voice_modifier.py --list-presets

# Use a preset
python run_voice_modifier.py --preset male_to_female

# Custom settings
python run_voice_modifier.py --pitch 6 --formant 1.15

# Specify devices
python run_voice_modifier.py --preset robot --input-device 1 --output-device 2

# Start in bypass mode (no processing)
python run_voice_modifier.py --bypass

Option 3: Python API

from audioanalysisx1.voicemod import VoiceModifier, AudioProcessor, AudioConfig, PRESET_LIBRARY  # AudioConfig assumed to be exported alongside VoiceModifier

# Create modifier with configuration
config = AudioConfig(sample_rate=48000, block_size=2048)
modifier = VoiceModifier(config)

# Create processor and apply preset
processor = AudioProcessor()
processor.apply_preset_by_name('male_to_female')

# Add processor and start
modifier.add_effect(processor)
modifier.start()

# Modify settings in real-time
processor.set_pitch(8.0)  # 8 semitones up
processor.set_formant(1.2)  # 20% higher formants

# Stop when done
modifier.stop()

πŸŽ›οΈ Effect Parameters

| Parameter | Range | Description |
|-----------|-------|-------------|
| Pitch | -12 to +12 semitones | Shift fundamental frequency |
| Formant | 0.5 to 2.0 ratio | Shift vocal tract resonances |
| Time Stretch | 0.5 to 2.0x | Change speaking speed |
| Reverb | 0.0 to 1.0 wet mix | Add room reverberation |
| Echo | 0.0 to 1.0 wet mix | Add delayed repetitions |
| Noise Gate | On/Off | Remove background noise |
| Compression | On/Off | Normalize volume levels |
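
As a quick illustration of these ranges, here is a hypothetical helper that clamps user input to the documented bounds before applying it (only set_pitch and set_formant appear in the Python API example above; the other parameters would follow the same pattern):

def apply_custom_voice(processor, pitch_semitones: float, formant_ratio: float) -> None:
    # Clamp to the documented ranges, then apply (illustrative helper, not part of the package).
    processor.set_pitch(max(-12.0, min(12.0, pitch_semitones)))   # -12 to +12 semitones
    processor.set_formant(max(0.5, min(2.0, formant_ratio)))      # 0.5 to 2.0 ratio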

🔒 Ethical Use Notice

The voice modification system is designed for legitimate purposes only:

✅ Intended Uses

  • Privacy protection and anonymization
  • Entertainment and gaming
  • Content creation and podcasting
  • Research and development
  • Testing detection systems (like this one!)
  • Accessibility features

❌ Prohibited Uses

  • Impersonation without consent
  • Fraud or deception
  • Harassment or abuse
  • Illegal activities
  • Violation of platform terms of service

By using this software, you agree to use it responsibly and in accordance with all applicable laws and regulations.

📊 Technical Specifications

  • Sample Rates: 44.1kHz, 48kHz (configurable)
  • Block Size: 1024-4096 samples (configurable)
  • Latency: ~43ms at 48kHz with 2048 block size
  • Bit Depth: 32-bit float processing
  • Supported Devices: All ASIO, CoreAudio, and ALSA compatible devices
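
For reference, the quoted latency is essentially the time needed to fill one block: 2048 samples / 48,000 Hz ≈ 42.7 ms, which matches the ~43 ms figure above.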

🔬 Integration with Detection

The voice modification system is intentionally designed to be detectable by this analysis pipeline:

# Create modified audio
modifier = VoiceModifier()
processor = AudioProcessor()
processor.apply_preset_by_name('male_to_female')
# ... record modified audio ...

# Analyze it
detector = VoiceManipulationDetector()
report = detector.analyze('modified_audio.wav')

# Should detect manipulation
assert report['alteration_detected'] == True

This makes the system ideal for:

  • Testing detection algorithms
  • Training forensic analysts
  • Demonstrating manipulation artifacts
  • Security research and education

📦 Installation

Prerequisites

  • Python 3.10 or higher
  • pip package manager
  • 4GB RAM minimum

Quick Install

# Clone the repository and enter the project directory
git clone https://github.com/SWORDIntel/AUDIOANALYSISX1.git
cd AUDIOANALYSISX1

# Install dependencies
pip install -r requirements.txt

Dependencies

librosa>=0.10.0          # Audio analysis
numpy>=1.24.0            # Numerical computing
scipy>=1.10.0            # Signal processing
matplotlib>=3.7.0        # Visualizations
praat-parselmouth>=0.4.3 # Formant extraction
soundfile>=0.12.0        # Audio I/O
rich>=13.0.0             # Terminal UI
click>=8.1.0             # CLI framework

🚀 Quick Start

Option 1: Web GUI (Recommended - Most User-Friendly) 🆕

python scripts/start-gui
# or
python run_gui.py

Opens a beautiful web interface at http://localhost:7860 with:

  • πŸ–±οΈ Drag-and-drop file upload
  • πŸ“Š Real-time visualizations
  • πŸ“₯ Download JSON/Markdown reports
  • πŸ“ Batch processing with progress tracking
  • 🎨 Modern UI with dark theme

Perfect for: Visual analysis, presentations, non-technical users

Option 2: Simple Command Line

# Analyze a single file
python scripts/analyze suspicious_call.wav

# Batch process a directory
python scripts/analyze --batch ./audio_samples/ -o ./results/

# Faster (no visualizations)
python scripts/analyze sample.wav --no-viz

Perfect for: Quick analysis, scripting, automation

Option 3: Interactive TUI

python -m audioanalysisx1.cli.interactive

Terminal-based menu interface with full features.

Perfect for: Server environments, SSH sessions

Option 4: Python API

from audioanalysisx1.pipeline import VoiceManipulationDetector

detector = VoiceManipulationDetector()
report = detector.analyze('sample.wav', output_dir='results/')

# Check results
if report['alteration_detected']:
    print(f"⚠ MANIPULATION DETECTED")
    confidence = report['confidence']
    print(f"Confidence: {confidence['score']:.0%} ({confidence['label']})")
else:
    print(f"βœ“ No manipulation detected")

Perfect for: Integration, custom workflows, automation


📖 Documentation

Core Documentation

All documentation is in the docs/ directory; see the project structure below for the individual guides.

Understanding Reports

Each analysis generates a comprehensive report:

{
  "asset_id": "sample_001",
  "alteration_detected": true,
  "confidence": {
    "score": 0.99,
    "label": "Very High"
  },
  "presented_sex": "Female",
  "probable_sex": "Male",
  "f0_baseline": "221.5 Hz",
  "evidence": {
    "pitch": "Pitch-Formant Incoherence Detected...",
    "time": "Phase Decoherence / Transient Smearing Detected...",
    "spectral": "Spectral Artifacts Detected...",
    "ai": "No AI voice artifacts detected"
  },
  "verification": {
    "file_hash_sha256": "7bd4d4ce92be3174...",
    "report_hash_sha256": "6e5edefb6fd84dc9...",
    "timestamp_utc": "2025-10-29T23:05:08Z"
  }
}

Confidence Levels

| Level | Score Range | Description |
|-------|-------------|-------------|
| Very High | >= 90% | Multiple independent confirmations |
| High | 75% - 89% | Strong evidence from multiple vectors |
| Medium | 50% - 74% | Moderate evidence from a single vector |
| Low | < 50% | No significant manipulation detected |
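
In code, the table reduces to a simple threshold function; a sketch assuming the score is the 0-1 value from the report's confidence block:

def confidence_label(score: float) -> str:
    # Map a 0-1 confidence score to the labels used in reports (per the table above).
    if score >= 0.90:
        return 'Very High'
    if score >= 0.75:
        return 'High'
    if score >= 0.50:
        return 'Medium'
    return 'Low'

assert confidence_label(0.99) == 'Very High'   # matches the example report above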

🌐 Web GUI Features

Launch the GUI

python start_gui.py

# Custom port
python start_gui.py --port=8080

# Create shareable public link
python start_gui.py --share

GUI Interface

The web GUI is organized into the following tabs:

1. Single File Analysis Tab

  • Drag-and-drop audio file upload
  • Real-time progress updates (Phase 1-5)
  • Instant results display with HTML formatting
  • Visual gallery showing all 4 analysis plots
  • Download buttons for JSON and Markdown reports

2. Batch Processing Tab

  • Multi-file upload support
  • Progress tracking for each file
  • Summary statistics table
  • CSV export for batch results

3. About & Help Tab

  • Detection methods explanation
  • Confidence levels guide
  • Interpretation tips
  • Security information

GUI Screenshots

Access at: http://localhost:7860

Features:

  • 🎨 Dark theme interface
  • 📱 Responsive design
  • ⚡ Real-time updates
  • 🔒 Secure (local processing)

💡 Examples

Example 1: Web GUI (Easiest)

python start_gui.py

Then:

  1. Open browser to http://localhost:7860
  2. Drag audio file onto upload area
  3. Click "Analyze Audio"
  4. View results and visualizations
  5. Download reports

Example 2: Basic CLI Analysis

python scripts/analyze suspicious_voice.wav

Output:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
FORENSIC AUDIO ANALYSIS REPORT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
ASSET_ID: suspicious_voice
ALTERATION DETECTED: True
CONFIDENCE: 99% (Very High)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
EVIDENCE VECTORS:
  [1] PITCH: Pitch-Formant Incoherence Detected
  [2] TIME: Phase Decoherence / Transient Smearing Detected
  [3] SPECTRAL: Spectral Artifacts Detected
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Example 3: Batch Processing

from audioanalysisx1.pipeline import VoiceManipulationDetector

detector = VoiceManipulationDetector()
reports = detector.batch_analyze(
    audio_dir='./evidence/',
    output_dir='./case_001_results/',
    pattern='*.wav'
)

# Generate summary
manipulated = sum(1 for r in reports if r['alteration_detected'])
print(f"Detected manipulation in {manipulated}/{len(reports)} files")

Example 4: Verification

from audioanalysisx1.verification import OutputVerifier

verifier = OutputVerifier()
result = verifier.verify_report('results/sample_report.json')

if result['valid']:
    print(f"βœ“ Report verified - Timestamp: {result['timestamp']}")
else:
    print(f"βœ— Verification failed: {result['error']}")

Example 5: Export to CSV

from audioanalysisx1.verification import ReportExporter

exporter = ReportExporter()
exporter.export_csv_summary(reports, 'case_summary.csv')

🧪 Testing

Run the comprehensive test suite:

python tests/test_pipeline.py

This will:

  1. Generate 6 synthetic test samples (3 clean + 3 manipulated)
  2. Analyze each sample through the full pipeline
  3. Verify detection accuracy
  4. Test cryptographic verification system
  5. Generate complete reports and visualizations

Expected Output:

================================================================================
TEST RESULTS SUMMARY
================================================================================
Total Tests: 7
Passed: 7 (✓)
Failed: 0 (✓)
Success Rate: 100.0%

πŸ“ Project Structure

AUDIOANALYSISX1/
├── README.md                    # This file
├── setup.py                     # Package installation
├── requirements.txt             # Python dependencies
├── .gitignore                   # Git ignore rules
│
├── docs/                        # 📚 Documentation
│   ├── getting-started.md       # Quick start guide
│   ├── gui-guide.md             # Web GUI guide
│   ├── usage.md                 # Usage guide
│   ├── technical.md             # Technical details
│   ├── api-reference.md         # API documentation
│   ├── deployment.md            # Deployment guide
│   └── debug-report.md          # Debug validation
│
├── audioanalysisx1/             # 🔬 Main Package
│   ├── __init__.py
│   ├── pipeline.py              # Main orchestrator
│   ├── verification.py          # Cryptographic verification
│   ├── visualizer.py            # Visualization engine
│   │
│   ├── phases/                  # Detection phases
│   │   ├── baseline.py          # PHASE 1: F0 Analysis
│   │   ├── formants.py          # PHASE 2: Formant Analysis
│   │   ├── artifacts.py         # PHASE 3: Manipulation Detection
│   │   ├── ai_detection.py      # PHASE 4: AI Detection
│   │   └── reporting.py         # PHASE 5: Report Synthesis
│   │
│   ├── voicemod/                # 🎭 Voice Modification System (NEW)
│   │   ├── __init__.py          # Module interface
│   │   ├── realtime.py          # Real-time audio I/O
│   │   ├── processor.py         # Audio processor
│   │   ├── effects.py           # Effect implementations
│   │   ├── presets.py           # Voice presets library
│   │   └── gui.py               # Web GUI for modification
│   │
│   ├── gui/                     # Web GUI (Detection)
│   │   ├── app.py               # Gradio interface
│   │   └── utils.py             # GUI utilities
│   │
│   └── cli/                     # CLI interfaces
│       ├── simple.py            # Simple CLI
│       └── interactive.py       # Interactive TUI
│
├── scripts/                     # 🚀 Executable Scripts
│   ├── start-gui                # Launch detection GUI
│   ├── analyze                  # Simple analysis
│   └── download-samples         # Sample generator
│
├── run_voice_modifier.py        # 🎭 Voice modifier CLI
├── run_voice_modifier_gui.py    # 🎭 Voice modifier GUI
│
├── deepfake_model/              # 🤖 Pre-trained AI model
│
├── tests/                       # 🧪 Test Suite
│   ├── test_pipeline.py         # Pipeline tests
│   ├── validate_system.py       # System validation
│   └── examples.py              # Usage examples
│
└── samples/                     # 🎵 Sample Audio
    ├── README.md
    ├── human/                   # Clean recordings
    ├── tts/                     # AI-generated
    └── manipulated/             # Pitch/time-shifted

🔧 Technical Details

Detection Methods

  1. Pitch-Formant Incoherence

    • Compares F0 (fundamental frequency) vs formants (F1, F2, F3)
    • Detects physical impossibilities in voice characteristics
    • Primary method for pitch-shift detection
  2. Mel Spectrogram Analysis

    • Identifies unnatural harmonic structures
    • Detects consistent computational noise floor
    • Finds spectral discontinuities
  3. Phase Coherence Analysis

    • Analyzes STFT phase information
    • Detects transient smearing from time-stretching
    • Identifies phase discontinuities

Algorithms Used

  • F0 Extraction: librosa.piptrack with adaptive thresholding
  • Formant Analysis: Praat Burg algorithm via parselmouth
  • Phase Analysis: STFT with phase unwrapping
  • Artifact Detection: Statistical analysis of spectral features
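
As a rough illustration of the phase-analysis step, the sketch below unwraps the STFT phase and measures how far the frame-to-frame phase advance deviates from the value expected for each frequency bin; phase-vocoder style processing disturbs this relationship. The statistic and its interpretation are illustrative, not the pipeline's calibrated detector:

import librosa
import numpy as np

y, sr = librosa.load('sample.wav', sr=None)

n_fft, hop = 2048, 512
stft = librosa.stft(y, n_fft=n_fft, hop_length=hop)
phase = np.unwrap(np.angle(stft), axis=1)

# Expected per-hop phase advance for each frequency bin.
expected = (2 * np.pi * np.arange(stft.shape[0]) * hop / n_fft)[:, None]

# Deviation of the actual phase advance from the expected value.
deviation = np.diff(phase, axis=1) - expected
score = float(np.mean(np.abs(deviation)))

print(f'mean phase deviation: {score:.3f} rad')   # statistic only; thresholds are case-specific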

Supported Formats

  • WAV, MP3, FLAC, OGG, M4A (via librosa)
  • Sample rates: Any (automatically resampled)
  • Duration: Up to 10 minutes recommended

πŸ›‘οΈ Security & Privacy

Security Features

  • ✅ Sandboxed execution (no network access required)
  • ✅ Read-only file operations (no modification of source audio)
  • ✅ Cryptographic verification (SHA-256 checksums)
  • ✅ Tamper-evident reports (signed outputs)
  • ✅ Resource limiting (DoS protection)

Privacy Considerations

  • No audio data is sent to external servers
  • All processing is local and offline
  • No personally identifiable information is stored
  • Original audio files are never modified

Chain of Custody

Each report includes:

  • Timestamp (UTC)
  • Audio file hash
  • Analysis pipeline version
  • Cryptographic signature

🤝 Use Cases

Detection System - Authorized Applications

  • ✅ Forensic investigations (law enforcement, legal proceedings)
  • ✅ Security testing (authorized penetration testing)
  • ✅ Academic research (voice processing studies)
  • ✅ Quality assurance (detecting processing artifacts)
  • ✅ CTF challenges (cybersecurity competitions)

Voice Modification - Authorized Applications

  • ✅ Privacy protection (whistleblowers, journalists, activists)
  • ✅ Content creation (podcasts, videos, gaming, streaming)
  • ✅ Entertainment (voice acting, character voices)
  • ✅ Research and education (testing detection systems)
  • ✅ Accessibility (voice assistance for medical conditions)
  • ✅ Training (security analyst training, forensic education)

Prohibited Applications (Both Systems)

  • ❌ Unauthorized surveillance or monitoring
  • ❌ Impersonation without consent
  • ❌ Fraud, deception, or illegal activities
  • ❌ Harassment or stalking
  • ❌ Discrimination based on voice characteristics
  • ❌ Violation of platform terms of service

📊 Performance

Benchmarks

| Audio Duration | Analysis Time | Memory Usage |
|----------------|---------------|--------------|
| 3 seconds | ~3-5 seconds | ~200 MB |
| 30 seconds | ~8-12 seconds | ~400 MB |
| 3 minutes | ~25-35 seconds | ~800 MB |

Tested on: Intel i7-9750H, 16GB RAM

Optimization Tips

  • Disable visualizations with --no-viz for faster processing
  • Use batch mode for multiple files (shared initialization)
  • Limit audio duration for large files: librosa.load(..., duration=30.0)

πŸ› Troubleshooting

Common Issues

Issue: "No module named 'librosa'" Solution: Run pip install -r requirements.txt

Issue: Parselmouth errors on certain files Solution: Ensure audio is valid format (WAV, MP3). Try converting with ffmpeg
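
For the conversion workaround, a typical ffmpeg invocation (assuming ffmpeg is installed and on your PATH) looks like:

ffmpeg -i problematic_input.m4a -ar 48000 converted.wav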

Issue: False positives on synthetic/generated audio Expected: Synthetic audio has unnatural characteristics that may trigger detection

Issue: Memory errors on large files Solution: Limit duration: y, sr = librosa.load('file.wav', duration=60.0)

Debug Mode

Enable verbose logging:

import logging
logging.basicConfig(level=logging.DEBUG)

from audioanalysisx1.pipeline import VoiceManipulationDetector

detector = VoiceManipulationDetector()
report = detector.analyze('sample.wav')

πŸ—ΊοΈ Roadmap

Current Version: 2.0.0

  • Multi-phase detection pipeline
  • Interactive TUI
  • Cryptographic verification
  • Comprehensive visualizations
  • Batch processing
  • Test suite
  • Real-time voice modification system (NEW in v2.0)
  • 15+ voice transformation presets (NEW in v2.0)
  • Low-latency audio processing (NEW in v2.0)
  • Web GUI for voice modification (NEW in v2.0)

Planned Features

  • Real-time stream analysis (for detection)
  • Machine learning enhancement (optional deepfake detection)
  • REST API server mode
  • Docker containerization
  • GPU acceleration
  • Additional language support
  • Mobile app support

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgments

This implementation is based on the Tactical Implementation Specification (TIS) for forensic voice analysis.

Technologies Used

  • librosa, NumPy, SciPy - audio analysis and signal processing
  • praat-parselmouth - formant extraction (Praat Burg algorithm)
  • Matplotlib - visualizations
  • soundfile - audio I/O
  • Rich and Click - terminal UI and CLI framework
  • Gradio - web GUI
  • Pre-trained Wav2Vec2 model - AI voice detection


📞 Support

For issues, questions, or contributions:

  1. Check the documentation
  2. Review examples
  3. Run the test suite
  4. Consult technical documentation

βš–οΈ Ethical Use Statement

This tool is designed for authorized security testing, forensic analysis, and research purposes only. Users must:

  • Obtain proper authorization before analyzing voice recordings
  • Comply with applicable laws and regulations
  • Respect privacy and consent requirements
  • Use results responsibly and ethically

Unauthorized use for surveillance, discrimination, or privacy violation is strictly prohibited.


Built with 🔬 for forensic audio analysis
