```text
================================================================================
                 FORENSIC AUDIO MANIPULATION DETECTION SYSTEM
                     Tactical Implementation Specification
================================================================================
```
A comprehensive system for voice manipulation detection AND real-time voice modification with forensic-grade analysis.
Features • Voice Modification • Installation • Quick Start • Documentation • Examples
This system provides both voice manipulation detection and voice modification:
Detection covers voice manipulation and AI-generated voices, including:
- Pitch-shifting (male ↔ female voice conversion)
- Time-stretching (speed manipulation)
- Phase vocoder artifacts (deepfake/alteration signatures)
- Combined manipulations (multi-vector attacks)
- AI-generated voices (TTS, voice cloning, deepfakes)
- Neural vocoder detection (WaveNet, WaveGlow, HiFi-GAN)
Uses multiple independent detection methods to provide high-confidence results with cryptographically verifiable outputs.
Transform voices in real time for legitimate purposes:
- Gender transformation (male ↔ female voice conversion)
- Character voices (robot, alien, demon, chipmunk, giant)
- Anonymization (privacy protection presets)
- Utility effects (whisper, megaphone, telephone, cave)
- Custom parameters (pitch, formant, time stretch, reverb, echo)
- Live audio I/O with low latency (~43ms at 48kHz)
Perfect for privacy protection, content creation, research, and testing detection systems.
Voice manipulators typically alter pitch (F0) to change perceived gender, but they cannot easily change formants (physical vocal tract resonances). This creates a detectable pitch-formant incoherence that serves as forensic evidence.
CLEAN AUDIO:  F0 = 120 Hz (Male)   + Formants = Male → COHERENT ✓
MANIPULATED:  F0 = 220 Hz (Female) + Formants = Male → INCOHERENT ✗
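As a rough illustration of this check (not the pipeline's actual implementation), the sketch below flags audio whose F0-based classification disagrees with its formant-based classification; the 160 Hz and 1200 Hz cutoffs are illustrative assumptions only.

```python
# Illustrative sketch only -- the 160 Hz and 1200 Hz cutoffs are assumptions,
# not the pipeline's tuned thresholds.

def classify_sex_from_f0(f0_hz: float) -> str:
    """Rough F0-based classification (typical adult male ~85-155 Hz, female ~165-255 Hz)."""
    return "Male" if f0_hz < 160 else "Female"

def classify_sex_from_formants(f1_hz: float, f2_hz: float) -> str:
    """Rough formant-based classification; longer (typically male) vocal tracts give lower formants."""
    return "Male" if (f1_hz + f2_hz) / 2 < 1200 else "Female"

def pitch_formant_incoherent(f0_hz: float, f1_hz: float, f2_hz: float) -> bool:
    """Flag audio whose presented pitch disagrees with its physical formants."""
    return classify_sex_from_f0(f0_hz) != classify_sex_from_formants(f1_hz, f2_hz)

# The manipulated case from the diagram above: female-range F0 over male-range formants
print(pitch_formant_incoherent(220.0, 500.0, 1500.0))  # True -> incoherent
```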
- PHASE 1: Baseline F0 Analysis - Isolates presented pitch
- PHASE 2: Vocal Tract Analysis - Extracts physical formant characteristics
- PHASE 3: Manipulation Artifact Detection - Three independent methods:
  - Pitch-Formant Incoherence Detection
  - Mel Spectrogram Artifact Analysis
  - Phase Decoherence / Transient Smearing Detection
- PHASE 4: AI Voice Detection - Advanced detection using a pre-trained Wav2Vec2 model.
- PHASE 5: Report Synthesis - Generates verified, tamper-evident reports
Modern web-based interface with:
- Drag-and-drop file upload
- Real-time visualization updates
- One-click report downloads (JSON, Markdown, CSV)
- Batch processing with progress bars
- Dark theme with responsive layout
- Shareable links for demos
Beautiful terminal interface with:
- Real-time progress tracking
- Color-coded results
- Interactive menus
- Batch processing support
Every analysis includes:
- SHA-256 checksums of audio files
- Cryptographic signatures for tamper detection
- Chain of custody metadata
- Multiple output formats (JSON, Markdown, visualizations)
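The file hash recorded in each report can be re-computed independently with Python's standard hashlib. A minimal sketch, assuming the report follows the JSON layout shown later in this README (the file paths are placeholders):

```python
import hashlib
import json

def sha256_of_file(path: str, chunk_size: int = 8192) -> str:
    """Compute the SHA-256 digest of a file without loading it fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Cross-check a report's recorded hash against the audio file on disk
with open("results/sample_report.json") as f:
    report = json.load(f)

recomputed = sha256_of_file("sample.wav")
print("hash matches:", recomputed == report["verification"]["file_hash_sha256"])
```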
Generates 4 plots per analysis:
- Overview dashboard
- Mel spectrogram with artifact annotations
- Phase coherence analysis
- Pitch-formant comparison chart
In addition to detection capabilities, this system now includes real-time voice modification/obfuscation for legitimate purposes such as privacy protection, content creation, and testing detection systems.
The voice modification system provides low-latency, real-time audio processing with:
- Live Audio I/O - Real-time microphone input and speaker output
- Multiple Effect Types - Pitch shifting, formant shifting, time stretching, reverb, echo
- Preset Library - 15+ pre-configured voice transformations
- Custom Controls - Fine-tune all parameters in real-time
- Low Latency - ~43ms processing latency at 48kHz
- Professional Quality - Broadcast-ready audio processing
- male_to_female - Transform male voice to female voice
- female_to_male - Transform female voice to male voice
- male_to_female_subtle - Subtle male to female transformation
- female_to_male_subtle - Subtle female to male transformation
- chipmunk - High-pitched cartoon voice
- giant - Deep, slow voice
- robot - Robotic/synthetic voice
- demon - Deep, reverberant voice
- alien - Otherworldly voice
- whisper - Quiet, breathy voice
- megaphone - Loud, compressed voice
- telephone - Phone line quality
- cave - Large reverberant space
- anonymous_1 - Voice anonymization (subtle)
- anonymous_2 - Voice anonymization (moderate)
- anonymous_3 - Voice anonymization (heavy)
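Before launching the GUI or CLI below, the presets can also be enumerated and applied from Python. A minimal sketch, assuming PRESET_LIBRARY (imported in the Python API example later in this README) is a mapping keyed by preset name:

```python
from audioanalysisx1.voicemod import AudioProcessor, PRESET_LIBRARY

# List every available preset name (assumes PRESET_LIBRARY is keyed by preset name)
for name in PRESET_LIBRARY:
    print(name)

# Apply one of the presets listed above
processor = AudioProcessor()
processor.apply_preset_by_name('anonymous_2')
```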
```bash
python run_voice_modifier_gui.py

# Custom port
python run_voice_modifier_gui.py --port 7861

# Public share link
python run_voice_modifier_gui.py --share
```
Opens a web interface at http://localhost:7861 with:
- Real-time controls for all parameters
- Preset selector with all voice transformations
- Live level meters for input/output monitoring
- Device selection for audio input/output
- Instant preview of voice modifications
```bash
# List available audio devices
python run_voice_modifier.py --list-devices

# List available presets
python run_voice_modifier.py --list-presets

# Use a preset
python run_voice_modifier.py --preset male_to_female

# Custom settings
python run_voice_modifier.py --pitch 6 --formant 1.15

# Specify devices
python run_voice_modifier.py --preset robot --input-device 1 --output-device 2

# Start in bypass mode (no processing)
python run_voice_modifier.py --bypass
```

```python
from audioanalysisx1.voicemod import VoiceModifier, AudioProcessor, AudioConfig, PRESET_LIBRARY

# Create modifier with configuration
config = AudioConfig(sample_rate=48000, block_size=2048)
modifier = VoiceModifier(config)

# Create processor and apply preset
processor = AudioProcessor()
processor.apply_preset_by_name('male_to_female')

# Add processor and start
modifier.add_effect(processor)
modifier.start()

# Modify settings in real time
processor.set_pitch(8.0)     # 8 semitones up
processor.set_formant(1.2)   # 20% higher formants

# Stop when done
modifier.stop()
```

| Parameter | Range | Description |
|---|---|---|
| Pitch | -12 to +12 semitones | Shift fundamental frequency |
| Formant | 0.5 to 2.0 ratio | Shift vocal tract resonances |
| Time Stretch | 0.5 to 2.0x | Change speaking speed |
| Reverb | 0.0 to 1.0 wet mix | Add room reverberation |
| Echo | 0.0 to 1.0 wet mix | Add delayed repetitions |
| Noise Gate | On/Off | Remove background noise |
| Compression | On/Off | Normalize volume levels |
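A short sketch tying the table to the Python API above: set_pitch and set_formant appear in the API example, while the remaining setter names are assumptions and shown only as comments.

```python
from audioanalysisx1.voicemod import AudioProcessor

processor = AudioProcessor()

# Documented in the API example above
processor.set_pitch(-4.0)     # semitones, valid range -12 to +12
processor.set_formant(0.85)   # ratio, valid range 0.5 to 2.0

# The setters below are assumed names for the remaining table rows --
# check audioanalysisx1/voicemod/processor.py for the actual interface.
# processor.set_time_stretch(1.1)   # 0.5x to 2.0x
# processor.set_reverb(0.3)         # wet mix 0.0 to 1.0
# processor.set_echo(0.2)           # wet mix 0.0 to 1.0
```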
The voice modification system is designed for legitimate purposes only:
- Privacy protection and anonymization
- Entertainment and gaming
- Content creation and podcasting
- Research and development
- Testing detection systems (like this one!)
- Accessibility features
It must not be used for:
- Impersonation without consent
- Fraud or deception
- Harassment or abuse
- Illegal activities
- Violation of platform terms of service
By using this software, you agree to use it responsibly and in accordance with all applicable laws and regulations.
- Sample Rates: 44.1kHz, 48kHz (configurable)
- Block Size: 1024-4096 samples (configurable)
- Latency: ~43ms at 48kHz with 2048 block size
- Bit Depth: 32-bit float processing
- Supported Devices: All ASIO, CoreAudio, and ALSA compatible devices
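The quoted ~43ms latency follows directly from the block size: a 2048-sample block at 48kHz takes about 42.7ms to fill before it can be processed.

```python
# Per-block latency = block_size / sample_rate
block_size = 2048      # samples
sample_rate = 48_000   # Hz

latency_ms = block_size / sample_rate * 1000
print(f"{latency_ms:.1f} ms")  # 42.7 ms -> the "~43ms" quoted above
```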
The voice modification system is intentionally designed to be detectable by this analysis pipeline:
```python
from audioanalysisx1.voicemod import VoiceModifier, AudioProcessor
from audioanalysisx1.pipeline import VoiceManipulationDetector

# Create modified audio
modifier = VoiceModifier()
processor = AudioProcessor()
processor.apply_preset_by_name('male_to_female')
# ... record modified audio ...

# Analyze it
detector = VoiceManipulationDetector()
report = detector.analyze('modified_audio.wav')

# Should detect manipulation
assert report['alteration_detected'] == True
```
This makes the system ideal for:
- Testing detection algorithms
- Training forensic analysts
- Demonstrating manipulation artifacts
- Security research and education
- Python 3.10 or higher
- pip package manager
- 4GB RAM minimum
```bash
# Clone or navigate to the project directory
cd /home/john/voice

# Install dependencies
pip install -r requirements.txt
```
Key dependencies (requirements.txt):
```text
librosa>=0.10.0            # Audio analysis
numpy>=1.24.0              # Numerical computing
scipy>=1.10.0              # Signal processing
matplotlib>=3.7.0          # Visualizations
praat-parselmouth>=0.4.3   # Formant extraction
soundfile>=0.12.0          # Audio I/O
rich>=13.0.0               # Terminal UI
click>=8.1.0               # CLI framework
```
```bash
python scripts/start-gui
# or
python run_gui.py
```
Opens a beautiful web interface at http://localhost:7860 with:
- Drag-and-drop file upload
- Real-time visualizations
- Download JSON/Markdown reports
- Batch processing with progress tracking
- Modern UI with dark theme
Perfect for: Visual analysis, presentations, non-technical users
```bash
# Analyze a single file
python scripts/analyze suspicious_call.wav

# Batch process a directory
python scripts/analyze --batch ./audio_samples/ -o ./results/

# Faster (no visualizations)
python scripts/analyze sample.wav --no-viz
```
Perfect for: Quick analysis, scripting, automation
```bash
python -m audioanalysisx1.cli.interactive
```
Terminal-based menu interface with full features.
Perfect for: Server environments, SSH sessions
```python
from audioanalysisx1.pipeline import VoiceManipulationDetector

detector = VoiceManipulationDetector()
report = detector.analyze('sample.wav', output_dir='results/')

# Check results
if report['alteration_detected']:
    print("MANIPULATION DETECTED")
    confidence = report['confidence']
    print(f"Confidence: {confidence['score']:.0%} ({confidence['label']})")
else:
    print("No manipulation detected")
```
Perfect for: Integration, custom workflows, automation
All documentation is in the docs/ directory:
- Getting Started - Quick start guide
- GUI Guide - Web interface guide
- Usage Guide - Comprehensive usage
- Technical Docs - Implementation details
- API Reference - Complete API docs
- Deployment - Production deployment
- Debug Report - System validation
Each analysis generates a comprehensive report:
```json
{
  "asset_id": "sample_001",
  "alteration_detected": true,
  "confidence": {
    "score": 0.99,
    "label": "Very High"
  },
  "presented_sex": "Female",
  "probable_sex": "Male",
  "f0_baseline": "221.5 Hz",
  "evidence": {
    "pitch": "Pitch-Formant Incoherence Detected...",
    "time": "Phase Decoherence / Transient Smearing Detected...",
    "spectral": "Spectral Artifacts Detected...",
    "ai": "No AI voice artifacts detected"
  },
  "verification": {
    "file_hash_sha256": "7bd4d4ce92be3174...",
    "report_hash_sha256": "6e5edefb6fd84dc9...",
    "timestamp_utc": "2025-10-29T23:05:08Z"
  }
}
```

| Level | Score Range | Description |
|---|---|---|
| Very High | >= 90% | Multiple independent confirmations |
| High | 75% - 89% | Strong evidence from multiple vectors |
| Medium | 50% - 74% | Moderate evidence from a single vector |
| Low | < 50% | No significant manipulation detected |
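A minimal sketch of how these boundaries map a numeric score to a label; the thresholds come straight from the table above, though the pipeline's own implementation may differ.

```python
def confidence_label(score: float) -> str:
    """Map a confidence score in [0, 1] to the labels from the table above."""
    if score >= 0.90:
        return "Very High"
    if score >= 0.75:
        return "High"
    if score >= 0.50:
        return "Medium"
    return "Low"

print(confidence_label(0.99))  # "Very High", matching the sample report
```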
```bash
python start_gui.py

# Custom port
python start_gui.py --port=8080

# Create shareable public link
python start_gui.py --share
```
The web GUI provides 4 ways to interact with the system:
- Drag-and-drop audio file upload
- Real-time progress updates (Phase 1-5)
- Instant results display with HTML formatting
- Visual gallery showing all 4 analysis plots
- Download buttons for JSON and Markdown reports
- Multi-file upload support
- Progress tracking for each file
- Summary statistics table
- CSV export for batch results
- Detection methods explanation
- Confidence levels guide
- Interpretation tips
- Security information
Access at: http://localhost:7860
Features:
- Dark theme interface
- Responsive design
- Real-time updates
- Secure (local processing)
```bash
python start_gui.py
```
Then:
- Open browser to http://localhost:7860
- Drag audio file onto upload area
- Click "Analyze Audio"
- View results and visualizations
- Download reports
```bash
python tui.py analyze suspicious_voice.wav
```
Output:
```text
══════════════════════════════════════════════════════════
              FORENSIC AUDIO ANALYSIS REPORT
══════════════════════════════════════════════════════════
ASSET_ID: suspicious_voice
ALTERATION DETECTED: True
CONFIDENCE: 99% (Very High)
══════════════════════════════════════════════════════════
EVIDENCE VECTORS:
[1] PITCH: Pitch-Formant Incoherence Detected
[2] TIME: Phase Decoherence / Transient Smearing Detected
[3] SPECTRAL: Spectral Artifacts Detected
══════════════════════════════════════════════════════════
```
```python
from audioanalysisx1.pipeline import VoiceManipulationDetector

detector = VoiceManipulationDetector()
reports = detector.batch_analyze(
    audio_dir='./evidence/',
    output_dir='./case_001_results/',
    pattern='*.wav'
)

# Generate summary
manipulated = sum(1 for r in reports if r['alteration_detected'])
print(f"Detected manipulation in {manipulated}/{len(reports)} files")
```

```python
from audioanalysisx1.verification import OutputVerifier

verifier = OutputVerifier()
result = verifier.verify_report('results/sample_report.json')

if result['valid']:
    print(f"Report verified - Timestamp: {result['timestamp']}")
else:
    print(f"Verification failed: {result['error']}")
```

```python
from audioanalysisx1.verification import ReportExporter

exporter = ReportExporter()
exporter.export_csv_summary(reports, 'case_summary.csv')
```

Run the comprehensive test suite:
```bash
python test_pipeline.py
```
This will:
- Generate 6 synthetic test samples (3 clean + 3 manipulated)
- Analyze each sample through the full pipeline
- Verify detection accuracy
- Test cryptographic verification system
- Generate complete reports and visualizations
Expected Output:
```text
================================================================================
TEST RESULTS SUMMARY
================================================================================
Total Tests: 7
Passed: 7 (✓)
Failed: 0 (✗)
Success Rate: 100.0%
```
```text
AUDIOANALYSISX1/
├── README.md                    # This file
├── setup.py                     # Package installation
├── requirements.txt             # Python dependencies
├── .gitignore                   # Git ignore rules
│
├── docs/                        # Documentation
│   ├── getting-started.md       # Quick start guide
│   ├── gui-guide.md             # Web GUI guide
│   ├── usage.md                 # Usage guide
│   ├── technical.md             # Technical details
│   ├── api-reference.md         # API documentation
│   ├── deployment.md            # Deployment guide
│   └── debug-report.md          # Debug validation
│
├── audioanalysisx1/             # Main Package
│   ├── __init__.py
│   ├── pipeline.py              # Main orchestrator
│   ├── verification.py          # Cryptographic verification
│   ├── visualizer.py            # Visualization engine
│   │
│   ├── phases/                  # Detection phases
│   │   ├── baseline.py          # PHASE 1: F0 Analysis
│   │   ├── formants.py          # PHASE 2: Formant Analysis
│   │   ├── artifacts.py         # PHASE 3: Manipulation Detection
│   │   ├── ai_detection.py      # PHASE 4: AI Detection
│   │   └── reporting.py         # PHASE 5: Report Synthesis
│   │
│   ├── voicemod/                # Voice Modification System (NEW)
│   │   ├── __init__.py          # Module interface
│   │   ├── realtime.py          # Real-time audio I/O
│   │   ├── processor.py         # Audio processor
│   │   ├── effects.py           # Effect implementations
│   │   ├── presets.py           # Voice presets library
│   │   └── gui.py               # Web GUI for modification
│   │
│   ├── gui/                     # Web GUI (Detection)
│   │   ├── app.py               # Gradio interface
│   │   └── utils.py             # GUI utilities
│   │
│   └── cli/                     # CLI interfaces
│       ├── simple.py            # Simple CLI
│       └── interactive.py       # Interactive TUI
│
├── scripts/                     # Executable Scripts
│   ├── start-gui                # Launch detection GUI
│   ├── analyze                  # Simple analysis
│   └── download-samples         # Sample generator
│
├── run_voice_modifier.py        # Voice modifier CLI
├── run_voice_modifier_gui.py    # Voice modifier GUI
│
├── deepfake_model/              # Pre-trained AI model
│
├── tests/                       # Test Suite
│   ├── test_pipeline.py         # Pipeline tests
│   ├── validate_system.py       # System validation
│   └── examples.py              # Usage examples
│
└── samples/                     # Sample Audio
    ├── README.md
    ├── human/                   # Clean recordings
    ├── tts/                     # AI-generated
    └── manipulated/             # Pitch/time-shifted
```
- Pitch-Formant Incoherence
  - Compares F0 (fundamental frequency) vs formants (F1, F2, F3)
  - Detects physical impossibilities in voice characteristics
  - Primary method for pitch-shift detection
- Mel Spectrogram Analysis (see the sketch after this list)
  - Identifies unnatural harmonic structures
  - Detects a consistent computational noise floor
  - Finds spectral discontinuities
- Phase Coherence Analysis (see the sketch after this list)
  - Analyzes STFT phase information
  - Detects transient smearing from time-stretching
  - Identifies phase discontinuities
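As a rough illustration of the second and third methods above, the sketch below computes a mel-spectrogram noise-floor statistic and a frame-to-frame phase-deviation measure with librosa and NumPy. The statistics are illustrative, not the pipeline's actual metrics, and the file path is a placeholder.

```python
import numpy as np
import librosa

y, sr = librosa.load("sample.wav", sr=None)

# --- Mel spectrogram check: a suspiciously uniform noise floor can indicate resynthesis ---
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)
noise_floor = np.percentile(mel_db, 10, axis=1)   # quiet-energy level per mel band
print("noise-floor std (dB):", float(np.std(noise_floor)))  # very low spread is a warning sign

# --- Phase coherence check: time-stretching tends to smear transients and phase ---
stft = librosa.stft(y, n_fft=2048, hop_length=512)
phase = np.unwrap(np.angle(stft), axis=1)
frame_dev = np.diff(phase, axis=1)                # frame-to-frame phase advance per bin
print("phase-deviation variance:", float(np.var(frame_dev)))
```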
- F0 Extraction: `librosa.piptrack` with adaptive thresholding
- Formant Analysis: Praat Burg algorithm via `parselmouth`
- Phase Analysis: STFT with phase unwrapping
- Artifact Detection: Statistical analysis of spectral features
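A minimal sketch of the F0 and formant extraction steps named above, using librosa.piptrack and parselmouth's Burg formant tracker; parameter choices and the file path are illustrative.

```python
import numpy as np
import librosa
import parselmouth

path = "sample.wav"
y, sr = librosa.load(path, sr=None)

# F0 estimate: pick the strongest pitch candidate in each frame
pitches, magnitudes = librosa.piptrack(y=y, sr=sr, threshold=0.1)
frame_f0 = pitches[magnitudes.argmax(axis=0), np.arange(pitches.shape[1])]
f0_median = float(np.median(frame_f0[frame_f0 > 0]))
print(f"median F0: {f0_median:.1f} Hz")

# Formants F1-F3 via Praat's Burg algorithm (parselmouth), sampled mid-file
snd = parselmouth.Sound(path)
formants = snd.to_formant_burg()
t = snd.duration / 2
f1, f2, f3 = (formants.get_value_at_time(i, t) for i in (1, 2, 3))
print(f"F1={f1:.0f} Hz, F2={f2:.0f} Hz, F3={f3:.0f} Hz")
```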
- WAV, MP3, FLAC, OGG, M4A (via librosa)
- Sample rates: Any (automatically resampled)
- Duration: Up to 10 minutes recommended
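For example, any supported file can be loaded and resampled in one call (decoding of compressed formats relies on librosa's audio backend, e.g. ffmpeg; the file name is a placeholder):

```python
import librosa

# Decode and resample to a target rate in a single call
y, sr = librosa.load("suspicious_call.mp3", sr=16000)
print(sr, y.shape)
```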
- Sandboxed execution (no network access required)
- Read-only file operations (no modification of source audio)
- Cryptographic verification (SHA-256 checksums)
- Tamper-evident reports (signed outputs)
- Resource limiting (DoS protection)
- No audio data is sent to external servers
- All processing is local and offline
- No personally identifiable information is stored
- Original audio files are never modified
Each report includes:
- Timestamp (UTC)
- Audio file hash
- Analysis pipeline version
- Cryptographic signature
- ✅ Forensic investigations (law enforcement, legal proceedings)
- ✅ Security testing (authorized penetration testing)
- ✅ Academic research (voice processing studies)
- ✅ Quality assurance (detecting processing artifacts)
- ✅ CTF challenges (cybersecurity competitions)
- ✅ Privacy protection (whistleblowers, journalists, activists)
- ✅ Content creation (podcasts, videos, gaming, streaming)
- ✅ Entertainment (voice acting, character voices)
- ✅ Research and education (testing detection systems)
- ✅ Accessibility (voice assistance for medical conditions)
- ✅ Training (security analyst training, forensic education)
- ❌ Unauthorized surveillance or monitoring
- ❌ Impersonation without consent
- ❌ Fraud, deception, or illegal activities
- ❌ Harassment or stalking
- ❌ Discrimination based on voice characteristics
- ❌ Violation of platform terms of service
| Audio Duration | Analysis Time | Memory Usage |
|---|---|---|
| 3 seconds | ~3-5 seconds | ~200 MB |
| 30 seconds | ~8-12 seconds | ~400 MB |
| 3 minutes | ~25-35 seconds | ~800 MB |
Tested on: Intel i7-9750H, 16GB RAM
- Disable visualizations with `--no-viz` for faster processing
- Use batch mode for multiple files (shared initialization)
- Limit audio duration for large files: `librosa.load(..., duration=30.0)`
Issue: "No module named 'librosa'"
Solution: Run `pip install -r requirements.txt`

Issue: Parselmouth errors on certain files
Solution: Ensure the audio is in a supported format (WAV, MP3); try converting it with ffmpeg.

Issue: False positives on synthetic/generated audio
Expected: Synthetic audio has unnatural characteristics that may trigger detection.

Issue: Memory errors on large files
Solution: Limit the duration: `y, sr = librosa.load('file.wav', duration=60.0)`
Enable verbose logging:
```python
import logging
logging.basicConfig(level=logging.DEBUG)

detector = VoiceManipulationDetector()
report = detector.analyze('sample.wav')
```
Implemented:
- Multi-phase detection pipeline
- Interactive TUI
- Cryptographic verification
- Comprehensive visualizations
- Batch processing
- Test suite
- Real-time voice modification system (NEW in v2.0)
- 15+ voice transformation presets (NEW in v2.0)
- Low-latency audio processing (NEW in v2.0)
- Web GUI for voice modification (NEW in v2.0)
Planned:
- Real-time stream analysis (for detection)
- Machine learning enhancement (optional deepfake detection)
- REST API server mode
- Docker containerization
- GPU acceleration
- Additional language support
- Mobile app support
This project is licensed under the MIT License - see the LICENSE file for details.
This implementation is based on the Tactical Implementation Specification (TIS) for forensic voice analysis.
- Librosa - Audio analysis
- Praat-Parselmouth - Formant extraction
- Rich - Terminal UI
- NumPy & SciPy - Scientific computing
For issues, questions, or contributions:
- Check the documentation
- Review examples
- Run the test suite
- Consult technical documentation
This tool is designed for authorized security testing, forensic analysis, and research purposes only. Users must:
- Obtain proper authorization before analyzing voice recordings
- Comply with applicable laws and regulations
- Respect privacy and consent requirements
- Use results responsibly and ethically
Unauthorized use for surveillance, discrimination, or privacy violation is strictly prohibited.
Built for forensic audio analysis