```text
================================================================================
                 FORENSIC AUDIO MANIPULATION DETECTION SYSTEM
                     Tactical Implementation Specification
================================================================================
```
A comprehensive system for voice manipulation detection AND real-time voice modification with forensic-grade analysis.
Features • Voice Modification • Installation • Quick Start • Documentation • Examples
This system provides both voice manipulation detection and voice modification:
Detection covers voice manipulation and AI-generated voices, including:
- Pitch-shifting (male ↔ female voice conversion)
- Time-stretching (speed manipulation)
- Phase vocoder artifacts (deepfake/alteration signatures)
- Combined manipulations (multi-vector attacks)
- AI-generated voices (TTS, voice cloning, deepfakes)
- Neural vocoder detection (WaveNet, WaveGlow, HiFi-GAN)
Uses multiple independent detection methods to provide high-confidence results with cryptographically verifiable outputs.
Transform voices in real time for legitimate purposes:
- Gender transformation (male ↔ female voice conversion)
- Character voices (robot, alien, demon, chipmunk, giant)
- Anonymization (privacy protection presets)
- Utility effects (whisper, megaphone, telephone, cave)
- Custom parameters (pitch, formant, time stretch, reverb, echo)
- Live audio I/O with low latency (~43ms at 48kHz)
Perfect for privacy protection, content creation, research, and testing detection systems.
Voice manipulators typically alter pitch (F0) to change perceived gender, but they cannot easily change formants (physical vocal tract resonances). This creates a detectable pitch-formant incoherence that serves as forensic evidence.
CLEAN AUDIO:  F0 = 120 Hz (Male)   + Formants = Male → COHERENT ✓
MANIPULATED:  F0 = 220 Hz (Female) + Formants = Male → INCOHERENT ✗
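As a rough illustration of this check (not the pipeline's actual implementation), the sketch below flags audio whose F0-based classification disagrees with its formant-based classification; the 160 Hz and 1200 Hz cutoffs are illustrative assumptions only.

```python
# Illustrative sketch only -- the 160 Hz and 1200 Hz cutoffs are assumptions,
# not the pipeline's tuned thresholds.

def classify_sex_from_f0(f0_hz: float) -> str:
    """Rough F0-based classification (typical adult male ~85-155 Hz, female ~165-255 Hz)."""
    return "Male" if f0_hz < 160 else "Female"

def classify_sex_from_formants(f1_hz: float, f2_hz: float) -> str:
    """Rough formant-based classification; longer (typically male) vocal tracts give lower formants."""
    return "Male" if (f1_hz + f2_hz) / 2 < 1200 else "Female"

def pitch_formant_incoherent(f0_hz: float, f1_hz: float, f2_hz: float) -> bool:
    """Flag audio whose presented pitch disagrees with its physical formants."""
    return classify_sex_from_f0(f0_hz) != classify_sex_from_formants(f1_hz, f2_hz)

# The manipulated case from the diagram above: female-range F0 over male-range formants
print(pitch_formant_incoherent(220.0, 500.0, 1500.0))  # True -> incoherent
```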
- PHASE 1: Baseline F0 Analysis - Isolates presented pitch
- PHASE 2: Vocal Tract Analysis - Extracts physical formant characteristics
- PHASE 3: Manipulation Artifact Detection - Three independent methods:
  - Pitch-Formant Incoherence Detection
  - Mel Spectrogram Artifact Analysis
  - Phase Decoherence / Transient Smearing Detection
- PHASE 4: AI Voice Detection - Advanced detection using a pre-trained Wav2Vec2 model.
- PHASE 5: Report Synthesis - Generates verified, tamper-evident reports
Modern web-based interface with:
- Drag-and-drop file upload
- Real-time visualization updates
- One-click report downloads (JSON, Markdown, CSV)
- Batch processing with progress bars
- Dark theme with responsive layout
- Shareable links for demos
Beautiful terminal interface with:
- Real-time progress tracking
- Color-coded results
- Interactive menus
- Batch processing support
Every analysis includes:
- SHA-256 checksums of audio files
- Cryptographic signatures for tamper detection
- Chain of custody metadata
- Multiple output formats (JSON, Markdown, visualizations)
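The file hash recorded in each report can be re-computed independently with Python's standard hashlib. A minimal sketch, assuming the report follows the JSON layout shown later in this README (the file paths are placeholders):

```python
import hashlib
import json

def sha256_of_file(path: str, chunk_size: int = 8192) -> str:
    """Compute the SHA-256 digest of a file without loading it fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Cross-check a report's recorded hash against the audio file on disk
with open("results/sample_report.json") as f:
    report = json.load(f)

recomputed = sha256_of_file("sample.wav")
print("hash matches:", recomputed == report["verification"]["file_hash_sha256"])
```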
Generates 4 plots per analysis:
- Overview dashboard
- Mel spectrogram with artifact annotations
- Phase coherence analysis
- Pitch-formant comparison chart
In addition to detection capabilities, this system now includes real-time voice modification/obfuscation for legitimate purposes such as privacy protection, content creation, and testing detection systems.
The voice modification system provides low-latency, real-time audio processing with:
- Live Audio I/O - Real-time microphone input and speaker output
- Multiple Effect Types - Pitch shifting, formant shifting, time stretching, reverb, echo
- Preset Library - 15+ pre-configured voice transformations
- Custom Controls - Fine-tune all parameters in real-time
- Low Latency - ~43ms processing latency at 48kHz
- Professional Quality - Broadcast-ready audio processing
- male_to_female - Transform male voice to female voice
- female_to_male - Transform female voice to male voice
- male_to_female_subtle - Subtle male to female transformation
- female_to_male_subtle - Subtle female to male transformation
- chipmunk - High-pitched cartoon voice
- giant - Deep, slow voice
- robot - Robotic/synthetic voice
- demon - Deep, reverberant voice
- alien - Otherworldly voice
- whisper - Quiet, breathy voice
- megaphone - Loud, compressed voice
- telephone - Phone line quality
- cave - Large reverberant space
- anonymous_1 - Voice anonymization (subtle)
- anonymous_2 - Voice anonymization (moderate)
- anonymous_3 - Voice anonymization (heavy)
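Before launching the GUI or CLI below, the presets can also be enumerated and applied from Python. A minimal sketch, assuming PRESET_LIBRARY (imported in the Python API example later in this README) is a mapping keyed by preset name:

```python
from audioanalysisx1.voicemod import AudioProcessor, PRESET_LIBRARY

# List every available preset name (assumes PRESET_LIBRARY is keyed by preset name)
for name in PRESET_LIBRARY:
    print(name)

# Apply one of the presets listed above
processor = AudioProcessor()
processor.apply_preset_by_name('anonymous_2')
```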
```bash
python run_voice_modifier_gui.py

# Custom port
python run_voice_modifier_gui.py --port 7861

# Public share link
python run_voice_modifier_gui.py --share
```
Opens a web interface at http://localhost:7861 with:
- Real-time controls for all parameters
- Preset selector with all voice transformations
- Live level meters for input/output monitoring
- Device selection for audio input/output
- Instant preview of voice modifications
```bash
# List available audio devices
python run_voice_modifier.py --list-devices

# List available presets
python run_voice_modifier.py --list-presets

# Use a preset
python run_voice_modifier.py --preset male_to_female

# Custom settings
python run_voice_modifier.py --pitch 6 --formant 1.15

# Specify devices
python run_voice_modifier.py --preset robot --input-device 1 --output-device 2

# Start in bypass mode (no processing)
python run_voice_modifier.py --bypass
```

```python
from audioanalysisx1.voicemod import VoiceModifier, AudioProcessor, AudioConfig, PRESET_LIBRARY

# Create modifier with configuration
config = AudioConfig(sample_rate=48000, block_size=2048)
modifier = VoiceModifier(config)

# Create processor and apply preset
processor = AudioProcessor()
processor.apply_preset_by_name('male_to_female')

# Add processor and start
modifier.add_effect(processor)
modifier.start()

# Modify settings in real time
processor.set_pitch(8.0)     # 8 semitones up
processor.set_formant(1.2)   # 20% higher formants

# Stop when done
modifier.stop()
```

| Parameter | Range | Description |
|---|---|---|
| Pitch | -12 to +12 semitones | Shift fundamental frequency |
| Formant | 0.5 to 2.0 ratio | Shift vocal tract resonances |
| Time Stretch | 0.5 to 2.0x | Change speaking speed |
| Reverb | 0.0 to 1.0 wet mix | Add room reverberation |
| Echo | 0.0 to 1.0 wet mix | Add delayed repetitions |
| Noise Gate | On/Off | Remove background noise |
| Compression | On/Off | Normalize volume levels |
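A short sketch tying the table to the Python API above: set_pitch and set_formant appear in the API example, while the remaining setter names are assumptions and shown only as comments.

```python
from audioanalysisx1.voicemod import AudioProcessor

processor = AudioProcessor()

# Documented in the API example above
processor.set_pitch(-4.0)     # semitones, valid range -12 to +12
processor.set_formant(0.85)   # ratio, valid range 0.5 to 2.0

# The setters below are assumed names for the remaining table rows --
# check audioanalysisx1/voicemod/processor.py for the actual interface.
# processor.set_time_stretch(1.1)   # 0.5x to 2.0x
# processor.set_reverb(0.3)         # wet mix 0.0 to 1.0
# processor.set_echo(0.2)           # wet mix 0.0 to 1.0
```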
The voice modification system is designed for legitimate purposes only:
- Privacy protection and anonymization
- Entertainment and gaming
- Content creation and podcasting
- Research and development
- Testing detection systems (like this one!)
- Accessibility features
It must not be used for:
- Impersonation without consent
- Fraud or deception
- Harassment or abuse
- Illegal activities
- Violation of platform terms of service
By using this software, you agree to use it responsibly and in accordance with all applicable laws and regulations.
- Sample Rates: 44.1kHz, 48kHz (configurable)
- Block Size: 1024-4096 samples (configurable)
- Latency: ~43ms at 48kHz with 2048 block size
- Bit Depth: 32-bit float processing
- Supported Devices: All ASIO, CoreAudio, and ALSA compatible devices
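The quoted ~43ms latency follows directly from the block size: a 2048-sample block at 48kHz takes about 42.7ms to fill before it can be processed.

```python
# Per-block latency = block_size / sample_rate
block_size = 2048      # samples
sample_rate = 48_000   # Hz

latency_ms = block_size / sample_rate * 1000
print(f"{latency_ms:.1f} ms")  # 42.7 ms -> the "~43ms" quoted above
```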
The voice modification system is intentionally designed to be detectable by this analysis pipeline:
```python
from audioanalysisx1.voicemod import VoiceModifier, AudioProcessor
from audioanalysisx1.pipeline import VoiceManipulationDetector

# Create modified audio
modifier = VoiceModifier()
processor = AudioProcessor()
processor.apply_preset_by_name('male_to_female')
# ... record modified audio ...

# Analyze it
detector = VoiceManipulationDetector()
report = detector.analyze('modified_audio.wav')

# Should detect manipulation
assert report['alteration_detected'] == True
```
This makes the system ideal for:
- Testing detection algorithms
- Training forensic analysts
- Demonstrating manipulation artifacts
- Security research and education
- Python 3.10 or higher
- pip package manager
- 4GB RAM minimum
```bash
# Clone or navigate to the project directory
cd /home/john/voice

# Install dependencies
pip install -r requirements.txt
```
Key dependencies (requirements.txt):
```text
librosa>=0.10.0            # Audio analysis
numpy>=1.24.0              # Numerical computing
scipy>=1.10.0              # Signal processing
matplotlib>=3.7.0          # Visualizations
praat-parselmouth>=0.4.3   # Formant extraction
soundfile>=0.12.0          # Audio I/O
rich>=13.0.0               # Terminal UI
click>=8.1.0               # CLI framework
```
```bash
python scripts/start-gui
# or
python run_gui.py
```
Opens a beautiful web interface at http://localhost:7860 with:
- Drag-and-drop file upload
- Real-time visualizations
- Download JSON/Markdown reports
- Batch processing with progress tracking
- Modern UI with dark theme
Perfect for: Visual analysis, presentations, non-technical users
```bash
# Analyze a single file
python scripts/analyze suspicious_call.wav

# Batch process a directory
python scripts/analyze --batch ./audio_samples/ -o ./results/

# Faster (no visualizations)
python scripts/analyze sample.wav --no-viz
```
Perfect for: Quick analysis, scripting, automation
```bash
python -m audioanalysisx1.cli.interactive
```
Terminal-based menu interface with full features.
Perfect for: Server environments, SSH sessions
```python
from audioanalysisx1.pipeline import VoiceManipulationDetector

detector = VoiceManipulationDetector()
report = detector.analyze('sample.wav', output_dir='results/')

# Check results
if report['alteration_detected']:
    print("MANIPULATION DETECTED")
    confidence = report['confidence']
    print(f"Confidence: {confidence['score']:.0%} ({confidence['label']})")
else:
    print("No manipulation detected")
```
Perfect for: Integration, custom workflows, automation
All documentation is in the docs/ directory:
- Getting Started - Quick start guide
- GUI Guide - Web interface guide
- Usage Guide - Comprehensive usage
- Technical Docs - Implementation details
- API Reference - Complete API docs
- Deployment - Production deployment
- Debug Report - System validation
Each analysis generates a comprehensive report:
```json
{
  "asset_id": "sample_001",
  "alteration_detected": true,
  "confidence": {
    "score": 0.99,
    "label": "Very High"
  },
  "presented_sex": "Female",
  "probable_sex": "Male",
  "f0_baseline": "221.5 Hz",
  "evidence": {
    "pitch": "Pitch-Formant Incoherence Detected...",
    "time": "Phase Decoherence / Transient Smearing Detected...",
    "spectral": "Spectral Artifacts Detected...",
    "ai": "No AI voice artifacts detected"
  },
  "verification": {
    "file_hash_sha256": "7bd4d4ce92be3174...",
    "report_hash_sha256": "6e5edefb6fd84dc9...",
    "timestamp_utc": "2025-10-29T23:05:08Z"
  }
}
```

| Level | Score Range | Description |
|---|---|---|
| Very High | >= 90% | Multiple independent confirmations |
| High | 75% - 89% | Strong evidence from multiple vectors |
| Medium | 50% - 74% | Moderate evidence from a single vector |
| Low | < 50% | No significant manipulation detected |
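A minimal sketch of how these boundaries map a numeric score to a label; the thresholds come straight from the table above, though the pipeline's own implementation may differ.

```python
def confidence_label(score: float) -> str:
    """Map a confidence score in [0, 1] to the labels from the table above."""
    if score >= 0.90:
        return "Very High"
    if score >= 0.75:
        return "High"
    if score >= 0.50:
        return "Medium"
    return "Low"

print(confidence_label(0.99))  # "Very High", matching the sample report
```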
```bash
python start_gui.py

# Custom port
python start_gui.py --port=8080

# Create shareable public link
python start_gui.py --share
```
The web GUI provides 4 ways to interact with the system:
- Drag-and-drop audio file upload
- Real-time progress updates (Phase 1-5)
- Instant results display with HTML formatting
- Visual gallery showing all 4 analysis plots
- Download buttons for JSON and Markdown reports
- Multi-file upload support
- Progress tracking for each file
- Summary statistics table
- CSV export for batch results
- Detection methods explanation
- Confidence levels guide
- Interpretation tips
- Security information
Access at: http://localhost:7860
Features:
- Dark theme interface
- Responsive design
- Real-time updates
- Secure (local processing)
```bash
python start_gui.py
```
Then:
- Open browser to http://localhost:7860
- Drag audio file onto upload area
- Click "Analyze Audio"
- View results and visualizations
- Download reports
```bash
python tui.py analyze suspicious_voice.wav
```
Output:
```text
══════════════════════════════════════════════════════════
              FORENSIC AUDIO ANALYSIS REPORT
══════════════════════════════════════════════════════════
ASSET_ID: suspicious_voice
ALTERATION DETECTED: True
CONFIDENCE: 99% (Very High)
══════════════════════════════════════════════════════════
EVIDENCE VECTORS:
[1] PITCH: Pitch-Formant Incoherence Detected
[2] TIME: Phase Decoherence / Transient Smearing Detected
[3] SPECTRAL: Spectral Artifacts Detected
══════════════════════════════════════════════════════════
```
```python
from audioanalysisx1.pipeline import VoiceManipulationDetector

detector = VoiceManipulationDetector()
reports = detector.batch_analyze(
    audio_dir='./evidence/',
    output_dir='./case_001_results/',
    pattern='*.wav'
)

# Generate summary
manipulated = sum(1 for r in reports if r['alteration_detected'])
print(f"Detected manipulation in {manipulated}/{len(reports)} files")
```

```python
from audioanalysisx1.verification import OutputVerifier

verifier = OutputVerifier()
result = verifier.verify_report('results/sample_report.json')

if result['valid']:
    print(f"Report verified - Timestamp: {result['timestamp']}")
else:
    print(f"Verification failed: {result['error']}")
```

```python
from audioanalysisx1.verification import ReportExporter

exporter = ReportExporter()
exporter.export_csv_summary(reports, 'case_summary.csv')
```

Run the comprehensive test suite:
```bash
python test_pipeline.py
```
This will:
- Generate 6 synthetic test samples (3 clean + 3 manipulated)
- Analyze each sample through the full pipeline
- Verify detection accuracy
- Test cryptographic verification system
- Generate complete reports and visualizations
Expected Output:
```text
================================================================================
TEST RESULTS SUMMARY
================================================================================
Total Tests: 7
Passed: 7 (✓)
Failed: 0 (✗)
Success Rate: 100.0%
```
```text
AUDIOANALYSISX1/
├── README.md                    # This file
├── setup.py                     # Package installation
├── requirements.txt             # Python dependencies
├── .gitignore                   # Git ignore rules
│
├── docs/                        # Documentation
│   ├── getting-started.md       # Quick start guide
│   ├── gui-guide.md             # Web GUI guide
│   ├── usage.md                 # Usage guide
│   ├── technical.md             # Technical details
│   ├── api-reference.md         # API documentation
│   ├── deployment.md            # Deployment guide
│   └── debug-report.md          # Debug validation
│
├── audioanalysisx1/             # Main Package
│   ├── __init__.py
│   ├── pipeline.py              # Main orchestrator
│   ├── verification.py          # Cryptographic verification
│   ├── visualizer.py            # Visualization engine
│   │
│   ├── phases/                  # Detection phases
│   │   ├── baseline.py          # PHASE 1: F0 Analysis
│   │   ├── formants.py          # PHASE 2: Formant Analysis
│   │   ├── artifacts.py         # PHASE 3: Manipulation Detection
│   │   ├── ai_detection.py      # PHASE 4: AI Detection
│   │   └── reporting.py         # PHASE 5: Report Synthesis
│   │
│   ├── voicemod/                # Voice Modification System (NEW)
│   │   ├── __init__.py          # Module interface
│   │   ├── realtime.py          # Real-time audio I/O
│   │   ├── processor.py         # Audio processor
│   │   ├── effects.py           # Effect implementations
│   │   ├── presets.py           # Voice presets library
│   │   └── gui.py               # Web GUI for modification
│   │
│   ├── gui/                     # Web GUI (Detection)
│   │   ├── app.py               # Gradio interface
│   │   └── utils.py             # GUI utilities
│   │
│   └── cli/                     # CLI interfaces
│       ├── simple.py            # Simple CLI
│       └── interactive.py       # Interactive TUI
│
├── scripts/                     # Executable Scripts
│   ├── start-gui                # Launch detection GUI
│   ├── analyze                  # Simple analysis
│   └── download-samples         # Sample generator
│
├── run_voice_modifier.py        # Voice modifier CLI
├── run_voice_modifier_gui.py    # Voice modifier GUI
│
├── deepfake_model/              # Pre-trained AI model
│
├── tests/                       # Test Suite
│   ├── test_pipeline.py         # Pipeline tests
│   ├── validate_system.py       # System validation
│   └── examples.py              # Usage examples
│
└── samples/                     # Sample Audio
    ├── README.md
    ├── human/                   # Clean recordings
    ├── tts/                     # AI-generated
    └── manipulated/             # Pitch/time-shifted
```
- Pitch-Formant Incoherence
  - Compares F0 (fundamental frequency) vs formants (F1, F2, F3)
  - Detects physical impossibilities in voice characteristics
  - Primary method for pitch-shift detection
- Mel Spectrogram Analysis (see the sketch after this list)
  - Identifies unnatural harmonic structures
  - Detects a consistent computational noise floor
  - Finds spectral discontinuities
- Phase Coherence Analysis (see the sketch after this list)
  - Analyzes STFT phase information
  - Detects transient smearing from time-stretching
  - Identifies phase discontinuities
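As a rough illustration of the second and third methods above, the sketch below computes a mel-spectrogram noise-floor statistic and a frame-to-frame phase-deviation measure with librosa and NumPy. The statistics are illustrative, not the pipeline's actual metrics, and the file path is a placeholder.

```python
import numpy as np
import librosa

y, sr = librosa.load("sample.wav", sr=None)

# --- Mel spectrogram check: a suspiciously uniform noise floor can indicate resynthesis ---
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)
noise_floor = np.percentile(mel_db, 10, axis=1)   # quiet-energy level per mel band
print("noise-floor std (dB):", float(np.std(noise_floor)))  # very low spread is a warning sign

# --- Phase coherence check: time-stretching tends to smear transients and phase ---
stft = librosa.stft(y, n_fft=2048, hop_length=512)
phase = np.unwrap(np.angle(stft), axis=1)
frame_dev = np.diff(phase, axis=1)                # frame-to-frame phase advance per bin
print("phase-deviation variance:", float(np.var(frame_dev)))
```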
- F0 Extraction: `librosa.piptrack` with adaptive thresholding
- Formant Analysis: Praat Burg algorithm via `parselmouth`
- Phase Analysis: STFT with phase unwrapping
- Artifact Detection: Statistical analysis of spectral features
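A minimal sketch of the F0 and formant extraction steps named above, using librosa.piptrack and parselmouth's Burg formant tracker; parameter choices and the file path are illustrative.

```python
import numpy as np
import librosa
import parselmouth

path = "sample.wav"
y, sr = librosa.load(path, sr=None)

# F0 estimate: pick the strongest pitch candidate in each frame
pitches, magnitudes = librosa.piptrack(y=y, sr=sr, threshold=0.1)
frame_f0 = pitches[magnitudes.argmax(axis=0), np.arange(pitches.shape[1])]
f0_median = float(np.median(frame_f0[frame_f0 > 0]))
print(f"median F0: {f0_median:.1f} Hz")

# Formants F1-F3 via Praat's Burg algorithm (parselmouth), sampled mid-file
snd = parselmouth.Sound(path)
formants = snd.to_formant_burg()
t = snd.duration / 2
f1, f2, f3 = (formants.get_value_at_time(i, t) for i in (1, 2, 3))
print(f"F1={f1:.0f} Hz, F2={f2:.0f} Hz, F3={f3:.0f} Hz")
```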
- WAV, MP3, FLAC, OGG, M4A (via librosa)
- Sample rates: Any (automatically resampled)
- Duration: Up to 10 minutes recommended
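For example, any supported file can be loaded and resampled in one call (decoding of compressed formats relies on librosa's audio backend, e.g. ffmpeg; the file name is a placeholder):

```python
import librosa

# Decode and resample to a target rate in a single call
y, sr = librosa.load("suspicious_call.mp3", sr=16000)
print(sr, y.shape)
```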
- Sandboxed execution (no network access required)
- Read-only file operations (no modification of source audio)
- Cryptographic verification (SHA-256 checksums)
- Tamper-evident reports (signed outputs)
- Resource limiting (DoS protection)
- No audio data is sent to external servers
- All processing is local and offline
- No personally identifiable information is stored
- Original audio files are never modified
Each report includes:
- Timestamp (UTC)
- Audio file hash
- Analysis pipeline version
- Cryptographic signature
- ✅ Forensic investigations (law enforcement, legal proceedings)
- ✅ Security testing (authorized penetration testing)
- ✅ Academic research (voice processing studies)
- ✅ Quality assurance (detecting processing artifacts)
- ✅ CTF challenges (cybersecurity competitions)
- ✅ Privacy protection (whistleblowers, journalists, activists)
- ✅ Content creation (podcasts, videos, gaming, streaming)
- ✅ Entertainment (voice acting, character voices)
- ✅ Research and education (testing detection systems)
- ✅ Accessibility (voice assistance for medical conditions)
- ✅ Training (security analyst training, forensic education)
- ❌ Unauthorized surveillance or monitoring
- ❌ Impersonation without consent
- ❌ Fraud, deception, or illegal activities
- ❌ Harassment or stalking
- ❌ Discrimination based on voice characteristics
- ❌ Violation of platform terms of service
| Audio Duration | Analysis Time | Memory Usage |
|---|---|---|
| 3 seconds | ~3-5 seconds | ~200 MB |
| 30 seconds | ~8-12 seconds | ~400 MB |
| 3 minutes | ~25-35 seconds | ~800 MB |
Tested on: Intel i7-9750H, 16GB RAM
- Disable visualizations with `--no-viz` for faster processing
- Use batch mode for multiple files (shared initialization)
- Limit audio duration for large files: `librosa.load(..., duration=30.0)`
Issue: "No module named 'librosa'"
Solution: Run `pip install -r requirements.txt`

Issue: Parselmouth errors on certain files
Solution: Ensure the audio is in a supported format (WAV, MP3); try converting it with ffmpeg.

Issue: False positives on synthetic/generated audio
Expected: Synthetic audio has unnatural characteristics that may trigger detection.

Issue: Memory errors on large files
Solution: Limit the duration: `y, sr = librosa.load('file.wav', duration=60.0)`
Enable verbose logging:
```python
import logging
logging.basicConfig(level=logging.DEBUG)

detector = VoiceManipulationDetector()
report = detector.analyze('sample.wav')
```
Implemented:
- Multi-phase detection pipeline
- Interactive TUI
- Cryptographic verification
- Comprehensive visualizations
- Batch processing
- Test suite
- Real-time voice modification system (NEW in v2.0)
- 15+ voice transformation presets (NEW in v2.0)
- Low-latency audio processing (NEW in v2.0)
- Web GUI for voice modification (NEW in v2.0)
Planned:
- Real-time stream analysis (for detection)
- Machine learning enhancement (optional deepfake detection)
- REST API server mode
- Docker containerization
- GPU acceleration
- Additional language support
- Mobile app support
This project is licensed under the MIT License - see the LICENSE file for details.
This implementation is based on the Tactical Implementation Specification (TIS) for forensic voice analysis.
- Librosa - Audio analysis
- Praat-Parselmouth - Formant extraction
- Rich - Terminal UI
- NumPy & SciPy - Scientific computing
For issues, questions, or contributions:
- Check the documentation
- Review examples
- Run the test suite
- Consult technical documentation
This tool is designed for authorized security testing, forensic analysis, and research purposes only. Users must:
- Obtain proper authorization before analyzing voice recordings
- Comply with applicable laws and regulations
- Respect privacy and consent requirements
- Use results responsibly and ethically
Unauthorized use for surveillance, discrimination, or privacy violation is strictly prohibited.
Built for forensic audio analysis