Spectrally-Processing Extraction, Crawling, & Tele-Reconnaissance Archive SPECTRA is an advanced framework for Telegram data collection, network discovery, and forensic-grade archiving with multi-account support, graph-based targeting, and robust OPSEC features.
- π Multi-account & API key rotation with smart, persistent selection and failure detection
- π΅οΈ Proxy rotation for OPSEC and anti-detection
- π Network discovery of connected groups and channels (with SQL audit trail)
- π Graph/network analysis to identify high-value targets
- π Forensic archiving with integrity checksums and sidecar metadata
- π± Topic/thread support for complete conversation capture
- ποΈ SQL database storage for all discovered groups, relationships, and archive metadata
- β‘ Parallel processing leveraging multiple accounts and proxies simultaneously
- π₯οΈ Modern TUI (npyscreen) and CLI, both using the same modular backend
- βοΈ Forwarding Mode: Traverse a series of channels, discover related channels, and download text/archive files with specific rules, using a single API key.
- π‘οΈ Red team/OPSEC features: account/proxy rotation, SQL audit trail, sidecar metadata, persistent stateS
One-command setup and launch:
# Clone and enter directory
git clone https://github.com/SWORDIntel/SPECTRA.git
cd SPECTRA
# Bootstrap (automatic setup + launch)
./bootstrap
# OR use make
make bootstrap
# On subsequent runs, just launch the TUI
make runFor more commands: make help
docs/guides/β Quick start, launcher usage, integration walkthroughs (AGENTS.md,QUICK_START.md, etc.).docs/reference/β Core architecture references, changelog, and database schema.docs/design/β UI mockups and interface planning assets.docs/reports/β Security summaries, integration reports, and executive updates.docs/roadmap/β Long-term initiatives and backlog material (e.g., deduplication plan).docs/research/β Strategic research and enhancement studies.
SPECTRA/
βββ scripts/ β Executable scripts (install, launch, setup)
βββ docs/ β Documentation and guides
βββ data/ β Runtime data (not tracked in git)
βββ src/ β Archived/deprecated code
βββ tgarchive/ β Main Python package
β βββ core/ β Core business logic
β βββ ui/ β User interfaces (TUI)
β βββ services/ β Background services
β βββ utils/ β Utility functions
β βββ db/ β Database layer
β βββ forwarding/ β Message forwarding
β βββ osint/ β Intelligence gathering
βββ tests/ β Test suite
βββ examples/ β Example scripts
βββ bootstrap β Auto-setup entry point (recommended)
βββ Makefile β Common commands (make help)
βββ setup.py β Python package setup
For detailed structure explanation, see PROJECT_STRUCTURE.md
# Clone the repository
git clone https://github.com/SWORDIntel/SPECTRA.git
cd SPECTRA
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install core dependencies (recommended - stable)
pip install telethon rich pillow pandas networkx matplotlib python-magic pyaes pyasn1 rsa feedgen lxml imagehash croniter npyscreen pysocks
# OR install all dependencies (may require system packages)
pip install -r requirements.txt
# Install package in development mode
pip install -e .Current Version: 2025-09-17 (Production Ready)
- β Core System: Fully operational with all syntax errors resolved
- β CLI Interface: 18 commands available and tested
- β Dependencies: Core dependencies installed and verified
- β Architecture: Professional-grade modular design validated
Recent Fixes (2025-09-17):
- Fixed critical Git merge conflicts blocking system startup
- Resolved CLI parser conflicts and syntax errors
- Validated full system functionality and dependency chain
- See CHANGELOG.md for complete details
SPECTRA supports multi-account configuration with automatic account import from gen_config.py (TELESMASHER-compatible) and persistent SQL storage for all operations.
- Visit https://my.telegram.org/apps to register your application
- Create a config file or use the built-in account import:
# Import accounts from gen_config.py
python -m tgarchive accounts --importSPECTRA can be used in several modes:
# Launch the interactive TUI
python -m tgarchive- The TUI supports all major workflows: discovery, network analysis, batch/parallel archiving, and account management.
- All TUI and CLI operations use the same modular, OPSEC-aware backend.
# Import accounts from gen_config.py
python -m tgarchive accounts --import
# List configured accounts and their status
python -m tgarchive accounts --list
# Test all accounts for connectivity
python -m tgarchive accounts --test
# Reset account usage statistics
python -m tgarchive accounts --reset# Discover groups from a seed entity
python -m tgarchive discover --seed @example_channel --depth 2
# Discover from multiple seeds in a file
python -m tgarchive discover --seeds-file seeds.txt --depth 2 --export discovered.txt
# Import existing scan data
python -m tgarchive discover --crawler-dir ./telegram-groups-crawler/# Analyze network from crawler data
python -m tgarchive network --crawler-dir ./telegram-groups-crawler/ --plot
# Analyze network from SQL database
python -m tgarchive network --from-db --export priority_targets.json --top 50# Archive a specific channel
default
python -m tgarchive archive --entity @example_channel# Process multiple groups from file
python -m tgarchive batch --file groups.txt --delay 30
# Process high-priority groups from database
python -m tgarchive batch --from-db --limit 20 --min-priority 0.1SPECTRA supports parallel processing using multiple Telegram accounts and proxies simultaneously, with full SQL-backed state and OPSEC-aware account/proxy rotation:
# Run discovery in parallel across multiple accounts
python -m tgarchive parallel discover --seeds-file seeds.txt --depth 2 --max-workers 4
# Join multiple groups in parallel
python -m tgarchive parallel join --file groups.txt --max-workers 4
# Archive multiple entities in parallel
python -m tgarchive parallel archive --file entities.txt --max-workers 4
# Archive high-priority entities from DB in parallel
python -m tgarchive parallel archive --from-db --limit 20 --min-priority 0.1You can also use the global parallel flag with standard commands:
# Run batch operations in parallel
python -m tgarchive batch --file groups.txt --parallel --max-workers 4
# Run discovery in parallel
python -m tgarchive discover --seeds-file seeds.txt --parallel --max-workers 4This mode is designed for automated traversal and targeted downloading from an initial set of seed channels. It uses a single API key to explore channels, discover new ones through links in messages (up to a defined depth), and download specific file types (text and common archives) into an organized output directory.
Command Structure:
python -m tgarchive forward --channels-file <path_to_channels.txt> --output-dir <path_to_output_directory> [options]Arguments:
--channels-file PATH: Required. Path to a text file containing the initial list of seed channel URLs or IDs (one per line).--output-dir PATH: Required. Directory where downloaded files (intext_files/andarchive_files/subfolders) and theforward_download_log.csvwill be stored.--max-depth INT: Optional. Maximum depth to follow channel links during discovery. Default is 2.--min-files-gateway INT: Optional. Minimum number of files a channel should ideally have to be considered a 'gateway' for focused downloading (Note: current implementation downloads from all accessible discovered channels; this option is for future refinement). Default is 100.
API Key Usage:
Forwarding mode is designed to use a single API key (specifically, the first account configured in your spectra_config.json or imported from gen_config.py) for all its operations. This is to avoid potentially joining the same channel with multiple accounts, which might be undesirable for certain operational goals.
Output Structure:
In the specified output directory, you will find:
text_files/: Contains downloaded plain text files.archive_files/: Contains downloaded archive files (e.g., .zip, .rar) along with their metadata in.jsonsidecar files (e.g.,example.zip.json).forward_download_log.csv: A CSV log detailing every downloaded file, its source channel, message ID, timestamp, and other metadata.
Running Long Forwarding Sessions:
For extended forwarding mode operations, it is highly recommended to use a terminal multiplexer like screen or tmux to ensure the process continues running even if your connection drops.
Example using screen:
- Start a new screen session:
screen -S spectra_forward_session - Run the command:
python -m tgarchive forward --channels-file your_seeds.txt --output-dir ./forward_output - Detach from the session: Press
Ctrl+AthenD. - To reattach later:
screen -r spectra_forward_session
SPECTRA will not install screen or tmux for you. Please install them using your system's package manager if needed (e.g., sudo apt install screen).
SPECTRA now includes advanced capabilities for message deduplication during forwarding and an automated account invitation system for channels discovered in Forwarding Mode.
To prevent redundant information and save on API calls, SPECTRA's forwarding mechanism can now detect and skip messages that have already been processed and forwarded.
Overview:
- When enabled, SPECTRA computes a unique hash for each message's content (text and media attributes) before forwarding.
- These hashes are stored in the local database (
spectra.sqlite3in a table namedforwarded_messages). - If a message's hash is already found in the database or an in-memory cache for the current session, it's considered a duplicate and will not be forwarded to the primary destination.
- Unique messages can optionally be routed to a secondary, specified channel, ensuring this channel only receives content not seen before.
Configuration (spectra_config.json):
Add or modify the forwarding section in your spectra_config.json:
{
"forwarding": {
"enable_deduplication": true,
"secondary_unique_destination": "@your_unique_content_channel"
}
// ... other forwarding settings ...
}enable_deduplication(boolean): Set totrue(default) to enable duplicate detection,falseto disable.secondary_unique_destination(string | null): Optional. The username or ID of a channel where unique messages (those not previously forwarded) will be sent. Ifnullor not provided, unique messages are only sent to the primary destination.
CLI Flags for tgarchive forward:
--enable-deduplication/--disable-deduplication: Overrides theenable_deduplicationsetting fromspectra_config.jsonfor the current command.- Example:
python -m tgarchive forward --origin @source --destination @main_dest --disable-deduplication
- Example:
--secondary-unique-destination <channel_id_or_username>: Specifies the secondary destination for unique messages, overriding the config for the current command.- Example:
python -m tgarchive forward --origin @source --destination @main_dest --secondary-unique-destination @only_uniques_here
- Example:
Usage Example:
To forward messages from @news_source to @my_archive, skip duplicates, and send unique messages also to @special_uniques:
- Ensure your
spectra_config.jsonhas:"forwarding": { "enable_deduplication": true, "secondary_unique_destination": "@special_uniques" }
- Run the command:
Or, using CLI overrides:
python -m tgarchive forward --origin @news_source --destination @my_archive
python -m tgarchive forward --origin @news_source --destination @my_archive --enable-deduplication --secondary-unique-destination @special_uniques
When operating in Forwarding Mode, SPECTRA can now automatically invite other configured accounts to join newly discovered and accessible public channels. This helps distribute channel membership across your available accounts.
Overview:
- After the primary Forwarding Mode account successfully accesses/joins a new channel, that channel is added to an invitation queue.
- Other accounts configured in
spectra_config.json(excluding the primary Forwarding Mode account) will be gradually invited to join these queued channels. - Invitations are processed with randomized delays to simulate natural user behavior and respect Telegram's rate limits.
- The system tracks successful and failed invitations in a state file (
invitation_state.jsonin your forward output directory) to avoid re-processing and to allow resumability.
Configuration (spectra_config.json):
Add or modify the forwarding section in your spectra_config.json:
{
"forwarding": {
"auto_invite_accounts": true,
"invitation_delays": {
"min_seconds": 120,
"max_seconds": 600,
"variance": 0.3
}
}
// ... other forwarding settings ...
}auto_invite_accounts(boolean): Set totrue(default) to enable this feature,falseto disable.invitation_delays: An object defining the timing for invitations:min_seconds(integer): Minimum base delay before an invitation attempt.max_seconds(integer): Maximum base delay.variance(float, 0.0 to 1.0): Percentage of random variance applied to the base delay. For example, 0.3 means +/- 30%.
CLI Flags for tgarchive forward:
--enable-auto-invites/--disable-auto-invites: Overrides theauto_invite_accountssetting fromspectra_config.jsonfor the current forward session.- Example:
python -m tgarchive forward --channels-file seeds.txt --output-dir ./forward_out --disable-auto-invites
- Example:
Usage Example:
To run Forwarding Mode, discover channels, and have your other accounts automatically invited:
- Ensure your
spectra_config.jsonhas multiple accounts configured and theforwardingsection is set up (or use defaults):"accounts": [ {"session_name": "main_forward_acc", "api_id": 123, "api_hash": "abc"}, {"session_name": "invitee_acc1", "api_id": 456, "api_hash": "def"}, {"session_name": "invitee_acc2", "api_id": 789, "api_hash": "ghi"} ], "forwarding": { "auto_invite_accounts": true }
- Run the Forwarding Mode command (the first account,
main_forward_acc, will be used for discovery):Aspython -m tgarchive forward --channels-file initial_seeds.txt --output-dir ./my_forward_data
main_forward_accdiscovers and joins new channels,invitee_acc1andinvitee_acc2will be queued and then invited to join them after randomized delays.
SPECTRA includes powerful features for forwarding messages with attachments from origin channels/chats to a specified destination, or even to the "Saved Messages" of multiple configured accounts. This can be useful for consolidating information, creating backups, or distributing content.
-
Selective Forwarding: Forward messages from a specific origin to a specific destination.
python -m tgarchive forward --origin <origin_id_or_username> --destination <destination_id_or_username>
-
Total Forward Mode: Forward messages from all channels accessible by your configured accounts (as listed in the
account_channel_accesstable) to a specific destination. This mode requires the channel access table to be populated first.python -m tgarchive forward --total-mode [--destination <destination_id_or_username>]
To populate the
account_channel_accesstable, run:python -m tgarchive channels --update-access
The main command for forwarding is python -m tgarchive forward with the following options:
--origin <id_or_username>: Specifies the source channel or chat from which to forward messages. This is required unless--total-modeis used.--destination <id_or_username>: Specifies the target channel or chat to which messages will be forwarded. If not provided, SPECTRA will use thedefault_forwarding_destination_idset in yourspectra_config.jsonfile.--account <phone_or_session_name>: Specifies which configured Telegram account to use for the forwarding operation. If not provided, the first account in your configuration is typically used. For "Total Forward Mode", this account is used for orchestration, while individual channel forwarding uses an account known to have access to that specific channel (from theaccount_channel_accesstable).--total-mode: Enables "Total Forward Mode". When this flag is used, the--originargument is ignored, and SPECTRA will attempt to forward messages from all channels recorded in theaccount_channel_accessdatabase table.--forward-to-all-saved: When enabled, messages successfully forwarded to the main destination will also be forwarded to the "Saved Messages" of every account configured inspectra_config.json. This can be useful for creating broad personal backups but will significantly increase API calls and data redundancy. Use with caution.--prepend-origin-info: If enabled, and if not using topic-based forwarding (see below), information about the original channel (e.g., "[Forwarded from OriginalChannelName (ID: 12345)]") will be prepended to the text of the forwarded message. This helps in identifying the source of messages when they are consolidated into a general channel.
-
Setting Default Destination:
python -m tgarchive config --set-forward-dest <destination_id_or_username>
This command updates the
default_forwarding_destination_idin yourspectra_config.json. -
Viewing Default Destination:
python -m tgarchive config --view-forward-dest
-
Updating Channel Access Data (for Total Mode):
python -m tgarchive channels --update-access
This command populates the
account_channel_accesstable in the database by iterating through all your configured accounts and listing the channels each can access. This table is crucial for the--total-modeforwarding feature.
default_forwarding_destination_id: Located inspectra_config.json, this key (added manually or via theconfig --set-forward-destcommand) allows you to set a global default destination for forwarding operations, so you don't have to specify--destinationevery time.- Supergroup Topic Sorting (Conceptual):
Telegram's "Topics" feature in supergroups allows for organized discussions. SPECTRA's forwarding can conceptually support sending messages into specific topics. This is typically done by forwarding a message as a reply to the message that represents the topic's creation or its main "general" topic message.
If you manually identify the message ID for a specific topic in the destination supergroup, this ID could be used (currently via code modification or future enhancement as
destination_topic_idin theAttachmentForwarder) with thereply_toparameter in Telegram's API when forwarding. Currently, SPECTRA does not automatically create or manage topics by name due to limitations with user accounts (topic creation/management often requires bot privileges or specific admin rights). The--prepend-origin-infoflag is the primary method for distinguishing messages from different origins when forwarded to a common, non-topic-based channel.
Enabling --forward-to-all-saved provides a way to create a distributed backup or personal archive of forwarded content across all your configured Telegram accounts. Each message successfully forwarded to the main destination will also be sent to the "Saved Messages" chat of each account.
Implications:
- Increased API Usage: This feature will make significantly more API calls (one forward per account for each original message). Be mindful of Telegram's rate limits. The system has built-in handling for
FloodWaitError(rate limit exceeded) and will pause as instructed by Telegram, but excessive use could still lead to temporary restrictions on accounts. - Data Redundancy: You will have multiple copies of the forwarded messages across your accounts.
- Sequential Operation: Forwarding to each account's "Saved Messages" happens sequentially for each original message to manage client connections and reduce simultaneous API load from this specific feature.
The "Total Forward Mode" (--total-mode) relies on the account_channel_access table in the SPECTRA database. This table stores a record of which channels are accessible by which of your configured accounts, including their names and access hashes. It is populated by the tgarchive channels --update-access command.
For more details on the database schema, please refer to the DATABASE_SCHEMA.md file.
Shunt Mode is designed to transfer all media files from one Telegram channel (source) to another (destination) with advanced deduplication and file grouping capabilities. This is useful for consolidating archives, moving collections, or reorganizing media across channels.
Key Features:
- Deduplication: Ensures that files already present in the destination (based on content hash) or previously shunted are not transferred again. It uses the same
forwarded_messagestable as the general forwarding feature. - File Grouping: Attempts to identify and transfer related files as groups. This helps maintain the integrity of multi-part archives or collections of images/videos sent together.
- Strategies:
none: No grouping; files are transferred individually.filename: Groups files based on common base names and sequential numbering patterns (e.g.,archive_part1.rar,archive_part2.rarorimage_001.jpg,image_002.jpg).time: Groups files sent by the same user within a configurable time window.
- Strategies:
CLI Command:
The Shunt Mode is activated using specific arguments with the main tgarchive command:
python -m tgarchive --shunt-from <source_id_or_username> --shunt-to <destination_id_or_username> [options]CLI Arguments:
--shunt-from <id_or_username>: Required. The source channel/chat ID or username from which files will be shunted.--shunt-to <id_or_username>: Required. The destination channel/chat ID or username to which files will be transferred.--shunt-account <phone_or_session_name>: Optional. Specifies which configured Telegram account to use for the shunting operation. If not provided, the first available active account from your configuration is typically used.
Configuration (spectra_config.json):
File grouping behavior for Shunt Mode can be configured in your spectra_config.json file under the grouping key:
{
// ... other configurations ...
"grouping": {
"strategy": "filename", // "none", "filename", or "time"
"time_window_seconds": 300 // Time window in seconds for 'time' strategy (e.g., 300 for 5 minutes)
},
// ... other configurations ...
}strategy(string): Defines the grouping method."none"(default): No grouping."filename": Groups based on filename patterns."time": Groups based on time proximity and sender.
time_window_seconds(integer): Relevant only for the"time"strategy. Specifies the maximum time difference (in seconds) between messages from the same sender to be considered part of the same group.
Usage Example:
To shunt all files from @old_archive_channel to @new_consolidated_archive, using filename-based grouping, and specifying the account my_worker_account:
- Ensure your
spectra_config.jsonhas the desired grouping strategy (or rely on defaults):"grouping": { "strategy": "filename" }
- Run the command:
python -m tgarchive --shunt-from @old_archive_channel --shunt-to @new_consolidated_archive --shunt-account my_worker_account
Files will be fetched from the source, grouped according to the strategy, checked for duplicates against the destination (via the shared deduplication database), and then unique files/groups will be forwarded.
A ready-to-use example script is provided to demonstrate parallel discovery, join, and archive operations:
examples/parallel_example.py
# Run parallel discovery, join, and archive from a list of seeds
python examples/parallel_example.py --seeds-file seeds.txt --max-workers 4 --discover --join --archive --export-file discovered.txt- Supports importing accounts from
gen_config.pyautomatically - All operations are SQL-backed and use persistent account/proxy rotation
- Exports discovered groups to a file if requested
- See the script for more advanced usage and options
- Account & API key rotation: Smart, persistent, and SQL-audited
- Proxy rotation: Supports rotating proxies for every operation
- SQL audit trail: All group discovery, joins, and archiving are logged in the database
- Sidecar metadata: Forensic metadata and integrity checksums for all archives
- Persistent state: All operations are resumable and stateful
- Modular backend: All TUI/CLI operations use the same importable modules for maximum reusability
- Detection/OPSEC notes: Designed for red team and forensic use, with anti-detection and audit features
SPECTRA includes advanced AI/ML capabilities for intelligence analysis:
Automatically identifies and scores threat actors on a 1-10 scale based on message content, behavior patterns, and network associations:
# Run the threat scoring demo
python examples/threat_scoring_demo.pyFeatures:
- Multi-factor Scoring: Combines keyword detection (30%), pattern matching (25%), behavioral analysis (20%), network associations (15%), and temporal patterns (10%)
- Threat Classifications: Harmless (1-2), Low Risk (3-4), Medium (5-6), High Risk (7-8), Critical/Nation-State (9-10)
- Network Tracking: "Guilt by association" scoring and community detection
- Mermaid Visualizations: Color-coded network graphs, community clusters, and activity timelines
- Intelligence Reports: Executive summaries with top threat actors and recommended actions
Detection Capabilities:
- 100+ critical security keywords (zero-day, APT groups, ransomware, etc.)
- Pattern matching for CVEs, crypto addresses, malware hashes, Tor addresses, IPs
- Behavioral flags for OPSEC awareness, code sharing, coordinated activity
- Network relationship tracking with 6 interaction types
Usage:
from tgarchive.threat import (
ThreatIndicatorDetector,
ThreatScorer,
ThreatNetworkTracker,
MermaidGenerator
)
# Analyze messages for threat indicators
detector = ThreatIndicatorDetector()
indicators = detector.analyze_message(message_text, message_id)
# Calculate threat scores
threat_score, confidence = ThreatScorer.calculate_threat_score(
keyword_indicators=indicators,
# ... other parameters
)
# Generate network visualization
mermaid_graph = MermaidGenerator.generate_network_graph(
profiles=threat_profiles,
network_tracker=network_tracker
)Documentation:
- Usage Guide: See
docs/THREAT_SCORING_USAGE.md - System Architecture: See
docs/THREAT_SCORING_SYSTEM_PLAN.md - Demo Script: See
examples/threat_scoring_demo.py
Vector-based semantic search with Retrieval-Augmented Generation:
from tgarchive.ai import SemanticSearchEngine
# Initialize search engine
search_engine = SemanticSearchEngine(
model_name="all-MiniLM-L6-v2",
device="cpu"
)
# Index messages
search_engine.index_messages(messages, batch_size=100)
# Search semantically
results = search_engine.search(
query="discuss zero-day vulnerabilities",
top_k=10
)Features:
- Vector embeddings with sentence-transformers
- ChromaDB integration with numpy fallback
- Hybrid semantic + keyword search
- RAG pipeline for question answering
Named Entity Recognition and relationship mapping:
from tgarchive.ai import NERModel, KnowledgeGraph
# Extract entities
ner = NERModel(model_name="en_core_web_lg")
entities = ner.extract_entities(text)
# Build knowledge graph
kg = KnowledgeGraph()
kg.add_entities_from_messages(messages)
influential = kg.pagerank(top_k=100)AI/ML Documentation:
- Full Feature Plan: See
docs/AI_INTELLIGENCE_ENHANCEMENTS.md - Requirements: Install with
pip install -r requirements-ai.txt - Demo: See
examples/ai_features_demo.py
SPECTRA includes enterprise-grade enhancements for high-scale intelligence operations:
Scale to billions of messages with high-dimensional similarity search:
# Install vector database support
pip install -r requirements-advanced.txtFeatures:
- Qdrant/ChromaDB Integration: Production-grade vector storage (384-2048D embeddings)
- Dual-Database Architecture: SQLite for metadata + Vector DB for semantic search
- Semantic Search: Find similar messages across millions of documents
- Actor Similarity: Identify actors with similar behavioral patterns
- Anomaly Detection: Flag messages that don't match normal patterns
- Scalability: Handle billions of vectors with quantization and sharding
Usage:
from tgarchive.db.vector_store import VectorStoreManager, VectorStoreConfig
# Initialize vector store
config = VectorStoreConfig(
backend="qdrant", # or "chromadb", "numpy"
vector_size=384,
distance_metric="cosine"
)
store = VectorStoreManager(config)
# Index messages with embeddings
store.index_message(
message_id=123,
embedding=message_embedding,
metadata={"user_id": 1001, "threat_score": 8.5}
)
# Semantic search
results = store.semantic_search(
query_embedding=query_emb,
top_k=10,
filters={"threat_score": {"gte": 7.0}}
)Post-quantum cryptography for future-proof security:
Standards Compliance:
- ML-KEM-1024 (FIPS 203): Quantum-resistant key encapsulation
- ML-DSA-87 (FIPS 204): Quantum-resistant digital signatures
- SHA-384 (FIPS 180-4): 384-bit secure hashing
- NSA CNSA 2.0 Compliant: Future-proof against quantum attacks
Features:
- Hybrid encryption (ML-KEM-1024 + AES-256-GCM)
- Digital signatures for threat reports (ML-DSA-87)
- Secure key management with encrypted keystore
- Archive encryption with quantum resistance
- Database integrity verification
Usage:
from tgarchive.crypto import CNSA20CryptoManager, CNSAKeyManager
# Initialize crypto manager
crypto = CNSA20CryptoManager()
# Generate quantum-resistant keys
kem_keys = crypto.generate_kem_keypair()
sig_keys = crypto.generate_signature_keypair()
# Encrypt data
encrypted = crypto.encrypt_data(plaintext, recipient_public_key)
# Sign threat reports
signed_report = crypto.create_signed_package(
report_data,
signing_key,
signer_id="analyst_001"
)
# Verify signature
is_valid = crypto.verify_signed_package(signed_report, public_key)
# Secure key management
key_manager = CNSAKeyManager("./keys/keystore.enc")
key_manager.create_keystore(password="SecurePassword")Analyze time-based patterns in threat actor behavior:
Capabilities:
- Activity timeline analysis with burst detection
- Timezone inference from peak activity hours
- Sleep pattern analysis for geolocation
- Campaign periodicity detection
- Predictive activity forecasting
- Coordinated campaign detection
Usage:
from tgarchive.threat.temporal import TemporalAnalyzer
analyzer = TemporalAnalyzer()
# Analyze activity patterns
patterns = analyzer.analyze_activity_patterns(messages)
# Returns: peak_hours, peak_days, inferred_timezone, burst_periods, regularity_score
# Detect coordinated campaigns
campaigns = analyzer.detect_coordinated_campaigns(
actor_messages,
time_window_minutes=30,
min_actors=3
)
# Predict next activity
prediction = analyzer.predict_next_activity(messages, forecast_hours=24)
# Returns: likely_active_hours, probability_by_hour, confidenceCross-platform identity correlation and behavioral analysis:
Features:
- Writing style analysis (stylometry)
- Vocabulary richness and sentence complexity
- Tool/technique fingerprinting (Metasploit, Cobalt Strike, etc.)
- Operational pattern matching (recon β exploit β post-exploit)
- AI-generated content detection
- Cross-account correlation
- Language proficiency assessment
Usage:
from tgarchive.threat.attribution import AttributionEngine
engine = AttributionEngine()
# Analyze writing style
profile = engine.analyze_writing_style(messages)
# Returns: vocabulary_size, avg_sentence_length, technical_density, language
# Find similar actors (potential sock puppets)
similar = engine.find_similar_actors_by_style(
target_profile,
candidate_profiles,
threshold=0.85
)
# Detect tool fingerprints
tools = engine.detect_tool_fingerprints(messages)
# Returns: {"Metasploit": [...matches...], "Cobalt Strike": [...]}
# Detect AI-generated content
ai_analysis = engine.detect_ai_generated_content(messages)
# Returns: ai_likelihood, indicators, confidence
# Correlate accounts
account_clusters = engine.correlate_accounts(profiles, min_similarity=0.85)
# Returns: [[1001, 1005, 1009], [2003, 2015]] # Likely same actor# Run comprehensive demo of all advanced features
python examples/advanced_features_demo.pyDemonstrates:
- Vector database operations (indexing, searching, clustering)
- Post-quantum encryption and digital signatures
- Temporal pattern analysis and prediction
- Attribution analysis and tool fingerprinting
- AI content detection
- Architecture & Planning:
docs/ADVANCED_ENHANCEMENTS_PLAN.md- Vector database evaluation (Qdrant vs ChromaDB vs Milvus)
- CNSA 2.0 implementation strategy
- Enhanced threat tracking roadmap
- Requirements:
requirements-advanced.txt - Demo Script:
examples/advanced_features_demo.py
Leverage 130 TOPS INT8 compute for real-time intelligence at scale:
Capabilities:
- 10,000+ messages/second continuous psycho-forensic analysis
- <10ms actor correlation across 10 million profiles
- Real-time radicalization detection with sub-second alerting
- GPU/NPU acceleration (NVIDIA/Intel/AMD support)
Psycho-Forensic Linguistics:
- Big Five personality profiling (OCEAN model)
- Dark Triad detection (Narcissism, Machiavellianism, Psychopathy)
- Deception detection (hedging, distancing, verbosity)
- Radicalization stage tracking (0-5 scale progression)
- Emotional state analysis (anger, fear, confidence)
Hardware-Accelerated Operations:
- INT8 quantization (4x throughput vs FP32)
- GPU tensor cores for vector similarity
- NPU acceleration for linguistic models
- Batch inference at 100ΞΌs/message latency
Documentation:
- Master Plan:
docs/INT8_ACCELERATION_PLAN.md(13 sections, comprehensive architecture) - Implementation Status:
docs/INT8_IMPLEMENTATION_STATUS.md - Timeline: 10-week phased implementation
- Target Performance: 90% utilization of 130 TOPS, <2% accuracy loss
Deployment Scenarios:
- Workstation: NVIDIA RTX 4070 (150 TOPS) - $600
- Laptop: Intel Core Ultra 7 (134 TOPS combined) - Mobile ops
- Cloud: 4Γ NVIDIA A10G (600+ TOPS) - Auto-scaling
SPECTRA/tgarchive/discovery.py: Integration point for group crawling, network analysis, parallel archiving, and SQL-backed stateSPECTRA/tgarchive/__main__.py: Unified CLI/TUI entry pointexamples/parallel_example.py: Example for parallel, multi-account operations- All modules are importable and can be reused in your own scripts or pipelines
SPECTRA stores all discovery and archiving data in a SQLite database:
- Discovered groups with metadata and priority scores
- Group relationships and network graph data
- Account usage statistics and health metrics
- Archive status tracking
You can specify a custom database path with --db path/to/database.db
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
