🧠 PythonUnpackLLM

AI-Powered Python Bytecode Reverse Engineering Framework

PythonUnpackLLM is an automated reverse-engineering pipeline that reconstructs Python source code from compiled bytecode inside packaged executables.

It combines static bytecode disassembly with local LLM-assisted source reconstruction, designed specifically for:

  • Malware analysis
  • Red-team research
  • Incident response
  • Python packer forensics

Unlike experimental "LLM decompilers", PythonUnpackLLM focuses on stability, scale, and real-world RE workflows.


Why This Tool Exists

Reverse-engineering Python executables traditionally requires:

  1. Extracting .pyc files
  2. Disassembling bytecode
  3. Manually reasoning about logic
  4. Coping with bytecode changes in the latest Python versions

This is slow and error-prone.

PythonUnpackLLM automates the full pipeline, using AI only at the interpretation stage — while keeping extraction and disassembly fully deterministic.

The LLM is treated as an untrusted analysis component, not a source of truth.
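One way to picture the "untrusted component" idea is a gate that only accepts model output if it parses as Python, and otherwise falls back to the deterministic disassembly. The sketch below is an assumption about how such a gate could work, not the project's actual code; `ReconstructionResult` and `accept_or_fallback` are hypothetical names. Note that `ast.parse` only parses the text and never executes it.

```python
import ast
from dataclasses import dataclass


@dataclass
class ReconstructionResult:
    function_name: str
    source: str       # reconstructed source, or the raw disassembly on fallback
    validated: bool   # True only if the LLM text parsed as non-empty Python


def accept_or_fallback(function_name: str, llm_text: str, disassembly: str) -> ReconstructionResult:
    """Hypothetical gate: accept LLM output only if it is syntactically valid Python."""
    try:
        tree = ast.parse(llm_text)  # parses only; the text is never executed
    except (SyntaxError, ValueError):
        tree = None

    # Reject unparsable output and output with no statements at all.
    if tree is None or not tree.body:
        return ReconstructionResult(function_name, disassembly, False)
    return ReconstructionResult(function_name, llm_text, True)
```

A syntax check cannot prove the reconstruction is semantically faithful to the bytecode; it only guarantees that a malformed model response degrades to the disassembly instead of breaking the run.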


Pipeline Overview

  1. Executable unpacking (PyInstaller detection + extraction)
  2. Recursive .pyc recovery
  3. Native bytecode disassembly (no AI / extra dependencies)
  4. Function boundary reconstruction
  5. LLM-assisted logic reconstruction
  6. Validation + structured output

Usage

Extract .pyc files from an executable

python PythonUnpackLLM.py --path ./target.exe --unpack

Disassemble a single file

python PythonUnpackLLM.py --path file.pyc --asm

Decompile a single file

python PythonUnpackLLM.py --path file.pyc

Decompile entire extracted tree

python PythonUnpackLLM.py --path ./PYZ.pyz_extracted --type folder
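
Folder mode can be thought of as a recursive walk that isolates per-file failures. The sketch below is illustrative only: `process_tree` and its `handler` callback are hypothetical, and the use of a thread pool is an assumption, since the README does not say how the tool parallelizes work.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def process_tree(root, handler, max_workers=4):
    """Apply handler to every .pyc under root.

    Returns two dicts: {path: result} for successes and {path: exception}
    for failures, so one bad file cannot stop the whole run.
    """
    files = sorted(Path(root).rglob("*.pyc"))
    results, failures = {}, {}

    def safe(path):
        try:
            return path, handler(path), None
        except Exception as exc:  # isolate the failure to this file
            return path, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for path, result, error in pool.map(safe, files):
            if error is None:
                results[path] = result
            else:
                failures[path] = error
    return results, failures
```

Collecting failures separately keeps a record of which samples need manual attention after a large batch.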

Key Features

  • Detects the packaging type and automatically aborts on unsupported formats (saves time in RE workflows)
  • Built-in PyInstaller extraction (integrated pyinstxtractor-ng runner)
  • Recursive Folder Mode
  • Reconstructs functions from bytecode disassembly
  • LLM output is treated as untrusted input. This makes the tool stable even when the model fails.
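
The packaging check in the first feature could look roughly like the sketch below. It is a simplified assumption, not the tool's detection logic: `detect_packaging` is a hypothetical helper that only searches for the magic cookie a stock PyInstaller build embeds in the executable. A real extractor also parses the cookie structure, and builds with a modified bootloader may use a different magic value.

```python
from pathlib import Path

# Magic cookie written by stock PyInstaller builds (b"MEI\014\013\012\013\016").
PYINSTALLER_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"


def detect_packaging(path) -> str:
    """Return "pyinstaller" if the cookie is present, otherwise "unsupported"."""
    data = Path(path).read_bytes()  # fine for a sketch; large files could be scanned in chunks
    if PYINSTALLER_MAGIC in data:
        return "pyinstaller"
    return "unsupported"
```

A caller can stop early on `"unsupported"` instead of running extraction against a file it cannot handle.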

Use Cases

  • Malware analysis
  • Red team tool reversing
  • IR investigations

Tool Comparison

| Capability | PythonUnpackLLM | uncompyle6 | decompyle3 | pycdc | pyinstxtractor-ng | ByteCodeLLM (original concept) |
|---|---|---|---|---|---|---|
| Purpose | Full automated RE pipeline | Python decompiler | Python decompiler | C++ Python decompiler | PyInstaller extractor | AI-assisted bytecode reasoning |
| Works on EXE directly | ✅ Yes (auto-unpack) | ❌ No | ❌ No | ❌ No | ⚠ Extract only | ❌ No |
| PyInstaller extraction | ✅ Built-in | | | | ✅ Yes | |
| Recursive folder processing | ✅ Yes | | | | | |
| Handles large sample sets | ✅ Designed for scale | ⚠ Manual workflow | ⚠ Manual workflow | ⚠ Manual workflow | ❌ Extraction only | ❌ Research prototype |
| Uses AI reconstruction | ✅ Local LLM | | | | | ✅ Yes |
| Deterministic bytecode analysis | ✅ Yes | | | | | ⚠ Partial |
| Trust model for AI output | ✅ Treated as untrusted | N/A | N/A | N/A | N/A | ❌ Not isolated |
| Function boundary reconstruction | ✅ Yes | ⚠ Partial | ⚠ Partial | ⚠ Partial | | ⚠ Experimental |
| Crash-safe pipeline | ✅ Yes | | | | | |
| Works on obfuscated malware samples | ✅ Designed for it | ⚠ Often fails | ⚠ Often fails | ⚠ Often fails | | ⚠ Experimental |
| Parallel processing | ✅ Yes | | | | | |
| Output is structured for analysis | ✅ Yes | ❌ Raw code | ❌ Raw code | ❌ Raw code | | |

Traditional Python reverse-engineering tools focus only on decompilation.
PythonUnpackLLM focuses on end-to-end automation, combining deterministic bytecode analysis with AI-assisted interpretation while maintaining the reliability required for large-scale reverse-engineering workflows.

Credits & Acknowledgements

  • CyberArk, for the original research that introduced the ByteCodeLLM concept
  • pyinstxtractor-ng project for PyInstaller extraction

Disclaimer

This software is provided "as is", without warranty of any kind. This tool is intended for research, defensive security, and reverse-engineering education. Do not analyze software without legal authorization. The author assumes no responsibility for misuse.