
Plixer is a generative model for drug-like molecules that bind in protein pockets. Provide a PDB file and a pocket location and the model generates SMILES strings for small molecule binders.

Paper accepted to GenBio @ ICML 2025

Read the preprint here

Installation

Clone the repository:

git clone git@github.com:JudeWells/plixer.git
cd plixer

Install dependencies: It is recommended to use a virtual environment. Plixer was developed and tested using Python 3.11.

python3.11 -m venv venvPlixer
source venvPlixer/bin/activate
pip install -r requirements.txt

or, on a machine without a GPU:

pip install -r requirements_no_gpu.txt

Download Model Checkpoints: The necessary model checkpoints are hosted on Hugging Face Hub. Run the following script to download them into the checkpoints/ directory.
```
python download_checkpoints.py
```
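Once the script finishes, a quick way to confirm that files were actually downloaded is to list the contents of checkpoints/. The following is a minimal sketch; the helper name `list_checkpoint_files` is hypothetical and not part of the repository, and it makes no assumption about checkpoint file names or extensions.

```python
from pathlib import Path


def list_checkpoint_files(directory="checkpoints"):
    """Return (relative path, size in bytes) for every file under `directory`.

    Hypothetical helper, not part of the Plixer repository.
    """
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(
            f"{root} does not exist; run download_checkpoints.py first"
        )
    # Walk the tree recursively and keep regular files only.
    files = [p for p in sorted(root.rglob("*")) if p.is_file()]
    return [(str(p.relative_to(root)), p.stat().st_size) for p in files]


# Example usage from the repository root:
#     for name, size in list_checkpoint_files():
#         print(size, name)
```

An empty list, or files of zero bytes, would suggest the download did not complete.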
Check the installation by running:

python inference/generate_smiles_from_pdb.py

Usage

The main use case for Plixer is generating SMILES strings from a PDB file that contains your target protein: inference/generate_smiles_from_pdb.py

Secondary use cases include:

scoring / ranking molecules from a pre-defined list: inference/rank_smiles.py

generating structural analogs of a 3D ligand: inference/sample_smiles_from_3d_ligand.py

These secondary use cases are not extensively tested, and traditional chemoinformatics methods will likely perform better at these tasks.

Generating novel molecules for a PDB

To see the configurable options for inference run:

python inference/generate_smiles_from_pdb.py --help

Training Poc2Mol

The original model was trained using bfloat16. To replicate this, set the following in the config file of your DockTGrid install (optional): venvPlixer/lib/python3.11/site-packages/docktgrid/config.py

DTYPE = torch.bfloat16

Then launch training:

python src/train.py experiment=example_train_poc2mol_plinder
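The optional DockTGrid config edit described above can also be applied from a script instead of by hand. This is a minimal sketch under one assumption: that config.py contains exactly one top-level `DTYPE = ...` assignment. The function name `set_dtype` is hypothetical and not part of Plixer or DockTGrid.

```python
import re


def set_dtype(config_source, dtype="torch.bfloat16"):
    """Return `config_source` with its top-level `DTYPE = ...` line rewritten.

    Assumes the file contains exactly one such assignment (an assumption
    about DockTGrid's config.py, not something verified here).
    """
    # Match a whole line that starts with "DTYPE =" at column 0.
    pattern = re.compile(r"^DTYPE\s*=.*$", flags=re.MULTILINE)
    new_source, count = pattern.subn(f"DTYPE = {dtype}", config_source)
    if count != 1:
        raise ValueError(f"expected one DTYPE assignment, found {count}")
    return new_source


# Example usage (path taken from the instructions above):
#     from pathlib import Path
#     path = Path("venvPlixer/lib/python3.11/site-packages/docktgrid/config.py")
#     path.write_text(set_dtype(path.read_text()))
```

Raising an error when the assignment is missing or duplicated avoids silently writing a config that does not do what was intended.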

Training Vox2Smiles on ZINC molecules

python src/train.py experiment=example_train_vox2smiles_zinc

Fine-tuning Vox2Smiles on Poc2Mol outputs and ZINC

python src/train.py experiment=train_vox2smiles_combined

This repository contains three models:

Poc2Mol: Generates voxelized ligands from protein voxel inputs
Vox2Smiles: Decodes voxelized ligands into SMILES strings
CombinedProtein2Smiles: Combines Poc2Mol and Vox2Smiles into an end-to-end pipeline

Project Structure

├── configs/                  # Configuration files
│   ├── data/                 # Data configuration
│   ├── model/                # Model configuration
│   └── experiments/          # Launch training runs with this
├── src/                      # Source code
│   ├── data/                 # Data processing modules
│   │   ├── common/           # Common data utilities
│   │   │   ├── tokenizers/   # SMILES tokenizers
│   │   │   └── voxelization/ # Unified voxelization code
│   │   ├── poc2mol/          # Poc2Mol data modules
│   │   ├── vox2smiles/       # Vox2Smiles data modules
│   │   └── poc2smiles/       # Combined data modules
│   ├── models/               # Model definitions
│   └── utils/                # Utility functions
└── scripts/                  # Random scripts

Model Architecture

Poc2Mol

The Poc2Mol model is a 3D U-Net that takes protein voxels as input and generates ligand voxels as output.
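To make "protein voxels" concrete, the sketch below bins atom coordinates into a binary occupancy grid centred on a pocket. It is a deliberately simplified illustration and not the DockTGrid-based voxelization the repository uses; the function name `voxelize`, the 24 Å box, and the 1 Å resolution are assumptions chosen for the example.

```python
def voxelize(coords, center, box_size=24.0, resolution=1.0):
    """Bin 3D points into a cubic binary occupancy grid.

    coords: iterable of (x, y, z) atom positions.
    center: (x, y, z) of the pocket centre.
    Returns a nested list grid[i][j][k] holding 1 where at least one atom
    falls inside that voxel and 0 elsewhere.
    """
    n = int(round(box_size / resolution))  # voxels per side
    grid = [[[0] * n for _ in range(n)] for _ in range(n)]
    origin = [c - box_size / 2.0 for c in center]  # lower corner of the box
    for point in coords:
        idx = [int((p - o) // resolution) for p, o in zip(point, origin)]
        if all(0 <= i < n for i in idx):  # ignore atoms outside the box
            i, j, k = idx
            grid[i][j][k] = 1
    return grid
```

Practical voxelizers usually go further, for example by using one channel per atom type and smooth densities instead of hard 0/1 occupancy.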

Vox2Smiles

The Vox2Smiles model is a Vision Transformer (ViT) that takes ligand voxels as input and generates SMILES strings as output. It uses a transformer encoder to process the voxel patches and a GPT-style decoder to generate the SMILES tokens.
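As an illustration of "voxel patches", the following sketch cuts a cubic grid into non-overlapping cubes and flattens each one into a token vector, which is the usual ViT patching step carried over to 3D. The function name `patchify` is hypothetical, single-channel input is assumed, and the grid and patch sizes are made up for the example; the actual Vox2Smiles implementation may differ.

```python
def patchify(grid, patch):
    """Split a cubic nested-list grid into flattened non-overlapping patches.

    grid: nested list indexed grid[z][y][x]; side length divisible by `patch`.
    Returns a list of tokens, each a flat list of patch**3 voxel values.
    """
    n = len(grid)
    if n % patch != 0:
        raise ValueError("grid side must be divisible by patch size")
    tokens = []
    # Step through the grid one patch origin at a time.
    for z0 in range(0, n, patch):
        for y0 in range(0, n, patch):
            for x0 in range(0, n, patch):
                tokens.append([
                    grid[z][y][x]
                    for z in range(z0, z0 + patch)
                    for y in range(y0, y0 + patch)
                    for x in range(x0, x0 + patch)
                ])
    return tokens
```

A grid of side n with patch size p yields (n / p)^3 tokens, so the patch size directly sets the sequence length the transformer encoder has to process.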

Combined Model

The CombinedProtein2Smiles model connects the Poc2Mol and Vox2Smiles models end to end. It takes protein voxels as input, generates ligand voxels using Poc2Mol, and then decodes those voxels into SMILES strings using Vox2Smiles.
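The two-stage data flow can be sketched as a thin wrapper around two callables. This is an illustration of the composition only: the class name `CombinedPipeline`, the `generate` method, and the `n_samples` parameter are hypothetical and do not reflect the repository's actual CombinedProtein2Smiles interface.

```python
class CombinedPipeline:
    """Chain a pocket-to-ligand model with a ligand-to-SMILES decoder.

    Both stages are plain callables in this sketch; in practice they would
    be the trained Poc2Mol and Vox2Smiles networks.
    """

    def __init__(self, poc2mol, vox2smiles):
        self.poc2mol = poc2mol
        self.vox2smiles = vox2smiles

    def generate(self, protein_voxels, n_samples=1):
        # Stage 1: predict a ligand voxel grid from the protein pocket.
        ligand_voxels = self.poc2mol(protein_voxels)
        # Stage 2: decode the same predicted grid n_samples times.
        return [self.vox2smiles(ligand_voxels) for _ in range(n_samples)]
```

Keeping the stages as separate callables mirrors the way the two models are trained separately before being combined.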

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

The voxelization code is based on DockTGrid.

