Plixer is a generative model for drug-like molecules that bind in protein pockets. Provide a PDB file and a pocket location and the model generates SMILES strings for small molecule binders.
Paper accepted to GenBio @ ICML 2025
- Clone the repository:
  git clone git@github.com:JudeWells/plixer.git
  cd plixer
- Install dependencies: It is recommended to use a virtual environment. Plixer was developed and tested using Python 3.11.
  python3.11 -m venv venvPlixer
  source venvPlixer/bin/activate
  pip install -r requirements.txt
  or
  pip install -r requirements_no_gpu.txt
- Download model checkpoints: The necessary model checkpoints are hosted on Hugging Face Hub. Run the following script to download them into the checkpoints/ directory.
  python download_checkpoints.py
- Check the installation by running:
  python inference/generate_smiles_from_pdb.py
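Before running inference, it can help to confirm that the download step actually populated the checkpoints/ directory. The sketch below is a hypothetical helper and is not part of the repository. It assumes only that the checkpoints live somewhere under checkpoints/ and makes no assumption about their file names or extensions.

```python
from pathlib import Path


def list_checkpoint_files(checkpoint_dir="checkpoints"):
    """Return all regular files found under checkpoint_dir (hypothetical helper).

    Returns an empty list if the directory does not exist.
    """
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return []
    # Search recursively, since the layout inside checkpoints/ is not assumed.
    return sorted(p for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    files = list_checkpoint_files()
    if files:
        print(f"Found {len(files)} file(s) under checkpoints/")
    else:
        print("No checkpoint files found; run python download_checkpoints.py first")
```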
The main use case for Plixer is generating SMILES strings from a PDB file that contains your target protein:
inference/generate_smiles_from_pdb.py
Secondary use cases include:
- scoring / ranking molecules from a pre-defined list: inference/rank_smiles.py
- generating structural analogs of a 3D ligand: inference/sample_smiles_from_3d_ligand.py
These secondary use cases are not extensively tested, and it is likely that traditional chemoinformatics methods will perform better at these tasks.
To see the configurable options for inference run:
python inference/generate_smiles_from_pdb.py --help
The original model was trained using bfloat16.
If you want to replicate this, modify this file in your DockTGrid install (optional):
venvPlixer/lib/python3.11/site-packages/docktgrid/config.py
DTYPE = torch.bfloat16

python src/train.py experiment=example_train_poc2mol_plinder
python src/train.py experiment=example_train_vox2smiles_zinc
python src/train.py experiment=train_vox2smiles_combined

This repository contains three models:
- Poc2Mol: Generates voxelized ligands from protein voxel inputs
- Vox2Smiles: Decodes voxelized ligands into SMILES strings
- CombinedProtein2Smiles: Combines Poc2Mol and Vox2Smiles into an end-to-end pipeline
├── configs/ # Configuration files
│ ├── data/ # Data configuration
│ ├── model/ # Model configuration
│ └── experiments/ # Launch training runs with this
├── src/ # Source code
│ ├── data/ # Data processing modules
│ │ ├── common/ # Common data utilities
│ │ │ ├── tokenizers/ # SMILES tokenizers
│ │ │ └── voxelization/ # Unified voxelization code
│ │ ├── poc2mol/ # Poc2Mol data modules
│ │ ├── vox2smiles/ # Vox2Smiles data modules
│ │ └── poc2smiles/ # Combined data modules
│ ├── models/ # Model definitions
│ └── utils/ # Utility functions
└── scripts/ # Random scripts
The Poc2Mol model is a 3D U-Net that takes protein voxels as input and generates ligand voxels as output.
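To make "protein voxels" concrete, here is a minimal sketch of turning atom coordinates into a binary occupancy grid centred on a pocket. It is an illustration only: the function name, `grid_size`, and `resolution` are hypothetical, and the repository's actual voxelization (based on DockTGrid) may differ, for example in how atom types are assigned to channels or how density is spread around each atom.

```python
def voxelize(coords, center, grid_size=8, resolution=1.0):
    """Mark grid cells occupied by atoms (binary occupancy; illustrative only).

    coords: iterable of (x, y, z) atom positions in angstroms.
    center: (x, y, z) point placed at the middle of the cubic grid.
    """
    half = grid_size * resolution / 2.0
    grid = [[[0] * grid_size for _ in range(grid_size)] for _ in range(grid_size)]
    for atom in coords:
        # Shift so the grid's lower corner is the origin, then convert to cell indices.
        idx = [int((a - c + half) // resolution) for a, c in zip(atom, center)]
        # Atoms that fall outside the box are ignored.
        if all(0 <= i < grid_size for i in idx):
            grid[idx[0]][idx[1]][idx[2]] = 1
    return grid


# One atom at the pocket centre and one far outside the box.
grid = voxelize([(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)], center=(0.0, 0.0, 0.0))
print(grid[4][4][4])  # prints 1
```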
The Vox2Smiles model is a Vision Transformer (ViT) that takes ligand voxels as input and generates SMILES strings as output. It uses a transformer encoder to process the voxel patches and a GPT-style decoder to generate the SMILES tokens.
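The "voxel patches" step can be sketched as splitting a cubic grid into non-overlapping cubic blocks and flattening each block into one vector, which a ViT would then project to a token embedding. The patch size, the grid size, and the non-overlapping layout used here are assumptions for illustration, not values taken from the Vox2Smiles configuration.

```python
def split_into_patches(grid, patch_size):
    """Split a cubic nested-list grid into flattened, non-overlapping cubic patches."""
    n = len(grid)
    if n % patch_size != 0:
        raise ValueError("grid size must be divisible by patch_size")
    patches = []
    for x0 in range(0, n, patch_size):
        for y0 in range(0, n, patch_size):
            for z0 in range(0, n, patch_size):
                # Flatten one patch_size^3 block into a single vector.
                patch = [
                    grid[x][y][z]
                    for x in range(x0, x0 + patch_size)
                    for y in range(y0, y0 + patch_size)
                    for z in range(z0, z0 + patch_size)
                ]
                patches.append(patch)
    return patches


# A 4x4x4 grid whose values encode their own position.
grid = [[[x * 16 + y * 4 + z for z in range(4)] for y in range(4)] for x in range(4)]
patches = split_into_patches(grid, patch_size=2)
print(len(patches), len(patches[0]))  # prints 8 8
```

Each flattened patch plays the role of one input token for the transformer encoder, so a larger patch size gives fewer, coarser tokens.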
The CombinedProtein2Smiles model connects the Poc2Mol and Vox2Smiles models end to end. It takes protein voxels as input, generates ligand voxels using Poc2Mol, and then decodes those voxels into SMILES strings using Vox2Smiles.
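The two-stage composition can be sketched as plain function chaining. Everything below is a hypothetical stand-in: `make_pipeline`, the `Voxels` type alias, and the two fake stages are not the repository's classes, and the real stages are trained networks rather than simple functions.

```python
from typing import Callable, List

Voxels = List[List[List[float]]]  # placeholder type for a voxel grid


def make_pipeline(
    poc2mol: Callable[[Voxels], Voxels],
    vox2smiles: Callable[[Voxels], str],
) -> Callable[[Voxels], str]:
    """Chain a pocket-to-ligand-voxel stage with a voxel-to-SMILES stage."""

    def protein_to_smiles(protein_voxels: Voxels) -> str:
        ligand_voxels = poc2mol(protein_voxels)
        return vox2smiles(ligand_voxels)

    return protein_to_smiles


# Stand-in stages for illustration only.
def fake_poc2mol(protein_voxels):
    return protein_voxels


def fake_vox2smiles(ligand_voxels):
    return "CCO"


pipeline = make_pipeline(fake_poc2mol, fake_vox2smiles)
print(pipeline([[[0.0]]]))  # prints CCO
```

Keeping the stages as separate callables mirrors the way the repository trains Poc2Mol and Vox2Smiles separately before combining them.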
This project is licensed under the MIT License - see the LICENSE file for details.
- The voxelization code is based on DockTGrid.

