This repository contains the implementation of the methods proposed in the paper [PaperName], building upon the codebase from [BaselinePaper and repo].
Install the required packages in a conda environment:
conda create -n drugrank python=3.9
conda activate drugrank
conda install -y -c rdkit rdkit=2023.03.3
conda install -y numpy=1.26.0 scipy=1.9.1
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip3 install torch==1.13.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu116
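After installation, a quick sanity check can confirm that the main packages resolve in the active environment. The sketch below is not part of this repository: the helper name `missing_packages` is hypothetical, the import names (`rdkit`, `numpy`, `scipy`, `torch`) are assumed to match the packages installed above, and it only checks importability, not versions or CUDA availability.

```python
import importlib.util

# Assumed import names for the packages installed above.
REQUIRED = ["rdkit", "numpy", "scipy", "torch"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it inside the activated `drugrank` environment; any name it reports as missing points to an install step that did not complete.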
- CCLE: Download the CCLE expression data from here and save it as data/CCLE/CCLE_expression.csv.
- Combined: Download the combined_rnaseq_data file, which contains gene expression data for cell lines renamed for CTRPv2, from here and save it in Combined/.
- Please use the provided CTRPv2 (with adjusted AUCs) and PRISM datasets, which are already processed. The processed datasets can be downloaded from here. Unzip ctrpv2.zip and prism.zip inside the data directory.
- Find detailed instructions to download and process the data in data/README.md.
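Before training, it can help to confirm that the files landed where the commands expect them. The following is a minimal sketch, not a script shipped with this repository: the helper `missing_paths` is hypothetical, and it assumes that unzipping the archives produces `data/ctrpv2` and `data/prism` directories (the locations used by `DATA_FOLDER` in the training commands).

```python
from pathlib import Path

# Paths taken from the instructions above. Only the top-level dataset
# directories are checked, since their contents depend on the archives.
EXPECTED = [
    "data/CCLE/CCLE_expression.csv",
    "data/ctrpv2",
    "data/prism",
]


def missing_paths(root, expected=EXPECTED):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the repository root.
    for rel in missing_paths("."):
        print("Missing:", rel)
```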
- Check scripts/run_ae.sh for how to run the pretraining code in src/train_ae.py.
- Each pretraining run takes only a few minutes on a single V100 GPU.
Run the command below to train $\mathtt{List\text{-}One}$:
export DATA_FOLDER="data/ctrpv2/"
python src/cross_validate.py --model listone --data_path $DATA_FOLDER/LCO/aucs.txt --smiles_path $DATA_FOLDER/cmpd_smiles.txt --splits_path $DATA_FOLDER/LCO/pletorg/ --pretrained_ae -ae_path ${ae_path} -fgen morgan_count --setup LCO
Run the command below to train $\mathtt{List\text{-}All}$:
export DATA_FOLDER="data/ctrpv2/"
python src/cross_validate.py --model listall --data_path $DATA_FOLDER/LCO/aucs.txt --smiles_path $DATA_FOLDER/cmpd_smiles.txt --splits_path $DATA_FOLDER/LCO/pletorg/ --pretrained_ae -ae_path ${ae_path} -fgen morgan_count -M 0.5 --setup LCO
Run the command below to train $\mathtt{Pair\text{-}PushC}$:
export DATA_FOLDER="data/ctrpv2/"
python src/cross_validate.py --model pairpushc --data_path $DATA_FOLDER/LCO/aucs.txt --smiles_path $DATA_FOLDER/cmpd_smiles.txt --splits_path $DATA_FOLDER/LCO/pletorg/ --pretrained_ae -ae_path ${ae_path} -classc -fgen morgan_count --setup LCO
where ${ae_path} should be the path to the directory containing the saved models.
- model specifies the type of model to train.
- data_path specifies the path to the file containing the final processed list of cell ID, drug ID, and AUC values (comma-separated).
- smiles_path specifies the path to the tab-separated file listing each drug ID and its SMILES string; the SMILES string must be the last column in this file.
- splits_path specifies the path to the directory containing the folds, where each fold is saved as a directory.
- ae_path specifies the path to the directory containing the pretrained $\mathtt{GeneAE}$ model.
- Check utils/args.py for other hyper-parameters.
- Use export DATA_FOLDER="data/ctrpv2/" and change ae_ind to 17743 and ae_path to data/Combined/combined_rnaseq_data for all experiments on the CTRP dataset.
- Use export DATA_FOLDER="data/prism/" and change ae_ind to 19177 and ae_path to data/CCLE/CCLE_expression.csv for all experiments on the PRISM dataset.
- Change splits_path to $DATA_FOLDER/LRO/, data_path to $DATA_FOLDER/LRO/aucs.txt, and setup=LRO for the LRO experiments.
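To make the two input formats concrete, here is a minimal sketch of how such files could be parsed. It is an illustration, not the repository's loader: the function names `read_aucs` and `read_smiles` are hypothetical, and it assumes no header row, the column order cell ID, drug ID, AUC in the AUC file, and the drug ID in the first column of the SMILES file.

```python
def read_aucs(lines):
    """Parse comma-separated rows into (cell_id, drug_id, auc) tuples."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = [field.strip() for field in line.split(",")]
        # Assumed column order: cell ID, drug ID, AUC.
        rows.append((fields[0], fields[1], float(fields[2])))
    return rows


def read_smiles(lines):
    """Map drug ID (assumed first column) to SMILES (last column)."""
    smiles = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        smiles[fields[0]] = fields[-1]
    return smiles


if __name__ == "__main__":
    # Made-up rows for illustration only.
    aucs = read_aucs(["cellA,drugX,0.75", "cellB,drugX,0.5"])
    smiles = read_smiles(["drugX\tethanol\tCCO"])
    # Every drug in the AUC list should have a SMILES entry.
    unmatched = {drug for _, drug, _ in aucs if drug not in smiles}
    print("Drugs without SMILES:", sorted(unmatched))
```

Reading the SMILES string from the last column, rather than a fixed index, mirrors the requirement above and tolerates extra metadata columns between the drug ID and the SMILES string.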
Check the following scripts for hyper-parameter grid-search and cross-validation:
- scripts/run_listone.sh for $\mathtt{List\text{-}One}$.
- scripts/run_listall.sh for $\mathtt{List\text{-}All}$.
- scripts/run_pairpushc.sh for $\mathtt{Pair\text{-}PushC}$.