🧬 Graph Neural Networks for Molecular Property Prediction

A hands-on tutorial that implements Graph Convolutional Networks (GCNs) for molecular property prediction with PyTorch Geometric. The project shows how to predict the water solubility of chemical compounds directly from their molecular structure.

🚀 Overview

This project provides a complete implementation of Graph Neural Networks for molecular property prediction, specifically focusing on predicting water solubility using the ESOL (Estimated SOLubility) dataset. The notebook covers everything from data preprocessing to model training and evaluation.

🎯 Key Features

Complete GNN Pipeline: From SMILES strings to molecular graphs to predictions
ESOL Dataset: 1,128 compounds with water solubility data from MoleculeNet
Graph Convolutional Networks: Multi-layer GCN implementation using PyTorch Geometric
Molecular Visualization: Interactive molecule rendering with RDKit
Performance Analysis: Training loss visualization and model evaluation

🧪 What You'll Learn

How to convert SMILES representations to molecular graphs
Understanding molecular fingerprints and graph-based representations
Implementing Graph Convolutional Networks for regression tasks
Working with chemical datasets and molecular descriptors
Visualizing molecules and training progress

📊 Dataset

We use the ESOL (Estimated SOLubility) dataset from MoleculeNet:

Size: 1,128 chemical compounds
Task: Regression (water solubility prediction)
Features: 9 node features per atom
Target: Log solubility (log S), with solubility measured in mol/L
Format: SMILES strings converted to molecular graphs

"ESOL is a small dataset consisting of water solubility data for 1128 compounds. The dataset has been used to train models that estimate solubility directly from chemical structures (as encoded in SMILES strings)."
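
ESOL labels are commonly given as the base-10 logarithm of molar solubility (log S), so turning a prediction back into mol/L is a single exponentiation. The sketch below assumes that convention; the helper names `log_s_to_mol_per_l` and `mol_per_l_to_log_s` are illustrative and do not come from the notebook.

```python
import math


def log_s_to_mol_per_l(log_s: float) -> float:
    """Convert a log10 solubility value (log S) to solubility in mol/L."""
    return 10.0 ** log_s


def mol_per_l_to_log_s(solubility: float) -> float:
    """Convert a solubility in mol/L to log S; solubility must be positive."""
    if solubility <= 0:
        raise ValueError("solubility must be positive")
    return math.log10(solubility)


# A log S of -2 corresponds to one hundredth of a mole per litre.
print(log_s_to_mol_per_l(-2.0))
```

Working in log space keeps the regression target on a compact scale even when solubilities span several orders of magnitude.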

🏗️ Model Architecture

Our Graph Convolutional Network consists of:

GCN(
  (initial_conv): GCNConv(9, 64)
  (conv1): GCNConv(64, 64)
  (conv2): GCNConv(64, 64)
  (conv3): GCNConv(64, 64)
  (out): Linear(192, 1)
)

Input: 9-dimensional node features (atomic properties)
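Hidden Layers: 4 GCN layers (initial_conv plus conv1 to conv3) with 64 hidden units each
Global Pooling: Concatenates global mean, max, and add (sum) pooling, giving 3 × 64 = 192 features for the output layer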
Output: Single regression value (solubility prediction)
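
To make the dimensions above concrete, here is a dependency-free sketch of what a stack of GCN layers followed by a three-way readout computes. It is not the notebook's code: it uses plain Python lists instead of PyTorch Geometric, omits the bias term, and assumes a tanh activation after every layer, which the printed architecture does not specify. The helpers `matmul`, `gcn_layer`, `readout`, and `random_matrix` are hypothetical names.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One GCN step: tanh(D^-1/2 (A + I) D^-1/2 X W), bias omitted."""
    n = len(adj)
    # Add self-loops so each atom keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation by the square roots of both endpoint degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    hidden = matmul(matmul(norm, feats), weight)
    return [[math.tanh(v) for v in row] for row in hidden]


def readout(node_embeddings):
    """Concatenate mean, max, and sum over atoms for each hidden channel."""
    channels = list(zip(*node_embeddings))
    mean = [sum(c) / len(c) for c in channels]
    maximum = [max(c) for c in channels]
    total = [sum(c) for c in channels]
    return mean + maximum + total


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
IN_DIM, HIDDEN = 9, 64

# Toy 3-atom chain (0-1-2) with random 9-dimensional atom features.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
x = random_matrix(3, IN_DIM, rng)

# One 9 -> 64 layer followed by three 64 -> 64 layers.
weights = [random_matrix(IN_DIM, HIDDEN, rng)]
weights += [random_matrix(HIDDEN, HIDDEN, rng) for _ in range(3)]

h = x
for w in weights:
    h = gcn_layer(adj, h, w)

graph_vector = readout(h)
print(len(graph_vector))  # 192 = 3 * 64
```

The final linear layer would then map this 192-dimensional graph vector to a single solubility value.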

🛠️ Installation & Setup

Prerequisites

Python 3.7+
CUDA-compatible GPU (recommended)
Google Colab or Jupyter Notebook environment

Required Libraries

# Core ML libraries
pip install torch==1.6.0
pip install torchvision==0.7.0

# PyTorch Geometric and dependencies
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv
pip install torch-geometric

# Chemistry libraries
pip install rdkit-pypi

# Visualization and utilities
pip install matplotlib seaborn pandas numpy
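
After installing, it can help to confirm which versions actually ended up in the environment. The following is a small sketch using the standard library's `importlib.metadata` (available from Python 3.8, so slightly newer than the stated minimum); the `installed_versions` helper and the `REQUIRED` list are illustrative, with distribution names copied from the pip commands above.

```python
from importlib import metadata

# Distribution names as they appear in the pip commands above.
REQUIRED = [
    "torch",
    "torchvision",
    "torch-geometric",
    "rdkit-pypi",
    "matplotlib",
    "pandas",
    "numpy",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name:16s} {version or 'NOT INSTALLED'}")
```

Any line reporting `NOT INSTALLED` points at a package to reinstall before running the notebook.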

Quick Start

Clone the repository:

git clone https://github.com/erfan-nourbakhsh/GCN-PytorchGeometric.git
cd GCN-PytorchGeometric

Open the notebook:

jupyter notebook "GCN-Pytorch Geometric.ipynb"

Run all cells to reproduce the complete pipeline

📁 Project Structure

GCN-PytorchGeometric/
├── GCN-Pytorch Geometric.ipynb    # Main tutorial notebook
├── README.md                      # This file
└── data/                         # Generated during execution
    └── ESOL/                     # ESOL dataset files

🔬 Technical Deep Dive

Why Graph Neural Networks for Molecules?

Traditional approaches using SMILES strings directly as input have limitations:

Grammar Dependency: Models focus on SMILES syntax rather than molecular structure
Non-Unique Representation: Same molecule can have multiple valid SMILES strings
Limited Chemical Understanding: Bonded atoms can sit far apart in the string (branches, ring closures), so string-based models must infer connectivity indirectly

Graph Neural Networks solve these issues by:

Representing molecules as graphs (atoms = nodes, bonds = edges)
Being invariant to atom ordering and notation variations
Capturing molecular structure and chemical relationships directly
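
The invariance to atom ordering can be checked on a toy case. The sketch below hand-codes the heavy-atom graph of ethanol in the two orders implied by the equivalent SMILES strings "CCO" and "OCC", runs one round of neighbour aggregation, and pools with order-independent operations. The functions `message_pass` and `graph_summary` are hypothetical and far simpler than a real GCN; they only illustrate the principle.

```python
def message_pass(atomic_numbers, bonds):
    """Each atom adds its neighbours' atomic numbers to its own."""
    updated = list(atomic_numbers)
    for i, j in bonds:
        updated[i] += atomic_numbers[j]
        updated[j] += atomic_numbers[i]
    return updated


def graph_summary(atomic_numbers, bonds):
    """Order-independent graph summary: (sum, max) over the updated atoms."""
    h = message_pass(atomic_numbers, bonds)
    return (sum(h), max(h))


chain = [(0, 1), (1, 2)]

# Ethanol heavy atoms listed in the order of SMILES "CCO" ...
cco = graph_summary([6, 6, 8], chain)
# ... and in the order of the equivalent SMILES "OCC".
occ = graph_summary([8, 6, 6], chain)
# Dimethyl ether ("COC") has the same atoms but different connectivity.
coc = graph_summary([6, 8, 6], chain)

print("CCO" == "OCC")  # False: the strings differ
print(cco == occ)      # True: the graph summaries agree
print(cco == coc)      # False: a different molecule gives a different summary
```

Sum, mean, and max pooling all share this property, which is why they are used for graph-level readouts.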

Graph Representation

Each molecule is converted to a graph where:

Nodes: Atoms with features (atomic number, hybridization, etc.)
Edges: Chemical bonds with attributes (bond type, stereochemistry)
Graph-level target: Molecular property (solubility)
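
PyTorch Geometric stores connectivity as an `edge_index` array of shape `[2, num_edges]`, with each undirected bond listed once per direction. The sketch below builds that layout by hand for ethanol using plain lists. It uses a single feature per atom (the atomic number) for brevity, whereas the dataset provides 9; the `to_coo` helper is a hypothetical name.

```python
# Ethanol heavy atoms for SMILES "CCO": position in the list is the atom index.
atoms = [6, 6, 8]  # atomic numbers of C, C, O

# Undirected bonds as (atom_i, atom_j, bond_order).
bonds = [(0, 1, 1.0), (1, 2, 1.0)]


def to_coo(bond_list):
    """Return (edge_index, edge_attr) with every bond stored in both directions."""
    sources, targets, attrs = [], [], []
    for i, j, order in bond_list:
        sources += [i, j]
        targets += [j, i]
        attrs += [order, order]
    return [sources, targets], attrs


# Node feature matrix: one row per atom, one column per feature.
x = [[float(z)] for z in atoms]
edge_index, edge_attr = to_coo(bonds)

print(edge_index)  # [[0, 1, 1, 2], [1, 0, 2, 1]]
print(edge_attr)   # [1.0, 1.0, 1.0, 1.0]
# The graph-level target (the molecule's solubility label) is stored once per graph.
```

Storing both directions lets message passing flow along each bond from either atom.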

📈 Results & Performance

Summary of the model's behavior on the ESOL dataset (see the notebook for the loss curves and evaluation):

Training: Steady decrease in RMSE loss
Architecture: 4-layer GCN with global pooling
Features: 9-dimensional atomic descriptors
Prediction: Direct molecular property regression
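
For reference, RMSE is the square root of the mean squared difference between predictions and targets. The function below is a minimal standard-library sketch. The numbers passed to it are made up purely to exercise the code and are not results from the notebook.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: two exact predictions and one that is off by 2 log units.
print(rmse([-1.0, -2.0, -3.0], [-1.0, -2.0, -5.0]))
```

Because the error is squared before averaging, a few large misses raise RMSE more than many small ones.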

🎨 Visualizations

The notebook includes:

Molecular Structures: Interactive 2D molecule rendering
Training Progress: Loss curves and convergence analysis
Dataset Exploration: Feature distributions and molecular diversity
Graph Properties: Node and edge statistics

🤝 Contributing

Contributions are welcome! Here's how you can help:

Fork the repository
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

Ideas for Contributions

Add more molecular datasets (BACE, Tox21, etc.)
Implement different GNN architectures (GraphSAGE, GAT, etc.)
Add molecular descriptor calculations
Improve visualization capabilities
Add model interpretability features

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

PyTorch Geometric Team for the excellent graph neural network library
RDKit Community for comprehensive cheminformatics tools
MoleculeNet for providing standardized molecular datasets
Original Research that made this implementation possible

💬 Support

If you find this project helpful:

⭐ Star the repository
🐛 Report issues or bugs
💡 Suggest new features
📖 Share with the community

Ready to dive into molecular machine learning? Open the notebook and start exploring the fascinating world of Graph Neural Networks! 🧬✨
