williyam-m/TinyLLM

TinyLLM 🤖

A minimal, educational implementation of a GPT-style decoder-only transformer language model built from scratch using PyTorch.

🌟 Features

  • Decoder-only Transformer Architecture (similar to GPT-2/3)
  • Multi-Head Causal Self-Attention with KV-Cache
  • Pre-LayerNorm architecture for stable training
  • BPE Tokenization using tiktoken (GPT-2 tokenizer)
  • Simple CLI for training and inference
  • Clean, modular code for learning and experimentation

📋 Architecture

Input Tokens → Token Embedding + Positional Embedding → Dropout
                                    ↓
                    ┌───────────────────────────────┐
                    │   Transformer Decoder Block   │ × N layers
                    │   ├── LayerNorm               │
                    │   ├── Multi-Head Attention    │
                    │   ├── Residual Connection     │
                    │   ├── LayerNorm               │
                    │   ├── Feed-Forward Network    │
                    │   └── Residual Connection     │
                    └───────────────────────────────┘
                                    ↓
                    Final LayerNorm → LM Head → Output Logits

🚀 Quick Start

1. Setup Environment

# Clone or navigate to the project
cd TinyLLM

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Train the Model

Prepare your training data as a .txt file, then run:

python trainllm.py your_data.txt

Training Options:

python trainllm.py data.txt \
    --model-size small \
    --epochs 5 \
    --batch-size 8 \
    --learning-rate 3e-4 \
    --max-seq-len 256 \
    --output-dir ./checkpoints

3. Start Chat Interface

python tinyllm.py start

Chat Options:

python tinyllm.py start \
    --model ./checkpoints/final \
    --temperature 0.8 \
    --max-tokens 150

4. Generate Text (Non-interactive)

python tinyllm.py generate "Once upon a time" --max-tokens 100

📁 Project Structure

TinyLLM/
├── src/
│   ├── model/
│   │   ├── config.py           # Model configuration
│   │   ├── embeddings.py       # Token & position embeddings
│   │   ├── attention.py        # Multi-head causal attention
│   │   ├── feedforward.py      # Feed-forward network
│   │   ├── transformer_block.py # Decoder block
│   │   └── tinyllm.py          # Main model class
│   ├── tokenizer/
│   │   └── tokenizer.py        # BPE tokenizer (tiktoken)
│   ├── data/
│   │   └── dataset.py          # Dataset classes
│   ├── training/
│   │   └── trainer.py          # Training loop
│   └── inference/
│       └── generate.py         # Text generation
├── trainllm.py                 # Training CLI
├── tinyllm.py                  # Chat CLI
├── requirements.txt
└── README.md

📊 Model Configurations

Size     Parameters   Layers   Heads   Hidden size   FFN size
Tiny     ~1M          4        4       128           512
Small    ~10M         6        8       256           1024
Medium   ~45M         8        8       512           2048
🔧 CLI Commands

Training (trainllm.py)

python trainllm.py <data_file> [OPTIONS]

Arguments:
  data_file           Path to training text file

Options:
  -o, --output-dir    Output directory (default: ./checkpoints)
  -m, --model-size    Model size: tiny, small, medium (default: small)
  -e, --epochs        Number of epochs (default: 3)
  -b, --batch-size    Batch size (default: 8)
  -lr, --learning-rate Learning rate (default: 3e-4)
  --max-seq-len       Maximum sequence length (default: 256)
  --device            Device: auto, cuda, mps, cpu (default: auto)

Chat (tinyllm.py)

python tinyllm.py start [OPTIONS]

Options:
  -m, --model         Path to model checkpoint
  -t, --temperature   Sampling temperature (default: 0.8)
  --top-k             Top-K sampling (default: 50)
  --top-p             Top-P sampling (default: 0.9)
  --max-tokens        Max tokens per response (default: 150)

Chat Commands:
  /quit, /exit       Exit chat
  /clear             Clear conversation history
  /temp <value>      Set temperature
  /help              Show help
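
The --temperature, --top-k, and --top-p options listed above control how the next token is drawn from the model's output logits. The sketch below shows one common way these three filters are combined, in plain Python. It is not taken from src/inference/generate.py: the function names, the order of the filters, and the toy logits are assumptions.

```python
import math
import random


def sampling_distribution(logits, temperature=0.8, top_k=50, top_p=0.9):
    """Return {token_id: prob} after temperature, top-k, and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature: divide logits before the softmax (<1 sharpens, >1 flattens).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most likely tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = ranked[:top_k]
    top_mass = sum(probs[i] for i in top)

    # Top-p (nucleus): keep the smallest prefix whose cumulative
    # probability, renormalised over the top-k set, reaches top_p.
    kept, cumulative = [], 0.0
    for i in top:
        kept.append(i)
        cumulative += probs[i] / top_mass
        if cumulative >= top_p:
            break

    # Renormalise over the surviving tokens.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, rng=random, **kwargs):
    """Draw one token id from the filtered distribution."""
    dist = sampling_distribution(logits, **kwargs)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0, -3.0]   # toy logits for a 5-token vocabulary
dist = sampling_distribution(logits, temperature=0.8, top_k=3, top_p=0.9)
print(dist)
print(sample_token(logits, temperature=0.8, top_k=3, top_p=0.9))
```

Lowering top_p or top_k shrinks the candidate set, which makes output more predictable; raising the temperature spreads probability toward less likely tokens.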

💡 Example Usage

Training on Custom Data

# Create a sample text file
echo "Hello, I am TinyLLM. I am a small language model.
I can generate text based on patterns I learned during training.
Ask me anything and I will try my best to respond!" > sample.txt

# Train the model
python trainllm.py sample.txt --epochs 10 --model-size tiny

Interactive Chat

$ python tinyllm.py start

🧑 You: Hello, who are you?

🤖 TinyLLM: I am TinyLLM, a small language model trained to have conversations.

🧑 You: What can you do?

🤖 TinyLLM: I can generate text, answer questions, and have conversations!

🧑 You: /quit
Goodbye! 👋

🧠 Key Concepts

1. Causal (Masked) Self-Attention

Each position can attend only to itself and earlier tokens, never to later ones. This is enforced by applying a lower-triangular mask to the attention scores before the softmax.
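
A minimal sketch of the masking step in plain Python, assuming the raw attention scores are already computed. The helper names `causal_mask` and `masked_softmax` are illustrative and do not come from src/model/attention.py.

```python
import math


def causal_mask(seq_len):
    """Lower-triangular mask: mask[i][j] is True when position i may attend to j."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores, mask):
    """Row-wise softmax in which masked-out entries receive zero weight."""
    out = []
    for row, allowed in zip(scores, mask):
        # A score of -inf makes exp(score) exactly 0 after the softmax.
        masked = [s if ok else -math.inf for s, ok in zip(row, allowed)]
        m = max(masked)
        exps = [math.exp(s - m) for s in masked]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


scores = [[0.0] * 4 for _ in range(4)]   # uniform raw scores, for illustration
weights = masked_softmax(scores, causal_mask(4))
for row in weights:
    print([round(w, 2) for w in row])
```

With uniform scores, the first row places all of its weight on position 0, while the last row spreads its weight evenly over all four positions.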

2. Pre-LayerNorm

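Layer normalization is applied to the input of each sub-layer (attention, FFN) rather than to its output, so the residual path itself is never normalized. This tends to make training more stable, particularly as the number of layers grows.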
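
The difference in ordering can be sketched in plain Python. This is an illustration only: the `layer_norm` here has no learned scale or shift, and the `attention` and `ffn` arguments are stand-in callables rather than the project's real sub-layers.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalise a vector to zero mean and unit variance (no learned parameters)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(a, b):
    return [p + q for p, q in zip(a, b)]


def pre_ln_block(x, attention, ffn):
    """Pre-LN: normalise each sub-layer's input; the residual path is untouched."""
    x = add(x, attention(layer_norm(x)))
    x = add(x, ffn(layer_norm(x)))
    return x


def post_ln_block(x, attention, ffn):
    """Post-LN (original Transformer ordering): normalise after the residual add."""
    x = layer_norm(add(x, attention(x)))
    x = layer_norm(add(x, ffn(x)))
    return x


def zero(v):
    """Stand-in sub-layer that contributes nothing (assumption for the demo)."""
    return [0.0 for _ in v]


x = [1.0, 2.0, 3.0, 4.0]
print(pre_ln_block(x, zero, zero))    # input passes through unchanged
print(post_ln_block(x, zero, zero))   # input is rescaled by the norms anyway
```

With sub-layers that output zeros, the Pre-LN block is an exact identity, which shows the direct path the residual stream provides from input to output.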

3. Weight Tying

The token embedding matrix is shared with the output projection (LM head), so the model stores one vocab_size × hidden_size matrix instead of two.
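
A toy illustration of the idea in plain Python, with made-up sizes: one matrix serves as the lookup table on the input side and as the scoring matrix on the output side. The function names and dimensions are assumptions chosen for the example. In PyTorch the same effect is commonly achieved by making the output layer's weight refer to the embedding's weight tensor.

```python
import random

random.seed(0)
vocab_size, hidden_size = 6, 4   # toy sizes, chosen only for illustration

# One matrix of shape (vocab_size, hidden_size), used in two places.
embedding = [[random.gauss(0.0, 0.02) for _ in range(hidden_size)]
             for _ in range(vocab_size)]


def embed(token_id):
    """Input side: look up the row for a token."""
    return embedding[token_id]


def lm_head(hidden):
    """Output side: score every token by a dot product with the same rows."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in embedding]


logits = lm_head(embed(3))
print(len(logits))               # one score per vocabulary entry

untied = 2 * vocab_size * hidden_size
tied = vocab_size * hidden_size
print(f"parameters saved by tying: {untied - tied}")
```

Because both sides read the same rows, any update to an embedding row also changes the corresponding output score.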

4. KV-Cache

During generation, the key and value vectors of previous tokens are cached, so each step computes keys and values only for the newest token instead of recomputing them for the whole sequence.
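
The following single-head sketch in plain Python checks that attending over a cache gives the same result as recomputing from scratch. The `KVCache` class and the toy query/key/value vectors are hypothetical. A real implementation would project them from hidden states and keep one cache per layer and head.

```python
import math


def attend(query, keys, values):
    """Single-head attention of one query over a list of keys and values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum(e / total * v[d] for e, v in zip(exps, values))
            for d in range(dim)]


class KVCache:
    """Hypothetical cache: one key and one value vector per past token."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, query, key, value):
        # Only the new token's key/value are appended; earlier ones are reused.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)


# Toy per-token projections (assumption: q, k, v are already computed).
tokens = [
    {"q": [1.0, 0.0], "k": [1.0, 0.0], "v": [1.0, 2.0]},
    {"q": [0.0, 1.0], "k": [0.0, 1.0], "v": [3.0, 4.0]},
    {"q": [1.0, 1.0], "k": [1.0, 1.0], "v": [5.0, 6.0]},
]

cache = KVCache()
for t in tokens:
    cached_out = cache.step(t["q"], t["k"], t["v"])

# Without a cache, the last step would first rebuild every key and value.
full_out = attend(tokens[-1]["q"],
                  [t["k"] for t in tokens],
                  [t["v"] for t in tokens])
print(cached_out == full_out)
```

The cache is causal by construction: at each step it holds only keys and values for tokens generated so far.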

📈 Training Tips

  1. Start small: Use --model-size tiny for quick experiments
  2. More data is better: LLMs need lots of text to learn patterns
  3. Adjust learning rate: Use a lower rate for larger models (try values between 1e-4 and 5e-4)
  4. Use gradient accumulation: Pass --gradient-accumulation 4 to accumulate gradients over 4 batches per optimizer step, simulating a 4× larger batch
  5. Monitor loss: Training loss should decrease steadily
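
To show why gradient accumulation (tip 4) simulates a larger batch, the sketch below compares one batch of 8 with four micro-batches of 2 on a toy one-parameter model. It illustrates the arithmetic only and makes no claim about how src/training/trainer.py implements it. The model, data, and function name are assumptions.

```python
def mse_grad(w, batch):
    """Gradient of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


data = [(float(x), 3.0 * x) for x in range(1, 9)]   # 8 samples of y = 3x
w = 0.0

# One large batch of 8 samples.
full_grad = mse_grad(w, data)

# Four micro-batches of 2; each gradient is scaled by 1 / accum_steps
# and summed before a single optimizer step would be taken.
accum_steps = 4
micro_batches = [data[i:i + 2] for i in range(0, len(data), 2)]
accumulated = 0.0
for batch in micro_batches:
    accumulated += mse_grad(w, batch) / accum_steps

print(full_grad, accumulated)
```

The two gradients match because the micro-batches are equal in size. Only one micro-batch has to be held in memory at a time.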

📝 License

MIT License - Feel free to use, modify, and learn from this code!


Built for learning and experimentation. Happy coding! 🚀
