A deep learning project for sentiment classification of tweets using BERT (Bidirectional Encoder Representations from Transformers). The project includes data preprocessing, vocabulary/tokenizer setup, model training, evaluation, and visualization of results such as confusion matrix, ROC curves, and word clouds.
- Preprocess tweets with cleaning, tokenization, and splitting into train/val/test sets (sketched after this list).
- Fine-tune `bert-base-uncased` on sentiment labels (`negative`, `neutral`, `positive`).
- Track and visualize training & validation loss.
- Generate classification reports, confusion matrices, ROC curves.
- Create word clouds for positive and negative predictions.
- Modular codebase with reproducible pipelines for preprocessing, training, and evaluation.
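
For reference, the preprocessing step amounts to something like the following (a minimal sketch; the `text`/`label` column names, the `clean_tweet` helper, and the exact cleaning rules are assumptions, not the actual `src/preprocess.py` code):

```python
import re
import pandas as pd
from sklearn.model_selection import train_test_split

def clean_tweet(text: str) -> str:
    """Lowercase and strip URLs, mentions, and extra whitespace (hypothetical cleaning rules)."""
    text = text.lower()
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)               # remove @mentions
    text = re.sub(r"[^a-z0-9#' ]+", " ", text)      # keep words, digits, hashtags
    return re.sub(r"\s+", " ", text).strip()

df = pd.read_csv("data/tweets_sample.csv")          # assumed 'text' and 'label' columns
df["text"] = df["text"].astype(str).map(clean_tweet)

# 70/15/15 stratified split, matching --val-size 0.15 --test-size 0.15
train_df, temp_df = train_test_split(df, test_size=0.30, stratify=df["label"], random_state=42)
val_df, test_df = train_test_split(temp_df, test_size=0.50, stratify=temp_df["label"], random_state=42)

for name, split in [("train", train_df), ("val", val_df), ("test", test_df)]:
    split.to_csv(f"data/{name}.csv", index=False)
```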
```
sentiment-analysis-bert/
├─ data/
│  ├─ train.csv
│  ├─ val.csv
│  └─ test.csv
├─ outputs/
│  ├─ best_model.pt
│  ├─ confusion_matrix.png
│  ├─ training_curves.png
│  ├─ classification_report.txt
│  ├─ wordcloud_negative.png
│  └─ roc_curve.png
├─ src/
│  ├─ preprocess.py
│  ├─ train_lstm.py
│  ├─ train_bert.py
│  ├─ evaluate.py
│  └─ utils.py
└─ README.md
```
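
`src/train_bert.py` handles the fine-tuning. The condensed sketch below shows the general shape of such a loop with Hugging Face Transformers (the label encoding, column names, and loop details are assumptions, not the script itself):

```python
import pandas as pd
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizerFast, BertForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
label2id = {"negative": 0, "neutral": 1, "positive": 2}  # assumed label encoding

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # matches --lr 2e-5

def make_loader(path, batch_size=16, max_len=128):
    """Tokenize a CSV split (assumed 'text'/'label' columns) into padded tensors."""
    df = pd.read_csv(path)
    enc = tokenizer(list(df["text"]), truncation=True, padding=True,
                    max_length=max_len, return_tensors="pt")
    labels = torch.tensor([label2id[l] for l in df["label"]])
    return DataLoader(TensorDataset(enc["input_ids"], enc["attention_mask"], labels),
                      batch_size=batch_size, shuffle=True)

train_loader = make_loader("data/train.csv")
model.train()
for epoch in range(3):  # --epochs 3
    for input_ids, attention_mask, labels in train_loader:
        optimizer.zero_grad()
        out = model(input_ids=input_ids.to(device),
                    attention_mask=attention_mask.to(device),
                    labels=labels.to(device))
        out.loss.backward()  # cross-entropy from the classification head
        optimizer.step()

torch.save(model.state_dict(), "outputs/best_model.pt")
```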
```bash
# Create environment
python -m venv .venv
source .venv/bin/activate   # Linux/Mac
.venv\Scripts\activate      # Windows
pip install -r requirements.txt
```

```bash
# Preprocess: clean tweets and split into train/val/test
python src/preprocess.py --input data/tweets_sample.csv --outdir data --val-size 0.15 --test-size 0.15

# Fine-tune BERT
python src/train_bert.py --train data/train.csv --val data/val.csv --outdir outputs --epochs 3 --batch-size 16 --lr 2e-5 --model bert-base-uncased --max-len 128

# Evaluate on the test set and generate word clouds
python src/evaluate.py --test data/test.csv --checkpoint outputs --outdir outputs --wordclouds
```

- Classification report: `outputs/classification_report.txt`
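
`src/evaluate.py` writes the report and plots listed under `outputs/`. A rough sketch of how those metrics can be produced with scikit-learn (the placeholder predictions stand in for the model's actual test-set outputs):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import (classification_report, confusion_matrix,
                             ConfusionMatrixDisplay, roc_curve, auc)
from sklearn.preprocessing import label_binarize

classes = ["negative", "neutral", "positive"]

# Placeholder predictions; in evaluate.py these come from the fine-tuned model on data/test.csv.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, size=200)           # integer label ids
y_prob = rng.dirichlet(np.ones(3), size=200)    # per-class probabilities (softmax outputs)
y_pred = y_prob.argmax(axis=1)

# Classification report -> outputs/classification_report.txt
with open("outputs/classification_report.txt", "w") as f:
    f.write(classification_report(y_true, y_pred, target_names=classes))

# Confusion matrix -> outputs/confusion_matrix.png
ConfusionMatrixDisplay(confusion_matrix(y_true, y_pred), display_labels=classes).plot()
plt.savefig("outputs/confusion_matrix.png", bbox_inches="tight")

# One-vs-rest ROC curves -> outputs/roc_curve.png
y_bin = label_binarize(y_true, classes=list(range(len(classes))))
plt.figure()
for i, name in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_bin[:, i], y_prob[:, i])
    plt.plot(fpr, tpr, label=f"{name} (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], linestyle="--")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.savefig("outputs/roc_curve.png", bbox_inches="tight")
```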
- Python 3.8+
- PyTorch
- Transformers (HuggingFace)
- Scikit-learn, Pandas, Matplotlib, Seaborn
- WordCloud
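
The `--wordclouds` flag builds the word-cloud images from the predicted classes; a small sketch with the WordCloud package listed above (the DataFrame and its columns are illustrative placeholders, not the project's data):

```python
import pandas as pd
from wordcloud import WordCloud

# Assumed: test tweets with a 'text' column and a 'pred' column of predicted labels.
preds = pd.DataFrame({
    "text": ["great product love it", "terrible service never again"],
    "pred": ["positive", "negative"],
})

for sentiment in ("positive", "negative"):
    corpus = " ".join(preds.loc[preds["pred"] == sentiment, "text"])
    cloud = WordCloud(width=800, height=400, background_color="white").generate(corpus)
    cloud.to_file(f"outputs/wordcloud_{sentiment}.png")
```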
- Expand dataset for more robust evaluation.
- Try other transformer variants (RoBERTa, DistilBERT).
- Apply hyperparameter tuning and cross-validation.
- Deploy model with FastAPI or Streamlit for interactive demo.


