End-to-end sentiment analysis of tweets using BERT. Includes preprocessing, training, and evaluation with classification reports, confusion matrices, ROC curves, and word clouds. Demonstrates fine-tuning of transformer models for text classification with modular, reproducible code.

AmirhosseinHonardoust/Sentiment-Analysis-BERT

Sentiment Analysis with BERT

A deep learning project for sentiment classification of tweets using BERT (Bidirectional Encoder Representations from Transformers). The project covers data preprocessing, vocabulary/tokenizer setup, model training, evaluation, and visualization of results through a confusion matrix, ROC curves, and word clouds.


Features

  • Preprocess tweets with cleaning, tokenization, and splitting into train/val/test sets.
  • Fine-tune bert-base-uncased on sentiment labels (negative, neutral, positive).
  • Track and visualize training & validation loss.
  • Generate classification reports, confusion matrices, and ROC curves.
  • Create word clouds for positive and negative predictions.
  • Modular codebase with reproducible pipelines for preprocessing, training, and evaluation.

Project Structure

sentiment-analysis-bert/
├─ data/
│  ├─ tweets_sample.csv
│  ├─ train.csv
│  ├─ val.csv
│  └─ test.csv
├─ outputs/
│  ├─ best_model.pt
│  ├─ confusion_matrix.png
│  ├─ training_curves.png
│  ├─ classification_report.txt
│  ├─ wordcloud_negative.png
│  └─ roc_curve.png
├─ src/
│  ├─ preprocess.py
│  ├─ train_lstm.py
│  ├─ train_bert.py
│  ├─ evaluate.py
│  └─ utils.py
├─ requirements.txt
└─ README.md

Setup

# Create a virtual environment
python -m venv .venv

# Activate it (run only the line for your platform)
source .venv/bin/activate    # Linux/Mac
.venv\Scripts\activate       # Windows

# Install dependencies
pip install -r requirements.txt

Preprocess Data

python src/preprocess.py --input data/tweets_sample.csv --outdir data --val-size 0.15 --test-size 0.15
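
As a rough illustration of what the cleaning and splitting step might involve, here is a minimal standard-library sketch. The function names `clean_tweet` and `stratified_split`, the specific cleaning rules (dropping URLs and @mentions, keeping hashtag words, lowercasing), and the per-label stratification are assumptions for illustration; the actual behavior of src/preprocess.py may differ.

```python
import random
import re
from collections import defaultdict

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Drop URLs and @mentions, keep hashtag words, collapse whitespace, lowercase."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return WHITESPACE_RE.sub(" ", text).strip().lower()


def stratified_split(rows, val_size=0.15, test_size=0.15, seed=42):
    """Split (text, label) rows into train/val/test, preserving label ratios.

    Assumption: splitting is done per label so each split keeps roughly the
    same class balance as the full dataset.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_val = round(len(group) * val_size)
        n_test = round(len(group) * test_size)
        val.extend(group[:n_val])
        test.extend(group[n_val:n_val + n_test])
        train.extend(group[n_val + n_test:])
    return train, val, test
```

The default fractions mirror the `--val-size 0.15 --test-size 0.15` flags above, leaving about 70% of each class for training.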

Train BERT

python src/train_bert.py --train data/train.csv --val data/val.csv --outdir outputs --epochs 3 --batch-size 16 --lr 2e-5 --model bert-base-uncased --max-len 128
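
The outputs folder contains best_model.pt and training_curves.png, which suggests the training script records losses per epoch and keeps the best checkpoint. The sketch below shows one plausible way to do that bookkeeping. The `LossTracker` class is hypothetical, and selecting the checkpoint by lowest validation loss is an assumption, not something the project confirms. The loss values are made up for illustration.

```python
class LossTracker:
    """Record per-epoch losses and flag epochs that improve the best validation loss."""

    def __init__(self):
        self.history = []  # list of (epoch, train_loss, val_loss), e.g. for plotting curves
        self.best_epoch = None
        self.best_val_loss = float("inf")

    def update(self, epoch, train_loss, val_loss):
        """Store the epoch's losses; return True if this epoch is the best so far."""
        self.history.append((epoch, train_loss, val_loss))
        if val_loss < self.best_val_loss:
            self.best_val_loss = val_loss
            self.best_epoch = epoch
            return True
        return False


tracker = LossTracker()
saved_epochs = []

# Hypothetical loss values for three epochs (not results from this project).
fake_losses = [(0.9, 0.8), (0.6, 0.7), (0.4, 0.75)]
for epoch, (train_loss, val_loss) in enumerate(fake_losses, start=1):
    if tracker.update(epoch, train_loss, val_loss):
        # A real training script would save the model weights here,
        # for example to outputs/best_model.pt.
        saved_epochs.append(epoch)
```

In this made-up run the validation loss rises again in the third epoch while the training loss keeps falling, so only the first two epochs would trigger a save.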

Evaluate

python src/evaluate.py --test data/test.csv --checkpoint outputs --outdir outputs --wordclouds
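
To make the evaluation outputs concrete, the following standard-library sketch shows how a confusion matrix and per-class precision, recall, and F1 relate to each other for the three sentiment labels. The project lists Scikit-learn among its requirements and most likely relies on it for these numbers; the helper names `confusion_matrix` and `per_class_report` here are illustrative only.

```python
LABELS = ["negative", "neutral", "positive"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Return a matrix whose rows are true labels and columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def per_class_report(matrix, labels=LABELS):
    """Compute precision, recall, F1, and support per label from a confusion matrix."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum: times this label was predicted
        actual = sum(matrix[i])                    # row sum: times this label was the truth
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    return report
```

Reading the matrix row by row shows where each true class ends up, which is often more informative for the neutral class than a single accuracy figure.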

Results

  • Classification Report: outputs/classification_report.txt
  • Confusion Matrix: outputs/confusion_matrix.png
  • Training Curves: outputs/training_curves.png
  • Negative Wordcloud: outputs/wordcloud_negative.png
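
The outputs folder also lists roc_curve.png. With three sentiment classes, ROC curves are commonly drawn one-vs-rest: each class in turn is treated as positive and the other two as negative. The sketch below computes such a curve and its area from per-class scores using only the standard library. Whether evaluate.py uses the one-vs-rest scheme is an assumption, and `roc_points` and `auc` are illustrative names.

```python
def roc_points(y_true, scores, positive_label):
    """Return (fpr, tpr) points for one class treated as positive against the rest.

    `scores` holds the model's score for `positive_label` on each example.
    """
    is_pos = [label == positive_label for label in y_true]
    n_pos = sum(is_pos)
    n_neg = len(is_pos) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Visit examples from highest to lowest score, lowering the threshold as we go.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if is_pos[i]:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example sharing this score (handles ties).
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

Calling `roc_points` once per label with that label's probability column gives three curves that can be plotted on the same axes.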

Requirements

  • Python 3.8+
  • PyTorch
  • Transformers (Hugging Face)
  • Scikit-learn, Pandas, Matplotlib, Seaborn
  • WordCloud

Next Steps

  • Expand dataset for more robust evaluation.
  • Try advanced transformer models (RoBERTa, DistilBERT).
  • Apply hyperparameter tuning and cross-validation.
  • Deploy model with FastAPI or Streamlit for interactive demo.
