A hybrid sentiment analysis system for Vietnamese text that combines PhoBERT with a rule-based approach, achieving 95.80% accuracy on 1000 diverse test prompts.
- Hybrid Model: Combines deep learning (PhoBERT) with rule-based analysis for optimal accuracy
- Advanced Preprocessing: Noise removal, teencode normalization, word segmentation
- Streamlit Web App: Beautiful, user-friendly interface
- History Management: SQLite-based storage with export/import capabilities
- Multi-format Export: CSV, JSON, HTML, ICS calendar format
- Comprehensive Testing: Validated on 1000+ diverse prompts
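The preprocessing steps named above (noise removal and teencode normalization) could look roughly like the following. This is a minimal sketch, not the project's actual `preprocessing.py`: the function name `preprocess`, the `TEENCODE` table, and its entries are illustrative assumptions, and word segmentation is only marked with a comment so the example stays dependency-free.

```python
import re
import unicodedata

# Hypothetical teencode table; the project's real lexicon is not shown in this README.
TEENCODE = {
    "ko": "không",
    "k": "không",
    "dc": "được",
    "đc": "được",
    "bt": "bình thường",
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# A letter repeated three or more times, e.g. "nàyyyy" (digits are left alone).
REPEAT_RE = re.compile(r"([^\W\d_])\1{2,}")
# Anything that is neither a word character nor whitespace.
NOISE_RE = re.compile(r"[^\w\s]")


def preprocess(text: str) -> str:
    """Clean a raw Vietnamese string and expand common teencode tokens."""
    # NFC keeps each accented Vietnamese letter as a single code point.
    text = unicodedata.normalize("NFC", text).lower()
    text = URL_RE.sub(" ", text)       # drop links
    text = REPEAT_RE.sub(r"\1", text)  # collapse elongated letters
    text = NOISE_RE.sub(" ", text)     # drop punctuation and symbols
    tokens = [TEENCODE.get(tok, tok) for tok in text.split()]
    # Word segmentation would follow here (the project lists underthesea
    # as a dependency); it is omitted to keep this sketch self-contained.
    return " ".join(tokens)
```

With this table, `preprocess("Sp nàyyyy ko dc tốt!!! http://x.vn")` yields `"sp này không được tốt"`.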
# Clone repository
git clone https://github.com/d0ngle8k/Extract-Prompt-To-Emotion.git
cd Extract-Prompt-To-Emotion
# Create virtual environment
python -m venv .venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # macOS/Linux
# Install dependencies
pip install -r requirements.txt
# Run the app
streamlit run app.py
- Fork this repository on GitHub
- Go to Streamlit Cloud
- Connect your GitHub account
- Select this repository
- Deploy! (Streamlit will automatically detect app.py as the entry point)
- Free tier limitations: Model loading may take time on first run
- Data persistence: History data doesn't persist between sessions (cloud limitation)
- Memory usage: The PhoBERT model requires ~2 GB of RAM, so make sure the deployment has adequate resources
- Overall Accuracy: 95.80%
- Positive: 85.71%
- Negative: 98.72%
- Neutral: 98.70%
Tested on 1000 diverse Vietnamese prompts including standard text, toxic language, slang, and edge cases.
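For readers who want to reproduce figures of this kind on their own data, here is a small sketch. It assumes the per-class numbers are the fraction of prompts of each true label that were predicted correctly, which makes the overall accuracy their mean weighted by class size. The function name `accuracy_report` is hypothetical and is not taken from the project's test suite.

```python
from collections import Counter


def accuracy_report(y_true, y_pred):
    """Return (overall accuracy, per-class accuracy keyed by true label)."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = Counter(y_true)  # number of prompts per true label
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    per_class = {label: correct[label] / n for label, n in total.items()}
    overall = sum(correct.values()) / len(y_true)
    return overall, per_class
```

On a toy input such as true labels `["positive", "positive", "negative", "neutral"]` and predictions `["positive", "negative", "negative", "neutral"]`, this gives an overall accuracy of 0.75 and a positive-class accuracy of 0.5.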
app.py # Streamlit web interface
├── preprocessing.py # Text cleaning & segmentation
├── phobert_module.py # Hugging Face PhoBERT integration
├── rule_based.py # Lexicon-based sentiment analysis
├── fusion.py # Conditional model fusion
└── db_connector.py # SQLite database operations
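The "conditional model fusion" step in the layout above is not specified in this README. One plausible reading is sketched below: keep the PhoBERT label when the model is confident, and otherwise defer to the rule-based result if the lexicon found any evidence. The function name `fuse`, the `0.7` threshold, and the policy itself are assumptions for illustration, not the contents of `fusion.py`.

```python
from typing import Tuple

LABELS = ("negative", "neutral", "positive")


def fuse(
    model_label: str,
    model_confidence: float,
    rule_label: str,
    rule_score: float,
    threshold: float = 0.7,  # assumed value, tune on validation data
) -> Tuple[str, str]:
    """Return (final label, name of the component that decided it)."""
    if model_label not in LABELS or rule_label not in LABELS:
        raise ValueError("unknown label")
    # Confident model prediction: use it as-is.
    if model_confidence >= threshold:
        return model_label, "model"
    # Uncertain model, but the lexicon matched something: defer to the rules.
    if rule_score != 0:
        return rule_label, "rules"
    # No lexicon evidence either: fall back to the model's best guess.
    return model_label, "model"
```

Returning the deciding component alongside the label makes it easier to audit which branch produced a misclassification.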
Extract-Prompt-To-Emotion/
├── app.py # Main Streamlit application
├── preprocessing.py # Vietnamese text preprocessing
├── phobert_module.py # PhoBERT sentiment analysis
├── rule_based.py # Rule-based sentiment analysis
├── fusion.py # Model fusion logic
├── db_connector.py # Database operations
├── requirements.txt # Python dependencies
├── .streamlit/config.toml # Streamlit configuration
├── test/ # Test suite
│ ├── test_1000_prompts.py
│ └── test_1000_prompts_refined.txt
├── phobert_finetuned/ # Pre-trained model files
└── README.md
- streamlit: Web app framework
- transformers: Hugging Face models
- torch: PyTorch for deep learning
- underthesea: Vietnamese NLP toolkit
- pandas: Data manipulation
- numpy: Numerical operations
- Classification Tab: Input Vietnamese text and get instant sentiment analysis
- History Tab: View past classifications with export options
- Export Data: Download history in CSV, JSON, HTML, or ICS format
- Import Data: Upload CSV/JSON files to add historical data
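The history and export features above can be approximated with the standard library alone. The sketch below assumes a single `history` table with `text`, `sentiment`, and `created_at` columns. The schema and function names are hypothetical and may differ from `db_connector.py`, and the HTML and ICS exporters are left out.

```python
import csv
import io
import json
import sqlite3

FIELDS = ["id", "text", "sentiment", "created_at"]


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (assumed) history table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS history (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               text TEXT NOT NULL,
               sentiment TEXT NOT NULL,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )


def add_record(conn: sqlite3.Connection, text: str, sentiment: str) -> None:
    """Store one classification result."""
    conn.execute(
        "INSERT INTO history (text, sentiment) VALUES (?, ?)", (text, sentiment)
    )
    conn.commit()


def fetch_history(conn: sqlite3.Connection) -> list:
    """Return all stored rows as dictionaries, oldest first."""
    cur = conn.execute(
        "SELECT id, text, sentiment, created_at FROM history ORDER BY id"
    )
    return [dict(zip(FIELDS, row)) for row in cur.fetchall()]


def export_csv(rows: list) -> str:
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def export_json(rows: list) -> str:
    """Serialize rows to JSON, keeping Vietnamese characters readable."""
    return json.dumps(rows, ensure_ascii=False, indent=2)
```

Opening the connection with `sqlite3.connect(":memory:")` is convenient for trying this out. The strings returned by `export_csv` and `export_json` could then be handed to a download widget in the app.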
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is open source. Feel free to use and modify it.
- PhoBERT by VinAI Research
- Underthesea Vietnamese NLP toolkit
- Streamlit for the amazing web app framework
Special thanks: chotxx - Vu Quoc Bao