Develop a customized LLM that can generate a summary of a given research document.
- Preprocess the article's context
- Generate an extractive summary using Sentence Transformers
- Generate a prompt dataset
- Fine-tune the Falcon Model (1B)
- Evaluate the Model output
- Serve the fine-tuned LLM using vLLM
- Containerize the inference pipeline and deploy the API endpoint with an integrated Streamlit UI
Dataset for summarization of long documents.
Huggingface Link - ccdv/arxiv-summarization
- id: paper id
- article: a string containing the body of the paper
- abstract: a string containing the abstract of the paper
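A minimal sketch of loading the dataset with the `datasets` library; the `document` configuration name is an assumption based on the dataset card, and the field names are those listed above:

```python
from datasets import load_dataset

# Load the arXiv summarization dataset (the "document" config is an assumption;
# check the dataset card for the available configurations).
dataset = load_dataset("ccdv/arxiv-summarization", "document", split="train")

sample = dataset[0]
print(sample["id"])              # paper id
print(sample["article"][:300])   # body of the paper
print(sample["abstract"][:300])  # abstract of the paper
```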
- Tokenization: The input document is tokenized into sentences using NLTK's sentence tokenizer. This step divides the document into manageable segments.
- Generate Contextual Embeddings: Sentence Transformers are used to create contextual embeddings for each sentence. paraphrase-MiniLM-L3-v2 is used for its speed and efficiency, as indicated in the Sentence Transformers documentation.
- Sentence Clustering: Sentences are clustered using K-means, which groups similar sentences together and helps identify the important information in the article.
- Representative Sentence Selection: The sentence closest to the centroid of each cluster is chosen as the representative sentence for that group. This ensures that the most informative sentence in each topic cluster is included in the summary.
- Create Extractive Summary: The extractive summary is generated by combining these representative sentences.
- Generating the Prompt for Fine-tuning: The extractive summary obtained in the previous step is combined with the document's abstract. This combined text serves as the prompt for the language model (a minimal sketch of these preprocessing steps is shown below).
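A minimal sketch of the preprocessing steps above (NLTK sentence tokenization, Sentence Transformers embeddings, K-means clustering, centroid-nearest selection, prompt assembly). The number of clusters and the prompt template are illustrative assumptions, not necessarily the values used in the project:

```python
import nltk
from nltk.tokenize import sent_tokenize
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

nltk.download("punkt", quiet=True)
encoder = SentenceTransformer("paraphrase-MiniLM-L3-v2")

def extractive_summary(article: str, num_clusters: int = 10) -> str:
    # 1. Tokenize the document into sentences.
    sentences = sent_tokenize(article)
    num_clusters = min(num_clusters, len(sentences))

    # 2. Contextual embeddings for every sentence.
    embeddings = encoder.encode(sentences)

    # 3. Cluster sentences into topical groups with K-means.
    kmeans = KMeans(n_clusters=num_clusters, random_state=42).fit(embeddings)

    # 4. Pick the sentence closest to each cluster centroid.
    closest, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, embeddings)

    # 5. Combine the representative sentences (kept here in document order).
    return " ".join(sentences[i] for i in sorted(set(closest)))

def build_prompt(article: str, abstract: str) -> str:
    # Illustrative prompt template: extractive summary as input, abstract as target.
    summary = extractive_summary(article)
    return f"Summarize the following text:\n{summary}\n\nSummary:\n{abstract}"
```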
- Download the Falcon (1B) model. Model source paper: https://arxiv.org/abs/2306.01116
- Using bitsandbytes, load the model in 4-bit with the NF4 quantization technique. Paper: https://arxiv.org/pdf/2305.14314.pdf
- Import PEFT and pass the LoRA config to fine-tune the Falcon model on the generated prompts. With QLoRA, the base model parameters are frozen and only a small set of adapter parameters is trained, which reduces GPU memory consumption and shortens training time (a minimal sketch follows this list).
- Fine-tune the model and save the adapter/checkpoint to the Hugging Face Hub
- Merge the LoRA adapter and push the merged model to the Hugging Face Hub.
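A minimal sketch of the 4-bit NF4 loading and LoRA setup described above, using transformers, bitsandbytes, and peft; the LoRA hyperparameters are illustrative, not necessarily the ones used for the published checkpoints:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "tiiuae/falcon-rw-1b"

# 4-bit NF4 quantization via bitsandbytes (QLoRA-style loading).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # may be unnecessary on recent transformers versions
)

# Freeze the quantized base weights and attach trainable LoRA adapters.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                                # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],  # Falcon attention projection
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()       # only the adapter weights are trainable
```

After training, the adapter can be uploaded with `model.push_to_hub(...)`, and `merge_and_unload()` folds it back into the base weights before pushing the merged model.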
Finetuning Notebook - Link
Model Card Link - GouthamVignesh/falcon-arxiv-long-summary-1B (Lora adapter & safetensor)
Model Card Link - GouthamVignesh/falcon-long-summary-ckpt (checkpoint bin file)
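A minimal sketch of loading the published adapter for inference with peft (repository names taken from the model cards above; the prompt format is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "tiiuae/falcon-rw-1b"
adapter_id = "GouthamVignesh/falcon-arxiv-long-summary-1B"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach the fine-tuned LoRA adapter and optionally merge it into the base weights.
model = PeftModel.from_pretrained(base_model, adapter_id)
model = model.merge_and_unload()

prompt = "Summarize the following text:\n<extractive summary here>\n\nSummary:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```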
Fine-tuned Falcon-1B model for the document summarization task
- Model type: CausalLM
- Language(s) (NLP): EN
- Finetuned from model: tiiuae/falcon-rw-1b
- Load Dataset and Initialize Lists.
- Initialize Sentence Transformer Model.
- Calculate BLEU, ROUGE, and Semantic Similarity Scores (see the sketch after this list).
- Iterate and Evaluate Each Example.
- Calculate Average BLEU, ROUGE, and Semantic Similarity Scores.
- Combine Scores into a Hybrid Score.
- Interpret the Results.
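A minimal sketch of the evaluation loop above, assuming the Hugging Face `evaluate` library for BLEU/ROUGE and Sentence Transformers cosine similarity for the semantic score; the equal-weight hybrid combination is an illustrative assumption:

```python
import evaluate
from sentence_transformers import SentenceTransformer, util

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")
embedder = SentenceTransformer("paraphrase-MiniLM-L3-v2")

def score_example(prediction: str, reference: str) -> dict:
    # Per-example BLEU, ROUGE-L, and semantic similarity between generated and reference summaries.
    bleu_score = bleu.compute(predictions=[prediction], references=[[reference]])["bleu"]
    rouge_score = rouge.compute(predictions=[prediction], references=[reference])["rougeL"]
    emb = embedder.encode([prediction, reference], convert_to_tensor=True)
    semantic = util.cos_sim(emb[0], emb[1]).item()
    return {"bleu": bleu_score, "rougeL": rouge_score, "semantic": semantic}

def hybrid_score(per_example_scores: list) -> float:
    # Average each metric over all examples, then combine with equal weights (illustrative).
    avg = {k: sum(s[k] for s in per_example_scores) / len(per_example_scores)
           for k in per_example_scores[0]}
    return (avg["bleu"] + avg["rougeL"] + avg["semantic"]) / 3
```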
- Inference comparison between the Hugging Face generation pipeline and vLLM inference (see the sketch after the table below)
- Inference Google Colab Notebook Link
| Metric | Hugging Face Pipeline | vLLM Pipeline |
|---|---|---|
| Total Time Taken | 10.04 seconds | 15.21 seconds |
| Tokens Generated | 324 | 1024 |
| Tokens Generated Per Second | 32.26 tokens/second | 67.34 tokens/second |
| GPU Memory Usage (MB) | 4587 | 33140 |
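A minimal sketch of the two inference paths compared above; the merged-model repository id, prompt, and sampling parameters are assumptions:

```python
from transformers import pipeline
from vllm import LLM, SamplingParams

model_id = "GouthamVignesh/falcon-arxiv-long-summary-1B"  # assumed merged-model repo
prompt = "Summarize the following text:\n<extractive summary here>\n\nSummary:\n"

# Hugging Face generation pipeline
hf_pipe = pipeline("text-generation", model=model_id, device_map="auto")
hf_text = hf_pipe(prompt, max_new_tokens=324)[0]["generated_text"]

# vLLM inference
llm = LLM(model=model_id, dtype="bfloat16", trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=1024)
vllm_text = llm.generate([prompt], params)[0].outputs[0].text
```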
- Build and run the Docker image to run the Streamlit application:
git clone https://github.com/GouthamVicky/LLM-LongDoc-Summary.git
cd inference
sudo docker build -t falconapp .
docker run --gpus all -p 8000:8000 -p 8501:8501 falconapp
- To run without Docker:
git clone https://github.com/GouthamVicky/LLM-LongDoc-Summary.git
cd inference
pip install -r requirements.txt
uvicorn fastapi_app:app --port 8000 & streamlit run streamlit_app.py
- Navigate to http://localhost:8501/
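An illustrative sketch of what the FastAPI endpoint behind the Streamlit UI might look like; the `/summarize` route, request fields, and model id are assumptions, and the actual `fastapi_app.py` in the repository may differ:

```python
# fastapi_app.py (illustrative sketch, not the repository's actual implementation)
from fastapi import FastAPI
from pydantic import BaseModel
from vllm import LLM, SamplingParams

app = FastAPI()
llm = LLM(model="GouthamVignesh/falcon-arxiv-long-summary-1B")  # assumed model id

class SummaryRequest(BaseModel):
    text: str

@app.post("/summarize")
def summarize(req: SummaryRequest):
    prompt = f"Summarize the following text:\n{req.text}\n\nSummary:\n"
    params = SamplingParams(temperature=0.7, max_tokens=512)
    summary = llm.generate([prompt], params)[0].outputs[0].text
    return {"summary": summary}
```

The Streamlit app would then POST the document text to http://localhost:8000/summarize and display the returned summary.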
Demo video: Screen.Recording.2023-11-08.at.5.25.13.PM.1.mp4
- Exclude the extractive summarization step from preprocessing and fine-tune the model with an extended context window using rotary embeddings and Flash Attention to improve the handling of longer documents. Reference - https://arxiv.org/pdf/2306.15595.pdf
- LongLoRA can be used to extend the context length during fine-tuning while maintaining high performance and low complexity. The core idea behind LongLoRA is Shift Short Attention (S2-Attn), which splits the context into smaller groups. S2-Attn shifts tokens by half of the group size, ensuring smooth information exchange between adjacent groups. Reference - https://github.com/dvlab-research/LongLoRA
- Fine-tune a model that has AWQ support to enhance inference speed on resource-constrained edge platforms. Reference - https://github.com/mit-han-lab/llm-awq#awq-model-zoo

