DocuSense is a chat-based application for querying multiple PDF documents in natural language. Upload your PDFs, ask questions, and get responses generated from the content of your documents.
It uses modern language models to process the extracted text and generate answers, so that responses stay grounded in the loaded PDFs.
- 📂 Upload and chat with multiple PDFs simultaneously.
- 🔍 Get context-aware answers based on document content.
- ⚡ Efficient text chunking for better processing.
- 🧠 Embedding-based similarity matching to ensure relevant responses.
- 🌐 Streamlit-based UI for easy interaction.
The workflow of DocuSense can be summarized in these steps:
- PDF Loading – Extracts text content from multiple uploaded PDFs.
- Text Chunking – Breaks down extracted text into smaller, manageable chunks.
- Embeddings Generation – Creates vector representations of text chunks using a language model.
- Similarity Matching – Matches your query with the most semantically similar text chunks.
- Response Generation – Passes relevant chunks to the language model to generate precise answers.
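The chunking and similarity-matching steps above can be sketched in a few lines. This is a minimal illustration, not DocuSense's actual code: the function names (`chunk_text`, `embed`, `top_chunks`) and the chunk sizes are hypothetical, and `embed` is a toy word-count vector standing in for the language-model embeddings the app uses.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(query, chunks, k=3):
    """Return the k chunks most similar to the query, best match first."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]
```

In the real pipeline the selected chunks would then be passed to the language model together with the question. The overlap between chunks is a common way to avoid cutting a relevant sentence in half at a chunk boundary.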
- Clone the repository:

      git clone https://github.com/Prasad998/DocuSense.git
      cd DocuSense

- Install dependencies:

      pip install -r requirements.txt

- Add your OpenAI API key to a .env file in the project root:

      OPENAI_API_KEY=your_secret_api_key
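A quick way to confirm the key is visible to Python before launching the app is sketched below. How DocuSense itself loads the .env file is not shown in this README, so this assumes the variable has already been placed in the process environment (for example by a loader such as python-dotenv, which is an assumption). The helper name `get_openai_api_key` is hypothetical.

```python
import os


def get_openai_api_key(env=None):
    """Return OPENAI_API_KEY from the given mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    # Reject both a missing key and the placeholder value from the setup step.
    if not key or key == "your_secret_api_key":
        raise RuntimeError(
            "OPENAI_API_KEY is missing or still set to the placeholder; "
            "add it to the .env file in the project root"
        )
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error raised later from inside a request.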
- Ensure dependencies are installed and your API key is set up in .env.
- Start the app using Streamlit:

      streamlit run app.py

- The interface will open in your default browser.
- Upload one or more PDFs.
- Ask natural language questions about your documents via the chat interface.
- The app only responds to queries related to the content of the loaded PDFs.
- Make sure your OpenAI API key has sufficient credits/permissions.
