An LLM-based AMA bot, grounded in data from my blog, website, and resume.
- Install dependencies

  ```shell
  pip install -r requirements.txt
  ```

- Set up environment variables

  Put the following variables (substituting the relevant values) into a `.env` file at the project root:

  ```
  portfolio_site = 'https://en.wikipedia.org/wiki/Sirius_Black'
  blogs_site = 'https://harrypotter.fandom.com/wiki/Sirius_Black'
  personal_blogs_site = ''
  resume_url = '<some path>.pdf'
  openai_api_key = ''
  ```
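The app can load these values with a library such as `python-dotenv` (`load_dotenv()`), but the format is simple enough to sketch by hand. The keys below come from the listing above; the `load_env` helper and the `demo.env` file name are illustrative, not part of the repo:

```python
import os

def load_env(path=".env"):
    """Minimal .env parser: put KEY = 'value' lines into os.environ.
    An illustrative stand-in for python-dotenv's load_dotenv()."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # skip blanks, comments, and anything that isn't KEY=VALUE
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip("'\"")

# demo with a throwaway file so the sketch is self-contained
with open("demo.env", "w") as f:
    f.write("portfolio_site = 'https://en.wikipedia.org/wiki/Sirius_Black'\n")
load_env("demo.env")
print(os.environ["portfolio_site"])
```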
- Run the application

  - Ingesting data:

    ```shell
    python app/indexing_data.py
    ```

    This ingests data from the sources, storing embeddings in an `InMemoryVectorStore` first. The store itself is then dumped to the file system at the path specified in `.env` (default: `embeddings_dump.json`).

  - Checking generation:

    ```shell
    python app/retrieve_generate.py
    ```

    This loads the embeddings from the file system and uses them to answer questions (the RAG workflow).

  - Chatting with the bot:

    ```shell
    streamlit run app/app_streamlit.py
    ```

    This opens a Streamlit chat UI on localhost.
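The two scripts split the work into an indexing pass (embed and dump) and a retrieval pass (reload and rank). A minimal self-contained sketch of that loop, using a toy letter-count embedding in place of the OpenAI embeddings the repo actually uses — the function names and sample documents here are made up for illustration:

```python
import json
import math
from collections import Counter

def embed(text):
    """Toy unit-normalised embedding (letter counts); a stand-in for
    the OpenAI embeddings used by the real pipeline."""
    counts = Counter(text.lower())
    vec = [counts.get(c, 0) for c in "abcdefghijklmnopqrstuvwxyz"]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def index(docs, dump_path="embeddings_dump.json"):
    """Indexing pass: embed every document and dump the store to JSON,
    analogous to what indexing_data.py does with InMemoryVectorStore."""
    store = [{"text": d, "vector": embed(d)} for d in docs]
    with open(dump_path, "w") as f:
        json.dump(store, f)

def retrieve(query, k=1, dump_path="embeddings_dump.json"):
    """Retrieval pass: reload the store and rank documents by cosine
    similarity to the query (the vectors are already unit length)."""
    with open(dump_path) as f:
        store = json.load(f)
    q = embed(query)
    store.sort(key=lambda d: sum(a * b for a, b in zip(q, d["vector"])),
               reverse=True)
    return [d["text"] for d in store[:k]]

index(["Sirius Black escaped from Azkaban.",
       "The resume lists Python and SQL."])
print(retrieve("sirius black escaped from azkaban"))
# → ['Sirius Black escaped from Azkaban.']
```

A real run swaps `embed` for an embeddings API call and hands the retrieved snippets to the LLM as context for generation.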
- Deploying on Cloud

  I tried deploying on Google Cloud through the following steps:

  - Build the Docker image (refer to the `Dockerfile` in the repo).
  - Tag the image so it can be pushed to Google Container Registry; the tag needs to be of the format `HOST-NAME/PROJECT-ID/REPOSITORY/IMAGE`.
  - Push the image to the registry using `docker push <GCloud compatible image-name>`.
  - Deploy the app as a container on Google Cloud Run.
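As a concrete sketch of those four steps, with placeholder names throughout — the project, region, repository, `ama-bot` service name, and port 8501 (Streamlit's default) are all assumptions to substitute with your own values:

```shell
# Hypothetical values -- substitute your own project, region, and repository.
PROJECT_ID=my-gcp-project
REGION=us-central1
# HOST-NAME/PROJECT-ID/REPOSITORY/IMAGE format from the step above
IMAGE="${REGION}-docker.pkg.dev/${PROJECT_ID}/ama-bot-repo/ama-bot:latest"

docker build -t ama-bot .      # build from the repo's Dockerfile
docker tag ama-bot "$IMAGE"    # tag in a registry-compatible format
docker push "$IMAGE"           # push the image to the registry
# deploy on Cloud Run; 8501 is Streamlit's default serving port
gcloud run deploy ama-bot --image "$IMAGE" --region "$REGION" --port 8501
```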
