A multimodal expert-assistant GPT platform built with RAG and agents. It integrates tools for modalities such as text, images, and audio, and supports local deployment and private database construction.
Demo video: project_display.mp4
1 Basic Functions
- Single- and multi-turn chat
- Multimodal information display and interaction
- Agent (see the sketch after this list)
  - Tools
    - Web search
    - Image generation
    - Image captioning
    - Audio-to-text
    - Text-to-audio
    - Video captioning
- RAG
  - Private database
  - Offline deployment
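A rough sketch of the agent-plus-tools pattern above, using the classic langchain API. The tool name, its placeholder implementation, and the model choice are illustrative assumptions, not the project's actual code:

```python
# Sketch: an agent that decides when to call tools (classic langchain API).
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI

def web_search(query: str) -> str:
    # Placeholder standing in for a real Google-Search-backed tool.
    return "search results for: " + query

tools = [
    Tool(
        name="web_search",
        func=web_search,
        description="Search the web for up-to-date information.",
    )
]

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
print(agent.run("Who won the most recent World Cup?"))
```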
2 Supported Information Modalities
- Text
- Image
- Audio
- Video
3 Model Interface APIs
- ChatGPT
- DALL·E (see the example after this list)
- Google Search
- BLIP
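The DALL·E interface, for instance, can be called in a few lines. A minimal sketch using the pre-1.0 openai-python image API; the prompt, size, and key handling are assumptions:

```python
# Sketch: text-to-image via the OpenAI image API (openai-python < 1.0 style).
import openai

openai.api_key = "sk-..."  # in this project the key would come from .env
response = openai.Image.create(prompt="a watercolor cat", n=1, size="512x512")
print(response["data"][0]["url"])  # URL of the generated image
```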
Project technology stack: Python + torch + langchain + gradio
- Create a virtual environment in Anaconda:
conda create -n agent python=3.10
- Activate the environment and install the dependencies:
conda activate agent
pip install -r ./requirements.txt
- Install the BLIP model locally: open the BLIP website and download all of its files to Models/BLIP.
- Follow the prompts to configure the keys for the APIs you plan to use in the .env file (a loading sketch follows this list).
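A minimal sketch of how those keys might be read at runtime with python-dotenv. The key names below are assumptions; use whichever names the .env template defines:

```python
# Sketch: load API keys from the project's .env file.
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env in the project root
openai_key = os.getenv("OPENAI_API_KEY")  # assumed key name
google_key = os.getenv("GOOGLE_API_KEY")  # assumed key name
```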
Multi-Agent-GPT provides a web UI for interaction; launch the agent and start an intelligent conversation by running web.py:
python ./web.py
The program will serve a local URL (http://XXX); open it in a local browser to reach the UI.
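As a rough idea of what web.py does, here is a minimal gradio chat app of the same shape; the response function is a placeholder, not the project's actual code:

```python
# Sketch: a minimal gradio chat UI served on a local URL.
import gradio as gr

def respond(message, history):
    # Placeholder: the real app routes the message through the agent.
    return "Echo: " + message

demo = gr.ChatInterface(fn=respond)
demo.launch()  # prints a local URL, e.g. http://127.0.0.1:7860
```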
By integrating the BLIP model, the agent can understand image content and provide high-quality dialogue.
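As an illustration, a BLIP captioning call via the transformers API might look like this; the local model path and image path are assumptions based on the install step above:

```python
# Sketch: image captioning with a locally downloaded BLIP model.
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained("Models/BLIP")  # local files per the install step
model = BlipForConditionalGeneration.from_pretrained("Models/BLIP")

image = Image.open("Imgs/example.jpg").convert("RGB")  # hypothetical image path
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```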
- .env
- Agents/
  - openai_agents.py # defines the GPT-3.5-based agent
- Database/
- Docs/
- Imgs/
  - Show/ # stores some example images
- Models/
  - BLIP # large model for image understanding
- Tools/
  - ImageCaption.py # BLIP-based image captioning tool
  - ImageGeneration.py # text-to-image tool built on OpenAI DALL·E
  - search.py # web search tool built on Google Search
- Utils/
  - data_io.py
  - stdio.py # intercepts the program's log output, mainly to capture the agent's verbose information
  - utils_image.py # utility functions for image processing
  - utils_json.py # extracts useful fields from existing log output (supports stdio)
- python_new_funciton.py # test file used during development
- readme.md
- requirements.txt
- web.py # main entry point

