The following are the system requirements for the NVIDIA RAG Blueprint.
- Ubuntu 22.04 OS
- GPU Driver - 530.30.02 or later
- CUDA version - 12.6 or later
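
As a quick sanity check of the driver requirement above, the following is a minimal sketch, not part of the blueprint itself. The helper names (`parse_version`, `driver_ok`, `installed_driver_versions`) are hypothetical; the only external tool assumed is `nvidia-smi` with its `--query-gpu=driver_version` option. It compares the reported driver version against the stated minimum of 530.30.02.

```python
import shutil
import subprocess

MIN_DRIVER = "530.30.02"  # minimum driver version stated in the requirements


def parse_version(text):
    """Turn '530.30.02' into (530, 30, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_ok(installed, minimum=MIN_DRIVER):
    """Return True if the installed driver version meets the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_versions():
    """Ask nvidia-smi for the driver version of each GPU.

    Returns an empty list if nvidia-smi is missing or exits with an error.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for version in installed_driver_versions():
        print(version, "OK" if driver_ok(version) else "older than " + MIN_DRIVER)
```

The comparison is done on integer tuples rather than strings, so that, for example, `530.100.0` would sort after `530.30.02`.
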
By default, this blueprint deploys the referenced NIM microservices locally. For this, you will require a minimum of:
- 2xH100
- 2xB200
- 3xA100

The blueprint can also be modified to use NIM microservices hosted by NVIDIA in the NVIDIA API Catalog.
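
The minimum GPU counts for local deployment can be expressed as a small check. This is a sketch under two assumptions: that the listed configurations are alternatives (any one of them suffices), and that GPU product names contain the model string (`H100`, `B200`, `A100`). The function name `meets_minimum` is hypothetical. The input list could be gathered, for instance, from `nvidia-smi --query-gpu=name --format=csv,noheader`.

```python
# Minimum GPU counts from the list above, keyed by a substring of the GPU name.
# Assumption: satisfying any single entry is sufficient.
MIN_GPUS = {"H100": 2, "B200": 2, "A100": 3}


def meets_minimum(gpu_names, minimums=MIN_GPUS):
    """Return True if at least one listed GPU model is present in sufficient number."""
    for model, needed in minimums.items():
        count = sum(1 for name in gpu_names if model in name)
        if count >= needed:
            return True
    return False


if __name__ == "__main__":
    # Example input: two H100 cards as they might be reported by nvidia-smi.
    print(meets_minimum(["NVIDIA H100 80GB HBM3", "NVIDIA H100 80GB HBM3"]))
```
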
The following are the hardware requirements for each component. The reference code in the solution (glue code) is referred to as the "pipeline".
The overall hardware requirements depend on whether you deploy with Docker Compose, deploy with the Helm chart, or interact using the native Python package.
The NIM and hardware requirements only need to be met if you are self-hosting the NIM microservices with the default RAG settings. See Using self-hosted NVIDIA NIM microservices.
- Pipeline operation: 1x L40 GPU or similar is recommended. It is needed for the Milvus vector store database, because GPU acceleration is enabled by default.
- LLM NIM: NVIDIA llama-3.3-nemotron-super-49b-v1
- For improved parallel performance, we recommend 8 or more H100s/A100s/B200s for LLM inference.
- Embedding NIM: Llama-3.2-NV-EmbedQA-1B-v2 Support Matrix
- The pipeline can share the GPU with the Embedding NIM, but it is recommended to have a separate GPU for the Embedding NIM for optimal performance.
- Reranking NIM: llama-3_2-nv-rerankqa-1b-v2 Support Matrix
- NVIDIA NIM for Image OCR: baidu/paddleocr
- NVIDIA NIMs for Object Detection: