# Dynamo on AKS

This guide covers deploying Dynamo and running LLM inference on Azure Kubernetes Service (AKS). You'll learn how to set up an AKS cluster with GPU nodes, install required components, and deploy your first model.

## Prerequisites

Before you begin, ensure you have:

- An active Azure subscription
- Sufficient Azure quota for GPU VMs
- The [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) installed (if creating the cluster from the command line)
- [kubectl](https://kubernetes.io/docs/tasks/tools/) installed
- [Helm](https://helm.sh/docs/intro/install/) installed

## Step 1: Create AKS Cluster with GPU Nodes

If you don't have an AKS cluster yet, create one using the [Azure CLI](https://learn.microsoft.com/en-us/azure/aks/learn/quick-kubernetes-deploy-cli), [Azure PowerShell](https://learn.microsoft.com/en-us/azure/aks/learn/quick-kubernetes-deploy-powershell), or the [Azure portal](https://learn.microsoft.com/en-us/azure/aks/learn/quick-kubernetes-deploy-portal).
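
For example, with the Azure CLI (the resource group, region, and cluster name below are placeholders; substitute your own values):

```bash
az login

export RESOURCE_GROUP=<rg_name>
export REGION=<region>
export CLUSTER_NAME=<aks_cluster_name>

# Create the cluster with a single CPU node for system workloads
az aks create -g $RESOURCE_GROUP -n $CLUSTER_NAME --location $REGION --node-count 1

# Fetch credentials and confirm kubectl points at the new cluster
az aks get-credentials --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME
kubectl config get-contexts
```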

Ensure your AKS cluster has a node pool with GPU-enabled nodes. Follow the [Use GPUs for compute-intensive workloads on Azure Kubernetes Service (AKS)](https://learn.microsoft.com/en-us/azure/aks/use-nvidia-gpu?tabs=add-ubuntu-gpu-node-pool#skip-gpu-driver-installation) guide to create a GPU-enabled node pool.

**Important:** It is recommended to **skip the GPU driver installation** during node pool creation, as the NVIDIA GPU Operator will handle this in the next step.
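
For instance, the following adds a pool of four `Standard_NC24ads_A100_v4` nodes (one A100 GPU each) with driver installation skipped; adjust the VM size and node count to fit your quota, and note that the `--skip-gpu-driver-install` flag may require the `aks-preview` Azure CLI extension:

```bash
az aks nodepool add \
  --resource-group $RESOURCE_GROUP \
  --cluster-name $CLUSTER_NAME \
  --name gpupool \
  --node-count 4 \
  --node-vm-size Standard_NC24ads_A100_v4 \
  --skip-gpu-driver-install
```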

## Step 2: Install NVIDIA GPU Operator

Once your AKS cluster is configured with a GPU-enabled node pool, install the NVIDIA GPU Operator. This operator automates the deployment and lifecycle of all NVIDIA software components required to provision GPUs in the Kubernetes cluster, including drivers, the container toolkit, the device plugin, and monitoring tools.

Follow the [Installing the NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html) guide to install the GPU Operator on your AKS cluster.
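
In short, the Helm-based installation looks like this (chart options may vary by version; the guide above is authoritative):

```bash
# Add the NVIDIA Helm repository
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update

# Install the GPU Operator into its own namespace
helm install --wait --generate-name \
  --create-namespace --namespace gpu-operator \
  nvidia/gpu-operator

# Validate the install (pods take roughly five minutes to settle)
kubectl get pods -n gpu-operator
```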

You should see output similar to the example below. Note that this is not the complete output; there should be additional pods running. The most important thing is to verify that the GPU Operator pods are in a `Running` state (validator pods run to completion and show `Completed`).

```bash
NAMESPACE      NAME                                       READY   STATUS      RESTARTS   AGE
gpu-operator   gpu-feature-discovery-xxxxx                1/1     Running     0          2m
gpu-operator   gpu-operator-xxxxx                         1/1     Running     0          2m
gpu-operator   nvidia-container-toolkit-daemonset-xxxxx   1/1     Running     0          2m
gpu-operator   nvidia-cuda-validator-xxxxx                0/1     Completed   0          1m
gpu-operator   nvidia-device-plugin-daemonset-xxxxx       1/1     Running     0          2m
gpu-operator   nvidia-driver-daemonset-xxxxx              1/1     Running     0          2m
```

## Step 3: Deploy Dynamo Kubernetes Operator

Follow the [Deploying Inference Graphs to Kubernetes](../../../docs/kubernetes/README.md) guide to install Dynamo on your AKS cluster.
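
For reference, installing from the published NGC Helm charts looked like this at the time of writing (the release version below is an example, and the namespace is assumed to match the validation step that follows; no NGC API key is needed since the charts are openly available):

```bash
export NAMESPACE=dynamo-system
export RELEASE_VERSION=0.3.2   # example version; use the latest release

# Fetch the CRDs and platform Helm charts
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz

# Install the Custom Resource Definitions (CRDs)
helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz \
  --namespace default \
  --wait \
  --atomic

# Install the Dynamo platform
kubectl create namespace ${NAMESPACE}
helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE}
```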

Validate that the Dynamo pods are running:

```bash
kubectl get pods -n dynamo-system

# Expected output:
# NAME                                                              READY   STATUS    RESTARTS   AGE
# dynamo-platform-dynamo-operator-controller-manager-xxxxxxxxxx    2/2     Running   0          2m50s
# dynamo-platform-etcd-0                                           1/1     Running   0          2m50s
# dynamo-platform-nats-0                                           2/2     Running   0          2m50s
# dynamo-platform-nats-box-xxxxxxxxxx                              1/1     Running   0          2m51s
```

## Step 4: Deploy and Test a Model

Follow the [Deploy Model/Workflow](../../../docs/kubernetes/installation_guide.md#next-steps) guide to deploy and test a model on your AKS cluster.
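
To make this concrete, here is roughly what that flow looked like for the multimodal `microsoft/Phi-3.5-vision-instruct` example in the Dynamo repo at the time of writing; treat it as a sketch, since manifest paths and service names may have changed:

```bash
# Run from the root of a clone of https://github.com/ai-dynamo/dynamo
export NAMESPACE=dynamo-system

# Create a secret with your Hugging Face token so pods can pull model weights
export HF_TOKEN=your_hf_token
kubectl create secret generic hf-token-secret --from-literal=HF_TOKEN=${HF_TOKEN} -n ${NAMESPACE}

# Deploy the example (this model took ~5 minutes to come up)
kubectl apply -f examples/multimodal/deploy/k8s/agg-phi3v.yaml -n ${NAMESPACE}
kubectl get dynamographdeployments -n ${NAMESPACE}

# Port-forward the frontend service and send a test request
FRONTEND_SVC=$(kubectl get svc -n ${NAMESPACE} -o name | grep frontend | sed 's|.*/||' | head -n1)
kubectl port-forward svc/${FRONTEND_SVC} 8000:8000 -n ${NAMESPACE} &

curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/Phi-3.5-vision-instruct",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        { "type": "image_url", "image_url": { "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" } }
      ]
    }],
    "stream": false
  }'
```

A successful response is an OpenAI-style chat completion whose `content` describes the boardwalk scene in the test image.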

## Clean Up Resources

If you want to clean up the Dynamo resources created during this guide, you can run the following commands:

```bash
# Delete all Dynamo Graph Deployments
kubectl delete dynamographdeployments.nvidia.com --all --all-namespaces

# Uninstall the Dynamo platform and CRDs (use the namespace you installed the platform into)
helm uninstall dynamo-platform -n dynamo-system
helm uninstall dynamo-crds -n default
```

This will spin down the Dynamo deployment and all associated resources.

If you want to delete the GPU Operator, follow the instructions in the [Uninstalling the NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/uninstall.html) guide.

If you want to delete the entire AKS cluster, follow the instructions in the [Delete an AKS cluster](https://learn.microsoft.com/en-us/azure/aks/delete-cluster) guide.