Add modelcar #126
Merged
docs/en/model_inference/inference_service/how_to/using_modelcar.mdx
---
weight: 10
---

# Using KServe Modelcar for Model Storage

## Overview

KServe Modelcar, also known as OCI container-based model storage, is a powerful approach for deploying models in cloud-native environments. By packaging models as OCI container images, you can leverage container runtime capabilities to achieve faster startup times and more efficient resource utilization.

## Benefits of Using OCI Containers for Model Storage

- **Reduced startup times**: Avoid downloading the same model multiple times
- **Lower disk space usage**: Reduce the number of model copies stored locally
- **Improved model performance**: Images can be pre-fetched onto nodes for faster loading
- **Offline environment support**: Ideal for environments with limited internet access
- **Simplified model distribution**: Use enterprise internal registries such as Quay or Harbor

## Prerequisites

- Alauda AI platform installed and running
- Model files ready for packaging
- Write/push access to a container registry (e.g., Harbor, Quay)
- Podman or nerdctl installed on your local machine

## Packaging a Model as an OCI Image

### Option 1: Using a Busybox Base Image (Alauda AI Recommendation)

Create a Containerfile with the following content:

```dockerfile
# Use lightweight busybox as the base image
FROM busybox

# Create a directory for the model and set permissions
RUN mkdir -p /models && chmod 775 /models

# Note: Harbor enforces a size limit per image layer.
# For large files (e.g., 4 GB .safetensors shards), copy them in chunks of
# two files per layer, with a final layer for the remaining model
# configuration files. Example:
# COPY models/model-00001-of-00004.safetensors models/model-00002-of-00004.safetensors /models/
# COPY models/model-00003-of-00004.safetensors models/model-00004-of-00004.safetensors /models/
# COPY models/*.json models/*.md models/*.txt /models/

# If layer size is not a concern, you can simply copy everything:
COPY models/ /models/

# Per KServe convention, the model loader only needs the image layers;
# the container does not need to keep running. Add a CMD only if debugging is needed.
```
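The layer-chunking guidance in the comments above can be scripted. A minimal sketch (`gen_copy_lines` is a hypothetical helper, not part of any Alauda AI tooling) that emits `COPY` instructions packing `.safetensors` shards two per layer:

```shell
# Hypothetical helper: emit Containerfile COPY lines that pack .safetensors
# shards two per layer, then the small configuration files in a final layer.
gen_copy_lines() {
  pair=""
  for f in "$@"; do
    if [ -z "$pair" ]; then
      pair="models/$f"
    else
      printf 'COPY %s models/%s /models/\n' "$pair" "$f"
      pair=""
    fi
  done
  # An odd shard count leaves one file in its own layer.
  if [ -n "$pair" ]; then printf 'COPY %s /models/\n' "$pair"; fi
  printf 'COPY models/*.json models/*.md models/*.txt /models/\n'
}

gen_copy_lines \
  model-00001-of-00004.safetensors model-00002-of-00004.safetensors \
  model-00003-of-00004.safetensors model-00004-of-00004.safetensors
```

Appending the emitted lines to the Containerfile keeps each layer within typical per-layer registry limits without hand-editing shard names.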
### Option 2: Using a UBI Micro Base Image (Red Hat Recommendation)

Create a Containerfile with the following content:

```dockerfile
FROM registry.access.redhat.com/ubi9/ubi-micro:latest
COPY --chown=0:0 models /models
RUN chmod -R a=rX /models

# Run as the nobody user
USER 65534
```

### Building and Pushing the Model Image

1. Create a temporary directory for the model and supporting files:
   ```bash
   cd $(mktemp -d)
   ```
2. Create a `models` folder (and, optionally, a version subdirectory for frameworks such as OpenVINO):
   ```bash
   mkdir -p models/1
   ```
3. Copy your model files to the appropriate directory:
   - For most frameworks: `cp -r your-model-folder/* models/`
   - For OpenVINO: `cp -r your-model-folder/* models/1/`
4. Build the OCI container image:
   ```bash
   # Using Podman
   podman build --format=oci -t <registry>/<repository>:<tag> .

   # Using nerdctl
   nerdctl build -t <registry>/<repository>:<tag> .
   ```
5. Push the image to your container registry:
   ```bash
   # Using Podman
   podman push <registry>/<repository>:<tag>

   # Using nerdctl
   nerdctl push <registry>/<repository>:<tag>
   ```
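Steps 2 and 3 above can be wrapped in a small staging helper. A sketch under the same conventions (the `stage_model` function is hypothetical; OpenVINO models get the numeric version subdirectory, other frameworks do not):

```shell
# Hypothetical helper: stage model files into the directory layout the
# image expects before building. OpenVINO models go into models/1/.
stage_model() {
  src=$1
  framework=$2
  dest=models
  if [ "$framework" = "openvino" ]; then
    dest=models/1
  fi
  mkdir -p "$dest"
  cp -r "$src"/. "$dest"/
  echo "$dest"
}

# Example: stage_model your-model-folder openvino     copies into models/1/
# Example: stage_model your-model-folder transformers copies into models/
```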
> **Note**
> If your repository is private, ensure that you are authenticated to the registry before uploading your container image.
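The `<registry>/<repository>:<tag>` placeholders are easy to leave half-filled. A quick sanity check before building or pushing (`check_image_ref` is a hypothetical convenience helper, not part of the platform):

```shell
# Hypothetical helper: reject image references that still contain placeholder
# angle brackets, or that lack a repository path or an explicit tag.
check_image_ref() {
  case $1 in
    *'<'*|*'>'*)
      echo "placeholder left in image reference: $1" >&2; return 1 ;;
    */*:*)
      return 0 ;;  # has a repository path and a tag
    *)
      echo "expected <registry>/<repository>:<tag>, got: $1" >&2; return 1 ;;
  esac
}

check_image_ref docker.io/alaudapublic/models-qwen2.5:v1 && echo "ok"
```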
## Deploying a Model from an OCI Image

### Prerequisites for Deployment

No additional prerequisites are required beyond the general prerequisites listed above.

### Creating the InferenceService

Create an InferenceService YAML file with the following content:

```yaml
kind: InferenceService
apiVersion: serving.kserve.io/v1beta1
metadata:
  annotations:
    aml-model-repo: Qwen2.5-0.5B-Instruct # [!code callout]
    aml-pipeline-tag: text-generation
    serving.kserve.io/deploymentMode: Standard
  labels:
    aml-pipeline-tag: text-generation
    aml.cpaas.io/runtime-type: vllm # [!code callout]
  name: oci-demo
  namespace: demo-space # [!code callout]
spec:
  predictor:
    maxReplicas: 1
    minReplicas: 1
    model:
      modelFormat:
        name: transformers
      protocolVersion: v2
      resources:
        limits:
          cpu: '2'
          ephemeral-storage: 10Gi
          memory: 8Gi
        requests:
          cpu: '2'
          memory: 4Gi
      runtime: aml-vllm-0.11.2-cpu # [!code callout]
      storageUri: oci://<registry>/<repository>:<tag> # [!code callout]
    securityContext:
      seccompProfile:
        type: RuntimeDefault
```

<Callouts>
1. Replace `Qwen2.5-0.5B-Instruct` with your actual model name.
2. `aml.cpaas.io/runtime-type: vllm` specifies the runtime type. For more information about custom inference runtimes, see [Extend Inference Runtimes](./custom_inference_runtime.mdx).
3. Replace `demo-space` with an existing namespace, or create it first with `kubectl create namespace demo-space`.
4. Replace `aml-vllm-0.11.2-cpu` with a runtime name that is already installed on your platform (corresponding to a ClusterServingRuntime CRD instance).
5. `storageUri` specifies the OCI image URI (with tag) where the model is stored. Use a fully qualified registry URL, for example: `storageUri: oci://docker.io/alaudapublic/models-qwen2.5:v1`
</Callouts>
### Applying the InferenceService

Use kubectl to apply the InferenceService configuration:

```bash
kubectl apply -f oci-inference-service.yaml
```

### Verifying the Deployment

Check the status of the InferenceService:

```bash
kubectl get inferenceservices -n demo-space
```

You should see the service in the `Ready` state once the deployment is successful.
If the service fails to start or remains in an unready state, you can troubleshoot by:

1. Checking the InferenceService events for descriptive errors:
   ```bash
   kubectl describe inferenceservice oci-demo -n demo-space
   ```
2. Checking the predictor pod logs:
   ```bash
   kubectl logs -n demo-space -l serving.kserve.io/inferenceservice=oci-demo
   ```
3. Verifying that the model image can be pulled on a cluster node:
   ```bash
   # On a node in the cluster
   crictl pull <registry>/<repository>:<tag>
   ```
## Best Practices

1. **Model Versioning**: Use image tags to version your models
2. **Image Size Optimization**: Use lightweight base images and include only the necessary model files
3. **Registry Management**: Use private registries with proper access control
4. **Security**: Follow container security best practices, including regular vulnerability scans
5. **Caching**: Leverage container registry caching to improve pull times

## Troubleshooting

### Common Issues

1. **Permission Errors**: Ensure the model files in the image have the proper permissions
2. **Registry Authentication**: Verify that the cluster has access to the container registry
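For the registry-authentication case, the KServe predictor embeds a pod spec, so pull credentials can be attached directly to the InferenceService. A hedged fragment (it assumes a docker-registry secret named `regcred` already exists in the service's namespace):

```yaml
# Fragment only: merge into the InferenceService spec shown earlier.
# The regcred secret is an assumption; create it first, e.g.:
#   kubectl create secret docker-registry regcred -n demo-space \
#     --docker-server=<registry> --docker-username=<user> --docker-password=<password>
spec:
  predictor:
    imagePullSecrets:
      - name: regcred
```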
## Conclusion

Using KServe Modelcar (OCI container-based model storage) provides an efficient way to deploy models on the Alauda AI platform. By following the steps outlined in this guide, you can package your models as OCI images and deploy them with faster startup times and improved resource utilization.