Update for model registry operator #125

typhoonzero wants to merge 2 commits into master from update_model_registry_install_with_operator
---
weight: 40
---

# Use Kubeflow Model Registry

The Kubeflow Model Registry is a central repository for managing machine learning models, their versions, and associated metadata. It allows data scientists to publish models, track their lineage, and collaborate on model development.

## Access the Model Registry

1. **Open Dashboard**: Log in to the Kubeflow central dashboard.
2. **Model Registry**: Click **Model Registry** in the sidebar. This takes you to the list of registered models in your namespace.
   > **Note**: If you do not see the Model Registry, ask your platform administrator to confirm that a Model Registry instance has been deployed in your namespace.

## Register a Model

You can register models either through the user interface or programmatically using the Python client.

### Option 1: Using the UI

1. **Create Registered Model**:
   - Click **Register Model**.
   - **Model Name**: Enter a unique name (e.g., `fraud-detection`).
   - **Description**: Add a description.
   - **Version details**: Optionally add version information, tags, and metadata.
   - **Model Location**: Provide the S3 URI of the model artifact (e.g., `s3://my-bucket/models/fraud-detection/v1/`).
   - Click **Create**.

2. **Create Version**:
   - Click the drop-down menu next to the **Registered Model** and select **Register New Version**.
   - Enter the version name, description, metadata, and artifact URI.
   - Click **Register new version**.
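The model location above follows a simple convention of bucket, model name, and version. A minimal sketch of composing such a URI (the bucket and model names are hypothetical):

```python
def artifact_uri(bucket: str, model_name: str, version: str) -> str:
    """Compose an S3 location for a model artifact, following the
    s3://<bucket>/models/<name>/<version>/ layout used above."""
    return f"s3://{bucket}/models/{model_name}/{version}/"

print(artifact_uri("my-bucket", "fraud-detection", "v1"))
# → s3://my-bucket/models/fraud-detection/v1/
```

Keeping the layout consistent makes it easy to locate and version artifacts later when registering new versions.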

### Option 2: Using the Python Client

You can register models directly from your Jupyter Notebook using the `model-registry` Python client.

**Prerequisites**:

- Install the client: `python -m pip install model-registry=="0.3.5" kserve=="0.13"`
- Ensure you have access to the Model Registry service. If running inside a Kubeflow Notebook, you can use the internal service DNS (e.g., `http://model-registry-service.<namespace>.svc:8080`).

**Sample Code**:

The following example demonstrates how to register a model stored in S3.

```python
from model_registry import ModelRegistry

# 1. Connect to the Model Registry.
# Replace with your actual Model Registry service host/port.
# Inside the cluster, typically:
# "http://model-registry-service.<namespace>.svc.cluster.local:8080"
registry = ModelRegistry(
    server_address="http://model-registry-service.kubeflow.svc.cluster.local",
    port=8080,
    author="your name",
    is_secure=False,
)

# 2. Register a new model with its artifact location and metadata
rm = registry.register_model(
    "iris",
    "s3://kfserving-examples/models/sklearn/1.0/model",
    model_format_name="sklearn",
    model_format_version="1",
    version="v1",
    description="Iris scikit-learn model",
    metadata={
        "accuracy": 3.14,
        "license": "BSD 3-Clause License",
    },
)

# 3. Retrieve model information
model = registry.get_registered_model("iris")
print("Registered Model:", model, "with ID", model.id)

version = registry.get_model_version("iris", "v1")
print("Model Version:", version, "with ID", version.id)

art = registry.get_model_artifact("iris", "v1")
print("Model Artifact:", art, "with ID", art.id)
```

## Deploy a Registered Model

Once a model is registered, you can deploy it as an **InferenceService** using KServe.

To deploy, you typically need the URI of the model artifact. You can retrieve this from the Registry UI or via the Python API:

```python
from kubernetes import client
import kserve

# Build the InferenceService spec, linking it back to the registry
# entry via labels and using the artifact URI retrieved above
isvc = kserve.V1beta1InferenceService(
    api_version=kserve.constants.KSERVE_GROUP + "/v1beta1",
    kind=kserve.constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(
        name="iris-model",
        namespace=kserve.utils.get_default_target_namespace(),
        labels={
            "modelregistry/registered-model-id": model.id,
            "modelregistry/model-version-id": version.id,
        },
    ),
    spec=kserve.V1beta1InferenceServiceSpec(
        predictor=kserve.V1beta1PredictorSpec(
            model=kserve.V1beta1ModelSpec(
                storage_uri=art.uri,
                model_format=kserve.V1beta1ModelFormat(
                    name=art.model_format_name, version=art.model_format_version
                ),
            )
        )
    ),
)

ks_client = kserve.KServeClient()
ks_client.create(isvc)
```

Once deployed, the KServe controller will pull the model from the specified S3 URI and start the inference server.
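Once the InferenceService is ready, clients send prediction requests over the KServe v1 inference protocol, which wraps input rows in an `instances` list. A minimal sketch of building such a payload (the feature values and endpoint below are illustrative assumptions, not from this guide):

```python
import json

# KServe v1 protocol: input rows go in an "instances" list
payload = {"instances": [[5.1, 3.5, 1.4, 0.2]]}
body = json.dumps(payload)
print(body)
# → {"instances": [[5.1, 3.5, 1.4, 0.2]]}

# The body would then be POSTed to the predictor endpoint, e.g.:
# http://iris-model.<namespace>.svc.cluster.local/v1/models/iris-model:predict
```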
---
weight: 10
---
# Use Kubeflow Notebooks

Kubeflow Notebooks provide a Kubernetes-native Jupyter environment for data scientists to develop, train, and deploy machine learning models. Each notebook server runs as a separate Pod in your namespace, ensuring isolation and dedicated resources.

> **NOTE**: We recommend using Alauda AI Workbench for a more integrated experience, with additional features such as resource types, configurations, and better integration with other components. However, you can also use the native Kubeflow Notebooks if you prefer a lighter-weight setup or need specific features from the upstream project.

## Concepts

- **Notebook Server**: A JupyterLab instance running in a container.
- **Custom Image**: You can use standard pre-built images (e.g., containing TensorFlow or PyTorch) or provide your own custom Docker image with specific libraries.
- **Persistent Storage**: By default, notebook servers are attached to Persistent Volume Claims (PVCs) that store your workspace directory (usually `/home/jovyan`). This ensures your notebooks and data are preserved even if the server is restarted or updated.

## Create a Notebook Server

1. **Access the Dashboard**:
   Navigate to the **Notebooks** section in the Kubeflow dashboard.

2. **New Notebook**:
   Click **New Notebook**. Make sure the correct namespace is selected at the top of the dashboard before creating the notebook server.

3. **Configure the Server**:
   - **Name**: Enter a unique name for your notebook server.
   - **Image**:
     - **Select Type**: Choose the type of environment: JupyterLab, Visual Studio Code, or RStudio.
     - **Select Image**: Choose from a list of pre-built images, or specify a custom image by providing its Docker image URL.
   - **CPU / RAM**: Allocate CPU and memory resources based on your workload. Start small (e.g., 1 CPU, 2 GB RAM) and increase if needed.
   - **GPUs**: Request GPUs (e.g., NVIDIA) if you plan to run deep learning training or inference tasks that require acceleration.
   - **Workspace Volume**: This volume is mounted at your home directory (`/home/jovyan`). Create a new volume (the default) or attach an existing one to access previous work.
   - **Data Volumes**: (Optional) Attach additional existing PVCs to access large datasets without copying them into your workspace.
   - **Configurations**: (Optional) Select PodDefaults (if available) to inject common configuration such as S3 credentials, Git config, or environment variables.

4. **Launch**:
   Click **Launch**. The server will be provisioned; wait for its status to turn **Running** (green).

## Connect to the Notebook

Once the server status is **Running**:

1. Click **Connect**.
2. The **JupyterLab / VS Code / RStudio** interface opens in a new browser tab.
3. You can now create Python 3 notebooks, open a terminal, or manage files.

## Environment Management

### Installing Python Packages

While you can install packages in your home directory to persist them, it is best practice to use a custom image for reproducibility.

Create a `venv` directory in your home directory and install packages there:

```bash
python -m venv ~/venv
source ~/venv/bin/activate
python -m pip install transformers datasets
```

When you start a new terminal session, remember to activate the virtual environment to access the installed packages.
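To confirm whether a given session is actually running inside the environment, a quick check (a sketch; comparing `sys.prefix` against `sys.base_prefix` is the standard way a venv reveals itself):

```python
import sys

def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print(in_virtualenv())
```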

To use the virtual environment in Jupyter notebooks, install `ipykernel` and create a new kernel:
```bash
source ~/venv/bin/activate
python -m pip install ipykernel
python -m ipykernel install --user --name=venv --display-name "Python (venv)"
```

Then, in your Jupyter notebook, select the "Python (venv)" kernel to use the packages installed in your virtual environment.

Virtual environments are persisted in your home directory, so they remain available even if you stop and restart the notebook server. However, if you need to share the environment across multiple notebook servers or want better reproducibility, consider building a custom Docker image with the required packages pre-installed.

### Using Custom Images

For production environments or complex dependencies (e.g., system libraries), build a Docker image containing all required libraries and use it as your **Custom Image** when creating the notebook. This ensures exact reproducibility.

## Manage Configurations (PodDefaults)

Kubeflow uses `PodDefault` resources (labeled **Configurations** in the UI) to inject common configuration (such as environment variables, volumes, and volume mounts) into Notebooks. This is the standard way to securely provide credentials for object storage (S3, MinIO) without hardcoding them in your notebooks.

### Create a PodDefault

You can create a PodDefault by applying a YAML manifest. Define a `PodDefault` that selects pods with a specific label:
```yaml
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: add-gcp-secret
  namespace: MY_PROFILE_NAMESPACE
spec:
  selector:
    matchLabels:
      add-gcp-secret: "true"
  desc: "add gcp credential"
  volumeMounts:
    - name: secret-volume
      mountPath: /secret/gcp
  volumes:
    - name: secret-volume
      secret:
        secretName: gcp-secret
```
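The selector behaves like any Kubernetes label selector: the PodDefault applies to pods whose labels contain every `matchLabels` entry. A small sketch of that matching rule (the pod labels below are hypothetical):

```python
def matches(match_labels: dict, pod_labels: dict) -> bool:
    # A PodDefault applies when every matchLabels entry is present
    # on the pod with the same value.
    return all(pod_labels.get(k) == v for k, v in match_labels.items())

selector = {"add-gcp-secret": "true"}
print(matches(selector, {"add-gcp-secret": "true", "app": "notebook"}))  # → True
print(matches(selector, {"app": "notebook"}))                            # → False
```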

### Apply Configuration

When creating a new Notebook Server:

1. Scroll to the **Configurations** section.
2. You will see a list of available PodDefaults (e.g., `s3-access`).
3. Check the box to apply one.

This automatically injects the specified environment variables or volumes into your Notebook container.

## Accessing Data

### Using Mounted Volumes

If you attached a data volume (PVC) during creation, it is available at the specified mount point.

```python
import pandas as pd

# Assuming you mounted a data volume at /home/jovyan/data
df = pd.read_csv('/home/jovyan/data/dataset.csv')
print(df.head())
```
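A missing mount surfaces as a `FileNotFoundError` deep inside pandas, so a small defensive wrapper can make the failure mode obvious (a sketch; the path you check is whatever mount point you chose at creation):

```python
import os
import pandas as pd

def load_dataset(path: str) -> pd.DataFrame:
    # Fail fast with a clear message if the data volume is not mounted
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"{path} does not exist - is the data volume mounted?"
        )
    return pd.read_csv(path)
```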

### Using Object Storage (S3 / MinIO)

To access data in S3-compatible storage, use libraries such as `boto3` or `s3fs`. If your administrator has configured PodDefaults for credentials, environment variables (such as `AWS_ACCESS_KEY_ID`) will be pre-populated.

```python
import os
import s3fs
import pandas as pd

# Check whether credentials were injected
print(os.getenv("AWS_S3_ENDPOINT"))

# Read directly from S3
fs = s3fs.S3FileSystem(
    client_kwargs={'endpoint_url': os.getenv('AWS_S3_ENDPOINT')},
    key=os.getenv('AWS_ACCESS_KEY_ID'),
    secret=os.getenv('AWS_SECRET_ACCESS_KEY')
)

with fs.open('s3://my-bucket/data/train.csv') as f:
    df = pd.read_csv(f)
```

## Best Practices

- **Stop Unused Servers**: Notebook servers consume cluster resources (especially GPUs) even when idle. Stop them when you are not actively working.
- **Git Integration**: Use the Git extension in JupyterLab (or the terminal) to version-control your notebooks. Avoid storing large datasets in Git.
- **Resource Monitoring**: Monitor your resource usage. If your kernel crashes frequently (out of memory), stop the server and restart it with a higher memory limit.
- **Clean Up**: Periodically delete old notebook servers and their associated PVCs when the data is no longer needed.
Review comment: Clarify the AuthorizationPolicy identification step to prevent command failures.

The documentation assumes `<model-registry-name>` is the AuthorizationPolicy resource name, but Kubeflow's Profile controller generates AuthorizationPolicy resources with different naming conventions (e.g., `ns-owner-access-istio`). Users following this instruction verbatim will encounter a "resource not found" error. Add a step to identify the correct AuthorizationPolicy (for example, by listing the AuthorizationPolicy resources in the namespace) before editing, then use the actual resource name in the edit command.