30 changes: 18 additions & 12 deletions docs/en/installation/kubeflow.mdx
@@ -3,7 +3,7 @@
Deploy Kubeflow plugins in Alauda AI >= 2.0. Including:

- kfbase: Kubeflow Base components, including authentication and authorization, central dashboard, notebook, pvc-viewer, tensorboards, volumes, model registry ui, kserve endpoints ui, model catalog API service, etc.
- chart-kubeflow-model-registry: Kubeflow Model Registry instance (Helm Chart)
- model-registry-operator: Kubeflow Model Registry Operator
- kfp: Kubeflow Pipeline
- kftraining: Kubeflow Training Operator (deprecated)
- kubeflow-trainer: Kubeflow Training job management plugin, aka. Kubeflow Trainer v2 (replaces kftraining)
@@ -79,7 +79,7 @@ violet push --platform-address="https://192.168.171.123" \
```

- kfbase: Kubeflow Base functionality
- chart-kubeflow-model-registry: Kubeflow Model Registry
- model-registry-operator: Kubeflow Model Registry Operator
- kfp: Kubeflow Pipeline functionality
- kftraining: Kubeflow Training Operator (deprecated)
- kubeflow-trainer: Kubeflow Training job management plugin (replaces kftraining)
@@ -195,22 +195,26 @@ As above, in **Cluster Plugins**, find kfp (Kubeflow Pipeline) and kftrainer (Ku
**Note: After Kubeflow Pipeline deployment, Pipeline related functions can be used in the Kubeflow interface.**
**Note: Kubeflow Training Operator is a background task scheduler and will not appear in the UI menu and functions.**

### 5. Deploy chart-kubeflow-model-registry (Kubeflow Model Registry)
### 5. Deploy Kubeflow Model Registry

In **Catalog** or **Administrator** - **Marketplace** - **Chart Repositories**, find chart-kubeflow-model-registry, click the "Create" button, fill in the deployment name, project, namespace (example deployment location), Chart Version, then copy the `values.yaml` configuration information from the right to the left, modify the following content according to the cluster information:
In **Administrator** - **Marketplace** - **OperatorHub**, find Model Registry Operator and click the "Install" button to deploy the operator.

> **Note: Must install in a namespace that has already been bound to a Kubeflow user Profile, otherwise the Model Registry UI will not be displayed**
After the operator is installed, create a `ModelRegistry` instance in the user's namespace: switch to the **All Instances** tab and click the "Create" button.

- global.registry.address: The image registry address used by the current platform
- mysqlStorageClass: The mysql storage class used by Model Registry. Needs to be a storage class supported by the target deployment cluster.
- mysqlStorageSize: The mysql storage size used by Model Registry.
- mysqlDataBase: Database name (will be created automatically).
- modelRegistryDisplayName: The name of the Model Registry instance to be deployed
- modelRegistryDescription: Brief description of the Model Registry instance to be deployed
> **Note: Must create the instance in a namespace that has already been bound to a Kubeflow user Profile, otherwise the Model Registry UI will not be displayed**

When creating the instance, configure the following parameters as needed:

- **Name**: Configure the name of the Model Registry instance.
- **Namespace**: Configure the namespace where the Model Registry instance is located. It must be a namespace that has been bound to a Kubeflow user Profile.
- **MySQL Storage Class**: Configure the MySQL storage class used to store the Model Registry metadata. Choose one of the storage classes available in your cluster, for example, `standard`.
- **MySQL Storage Size**: Configure the size of the MySQL storage. The default value is `10Gi`; adjust it according to your needs.
- **DisplayName**: The name of the Model Registry instance to be displayed.
- **Description**: Brief description of the Model Registry instance.

**Note: After the Model Registry instance starts, refresh the Model Registry menu in the left navigation of the Kubeflow page to see the instance deployed in the above steps. Before the first instance is deployed, the Kubeflow Model Registry page will appear empty.**

**Note: The Model Registry instance will restrict network requests from non-current namespaces. If you need to allow more namespaces to access, you need to manually modify `kubectl -n <your-namespace> edit authorizationpolicy model-registry-service` and according to the istio documentation, add the namespaces that are allowed to access.**
**Note: The Model Registry instance restricts network requests from other namespaces. To allow additional namespaces to access it, manually edit the policy with `kubectl -n <your-namespace> edit authorizationpolicy <model-registry-name>` and, following the Istio documentation, add the namespaces that should be allowed access.**
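For illustration, an updated policy might look like the sketch below. The resource and namespace names (`demo-registry`, `demo-namespace`, `team-b`) are placeholders, and the exact structure of the generated policy may differ, so inspect the real one with `kubectl get` before editing:

```yaml
# List the policies first to find the actual resource name:
#   kubectl -n <your-namespace> get authorizationpolicy
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: demo-registry        # placeholder: use the name returned by "get"
  namespace: demo-namespace  # placeholder: the Model Registry namespace
spec:
  action: ALLOW
  rules:
    - from:
        - source:
            # Add every namespace that should be allowed to reach the registry.
            namespaces: ["demo-namespace", "team-b"]
```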
⚠️ Potential issue | 🟡 Minor

Clarify the AuthorizationPolicy identification step to prevent command failures.

The documentation assumes `<model-registry-name>` is the AuthorizationPolicy resource name, but Kubeflow's Profile controller generates AuthorizationPolicy objects with different naming conventions (e.g., `ns-owner-access-istio`). Users following the instruction verbatim may encounter a "resource not found" error. Add a step to identify the correct AuthorizationPolicy before editing:

`kubectl -n <your-namespace> get authorizationpolicy`

Then use the actual resource name in the edit command.

**Note: You can install multiple Model Registry instances in different namespaces, each instance is independent of each other.**

@@ -223,6 +227,8 @@ before deploying kubeflow-trainer, if you have already deployed kftraining.
> Note: make sure to install LWS (Alauda Build of LeaderWorkerSet) plugin before deploying
kubeflow-trainer, as LWS is a dependency of kubeflow-trainer.

> Note: Kubeflow Trainer v2 requires minimum Kubernetes version `1.32.3`, older Kubernetes versions may cause unexpected issues.
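To confirm the cluster meets this requirement before installing, you can compare versions with `sort -V`. In this sketch the `current` value is a stand-in for the server version your cluster actually reports (e.g. via `kubectl version`):

```shell
required="1.32.3"
# Replace with the cluster's reported server version, e.g. from:
#   kubectl version -o json | jq -r '.serverVersion.gitVersion'
current="v1.33.1"

# sort -V sorts versions numerically; if the smallest entry is $required,
# the cluster version is at least the minimum.
if [ "$(printf '%s\n' "$required" "${current#v}" | sort -V | head -n1)" = "$required" ]; then
  echo "OK: Kubernetes $current meets the minimum $required"
else
  echo "WARNING: Kubernetes $current is older than the required $required"
fi
```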

In **Cluster Plugins**, find kubeflow-trainer (Kubeflow Trainer v2),
click the "Install" button, select the options of whether to enable `JobSet`
and click the "Install" button to complete the deployment.
121 changes: 121 additions & 0 deletions docs/en/kubeflow/how_to/model-registry.mdx
@@ -0,0 +1,121 @@
---
weight: 40
---

# Use Kubeflow Model Registry

The Kubeflow Model Registry is a central repository for managing machine learning models, their versions, and associated metadata. It allows data scientists to publish models, track their lineage, and collaborate on model development.

## Access the Model Registry

1. **Open Dashboard**: Log in to the Kubeflow central dashboard.
2. **Model Registry**: Click on **Model Registry** in the sidebar. This will take you to the list of registered models in your namespace.
> **Note**: If you do not see the Model Registry, ensure a Model Registry instance has been deployed in your namespace by the platform administrator.

## Register a Model

You can register models either through the user interface or programmatically using the Python client.

### Option 1: Using the UI

1. **Create Registered Model**:
- Click **Register Model**.
- **Model Name**: Enter a unique name (e.g., `fraud-detection`).
- **Description**: Add a description.
- **Version details**: Optionally add version information, tags, and metadata.
- **Model Location**: Provide the S3/URI to the model artifact (e.g., `s3://my-bucket/models/fraud-detection/v1/`).
- Click **Create**.

2. **Create Version**:
- Click the drop-down menu next to the **Registered Model** and select **Register New Version**.
- Enter version name, description, metadata and artifact URI.
- Click **Register new version**.

### Option 2: Using Python Client

You can register models directly from your Jupyter Notebook using the `model-registry` Python client.

**Prerequisites**:
- Install the client: `python -m pip install model-registry=="0.3.5" kserve=="0.13"`
- Ensure you have access to the Model Registry service. If running inside a Kubeflow Notebook, you can use the internal service DNS (e.g. `http://model-registry-service.<namespace>.svc:8080`).

**Sample Code**:

The following example demonstrates how to register a model stored in S3.

```python
from model_registry import ModelRegistry

# 1. Connect to the Model Registry
# Replace with your actual Model Registry service host/port
# Inside cluster, typically: "http://model-registry-service.<namespace>.svc.cluster.local:8080"
registry = ModelRegistry(
    server_address="http://model-registry-service.kubeflow.svc.cluster.local",
    port=8080,
    author="your name",
    is_secure=False,
)

# 2. Register a new Model
rm = registry.register_model(
    "iris",
    "s3://kfserving-examples/models/sklearn/1.0/model",
    model_format_name="sklearn",
    model_format_version="1",
    version="v1",
    description="Iris scikit-learn model",
    metadata={
        "accuracy": 3.14,
        "license": "BSD 3-Clause License",
    },
)

# 3. Retrieve model information
model = registry.get_registered_model("iris")
print("Registered Model:", model, "with ID", model.id)

version = registry.get_model_version("iris", "v1")
print("Model Version:", version, "with ID", version.id)

art = registry.get_model_artifact("iris", "v1")
print("Model Artifact:", art, "with ID", art.id)

```

## Deploy a Registered Model

Once a model is registered, you can deploy it as an **InferenceService** using KServe.

To deploy, you typically need the URI of the model artifact. You can retrieve this from the Registry UI or via the Python API:

```python
from kubernetes import client
import kserve

isvc = kserve.V1beta1InferenceService(
    api_version=kserve.constants.KSERVE_GROUP + "/v1beta1",
    kind=kserve.constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(
        name="iris-model",
        namespace=kserve.utils.get_default_target_namespace(),
        labels={
            "modelregistry/registered-model-id": model.id,
            "modelregistry/model-version-id": version.id,
        },
    ),
    spec=kserve.V1beta1InferenceServiceSpec(
        predictor=kserve.V1beta1PredictorSpec(
            model=kserve.V1beta1ModelSpec(
                storage_uri=art.uri,
                model_format=kserve.V1beta1ModelFormat(
                    name=art.model_format_name, version=art.model_format_version
                ),
            )
        )
    ),
)
ks_client = kserve.KServeClient()
ks_client.create(isvc)
```

Once deployed, the KServe controller will pull the model from the specified S3 URI and start the inference server.
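Once the InferenceService reports Ready, it can be queried over the KServe V1 REST protocol. The snippet below sketches building a prediction payload for the iris example; the URL is a placeholder — take the real host from the InferenceService status (e.g. `kubectl get inferenceservice iris-model`):

```python
import json

# Four iris feature values per instance (sepal/petal length and width).
instances = [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6],
]

# The KServe V1 protocol wraps inputs in an "instances" list.
payload = {"instances": instances}
body = json.dumps(payload)

# Placeholder URL; the actual host comes from the InferenceService status.
url = "http://iris-model.<namespace>.example.com/v1/models/iris-model:predict"

print(body)
# A request could then be sent with, e.g.:
#   requests.post(url, data=body).json()
```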
159 changes: 159 additions & 0 deletions docs/en/kubeflow/how_to/notebooks.mdx
@@ -0,0 +1,159 @@
---
weight: 10
---

# Use Kubeflow Notebooks

Kubeflow Notebooks provide a Kubernetes-native Jupyter environment for data scientists to develop, train, and deploy machine learning models. Each notebook server runs as a separate Pod in your namespace, ensuring isolation and dedicated resources.

> **NOTE**: We recommend using Alauda AI Workbench for a more integrated experience with additional features like resource types, configurations, and better integration with other components. However, you can also use the native Kubeflow Notebooks if you prefer a more lightweight setup or need specific features from the upstream project.

## Concepts

- **Notebook Server**: A JupyterLab instance running in a container.
- **Custom Image**: You can use standard pre-built images (e.g., containing TensorFlow, PyTorch) or provide your own custom Docker image with specific libraries.
- **Persistent Storage**: By default, notebook servers are attached to Persistent Volume Claims (PVCs) to store your workspace directory (usually `/home/jovyan`). This ensures your notebooks and data are saved even if the server is restarted or updated.

## Create a Notebook Server

1. **Access the Dashboard**:
   Navigate to the **Notebooks** section in the Kubeflow dashboard.

2. **New Notebook**:
   Click **New Notebook**. Make sure to select the correct namespace at the top of the dashboard where you want to create the notebook server.

3. **Configure the Server**:
   - **Name**: Enter a unique name for your notebook server.
   - **Image**:
     - **Select Type**: Choose the interface type: JupyterLab, Visual Studio Code, or RStudio.
     - **Select Image**: Choose from a list of pre-built images, or specify a custom image by providing its Docker image URL.
   - **CPU / RAM**: Allocate CPU and memory resources based on your workload. Start small (e.g., 1 CPU, 2GB RAM) and increase if needed.
   - **GPUs**: Request GPUs (e.g., NVIDIA) if you plan to run deep learning training or inference tasks that require acceleration.
   - **Workspace Volume**: This volume mounts to your home directory (`/home/jovyan`). Create a new volume (default) or attach an existing one to access previous work.
   - **Data Volumes**: (Optional) Attach additional existing PVCs to access large datasets without copying them to your workspace.
   - **Configurations**: (Optional) Select PodDefaults (if available) to inject common configuration such as S3 credentials, Git config, or environment variables.

4. **Launch**:
   Click **Launch**. The server will be provisioned. Wait for the status to turn **Running** (green).

## Connect to the Notebook

Once the server status is **Running**:
1. Click **Connect**.
2. This opens the **JupyterLab/VS Code/RStudio** interface in a new browser tab.
3. You can now create Python 3 notebooks, open a terminal, or manage files.

## Environment Management

### Installing Python Packages

While you can install packages in your home directory to persist them, it is best practice to use a custom image for reproducibility.

Create a `venv` virtual environment in your home directory and install packages there:

```bash
python -m venv ~/venv
source ~/venv/bin/activate
python -m pip install transformers datasets
```

When you start a new terminal session, remember to activate the virtual environment to access the installed packages.

To use the virtual environment in Jupyter notebooks, you can install `ipykernel` and create a new kernel:

```bash
source ~/venv/bin/activate
python -m pip install ipykernel
python -m ipykernel install --user --name=venv --display-name "Python (venv)"
```

Then, in your Jupyter notebook, you can select the "Python (venv)" kernel to use the packages installed in your virtual environment.

Virtual environments are persisted in your home directory, so they will remain available even if you stop and restart the notebook server. However, if you need to share the environment across multiple notebook servers or want better reproducibility, consider building a custom Docker image with the required packages pre-installed.

### Using Custom Images

For production environments or complex dependencies (e.g., system libraries), build a Docker image containing all required libraries and use it as your **Custom Image** when creating the notebook. This ensures exact reproducibility.

## Manage Configurations (PodDefaults)

Kubeflow uses `PodDefault` resources (often labeled as **Configurations** in the UI) to inject common configurations—such as environment variables, volumes, and volume mounts—into Notebooks. This is the standard way to securely provide credentials for Object Storage (S3, MinIO) without hardcoding them in your notebooks.

### Create a PodDefault

You can create a PodDefault by applying a YAML manifest.

Define a `PodDefault` that selects pods with a specific label.


```yaml
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: add-gcp-secret
  namespace: MY_PROFILE_NAMESPACE
spec:
  selector:
    matchLabels:
      add-gcp-secret: "true"
  desc: "add gcp credential"
  volumeMounts:
    - name: secret-volume
      mountPath: /secret/gcp
  volumes:
    - name: secret-volume
      secret:
        secretName: gcp-secret
```

### Apply Configuration

When creating a new Notebook Server:
1. Scroll to the **Configurations** section.
2. You will see a list of available PodDefaults (e.g., `s3-access`).
3. Check the box to apply it.

This will automatically inject the specified environment variables or volumes into your Notebook container.
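As a quick sanity check inside the notebook, you can verify that the PodDefault actually mounted the secret. This is a sketch assuming the secret contains a key named `key.json` (mounted under the `/secret/gcp` path from the example above); adjust the filename to match the keys in your `gcp-secret`:

```python
import os

def injected_credential(mount_path="/secret/gcp", filename="key.json"):
    """Return the credential file path if the PodDefault mounted the secret, else None.

    The mount path matches the PodDefault example; the filename is an assumption --
    it depends on the key names stored inside the gcp-secret Secret.
    """
    path = os.path.join(mount_path, filename)
    return path if os.path.isfile(path) else None

# Inside a notebook with the configuration applied this prints the mounted path;
# elsewhere it prints None.
print(injected_credential())
```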

## Accessing Data

### Using Mounted Volumes
If you attached a data volume (PVC) during creation, it will be available at the specified mount point.

```python
import pandas as pd

# Assuming you mounted a data volume at /home/jovyan/data
df = pd.read_csv('/home/jovyan/data/dataset.csv')
print(df.head())
```

### Using Object Storage (S3 / MinIO)
To access data in S3-compatible storage, use libraries like `boto3` or `s3fs`. If your administrator has configured PodDefaults for credentials, environment variables (like `AWS_ACCESS_KEY_ID`) will be pre-populated.

```python
import os
import s3fs
import pandas as pd

# Check if credentials are injected
print(os.getenv("AWS_S3_ENDPOINT"))

# Read directly from S3
fs = s3fs.S3FileSystem(
    client_kwargs={'endpoint_url': os.getenv('AWS_S3_ENDPOINT')},
    key=os.getenv('AWS_ACCESS_KEY_ID'),
    secret=os.getenv('AWS_SECRET_ACCESS_KEY')
)

with fs.open('s3://my-bucket/data/train.csv') as f:
    df = pd.read_csv(f)
```

## Best Practices

- **Stop Unused Servers**: Notebook servers consume cluster resources (especially GPUs) even when idle. Stop them when you are not actively working.
- **Git Integration**: Use the Git extension in JupyterLab (or the terminal) to version control your notebooks. Avoid storing large datasets in Git.
- **Resource Monitoring**: Monitor your resource usage. If your kernel crashes frequently (OOM), stop the server and restart it with a higher memory limit.
- **Clean Up**: Periodically delete old notebook servers and their associated PVCs if the data is no longer needed.
