diff --git a/.gitignore b/.gitignore
index 77d550f7..c10010f1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -21,3 +21,4 @@ yarn-error.log*
+volcano/
diff --git a/blog/Volcano-1.10.0-release.md b/blog/Volcano-1.10.0-release.md
index f26f9954..90e0190d 100644
--- a/blog/Volcano-1.10.0-release.md
+++ b/blog/Volcano-1.10.0-release.md
@@ -62,7 +62,7 @@ Volcano introduced the elastic queue capacity scheduling feature in version v1.9
For detailed design information on elastic queue capacity scheduling, refer to the [Capacity Scheduling Design Document](https://github.com/volcano-sh/volcano/blob/master/docs/design/capacity-scheduling.md).
-For a step-by-step guide on using the capacity plugin, see the [Capacity Plugin User Guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin).
+For a step-by-step guide on using the capacity plugin, see the [Capacity Plugin User Guide](/docs/Scheduler/Plugins/capacity).
Configure each dimension deserved resource samples for the queue:
@@ -88,7 +88,7 @@ For instructions on using the `Device Plugin` to report various GPU models, plea
In version v1.10.0, the `capacity` plugin is the default for queue management. Note that the `capacity` and `proportion` plugins are incompatible, so after upgrading to v1.10.0, you must set the `deserved` field for queues to ensure proper functionality.
-For detailed instructions, please refer to the [Capacity Plugin User Guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin).
+For detailed instructions, please refer to the [Capacity Plugin User Guide](/docs/Scheduler/Plugins/capacity).
The `capacity` plugin allocates cluster resources based on the `deserved` value set by the user, while the `proportion` plugin dynamically allocates resources according to queue weight. Users can select either the `capacity` or `proportion` plugin for queue management based on their specific needs.
diff --git a/blog/Volcano-1.13.0-release.md b/blog/Volcano-1.13.0-release.md
index c9567e43..fb8ad2e3 100644
--- a/blog/Volcano-1.13.0-release.md
+++ b/blog/Volcano-1.13.0-release.md
@@ -142,7 +142,7 @@ Sincerely thanks to community developer: @[Wonki4](https://github.com/Wonki4)
Design documentation: [Ray Framework Plugin Design Doc](https://github.com/volcano-sh/volcano/blob/master/docs/design/distributed-framework-plugins.md).
-Usage documentation: [Ray Plugin User Guide](/docs/UserGuide/user_guide_how_to_use_ray_plugin).
+Usage documentation: [Ray Plugin User Guide](/docs/Controller/Plugins/ray).
## Introduce HCCL Plugin Support
@@ -189,7 +189,7 @@ Sincerely thanks to community developers: @[JesseStutler](https://github.com/Jes
NodeGroup design documentation: [NodeGroup Design.](https://github.com/volcano-sh/volcano/blob/master/docs/design/node-group.md)
-NodeGroup usage documentation: [NodeGroup User Guide.](/docs/UserGuide/user_guide_how_to_use_nodegroup_plugin)
+NodeGroup usage documentation: [NodeGroup User Guide.](/docs/Scheduler/Plugins/nodegroup)
## Introduce ResourceStrategyFit Plugin
@@ -274,7 +274,7 @@ Sincerely thanks to community developers: @[LY-today](https://github.com/LY-toda
Design documentation: [ResourceStrategyFit Design](https://github.com/volcano-sh/volcano/blob/master/docs/design/resource-strategy-fit-scheduling.md)
-Usage documentation: [ResourceStrategyFit User Guide](/docs/UserGuide/user_guide_how_to_use_resource_strategy_fit_plugin)
+Usage documentation: [ResourceStrategyFit User Guide](/docs/Scheduler/Plugins/resource_strategy_fit)
## Decouple Colocation from OS
diff --git a/blog/Volcano-1.7.0-release-en.md b/blog/Volcano-1.7.0-release-en.md
index 0c968ca8..3c4db30a 100644
--- a/blog/Volcano-1.7.0-release-en.md
+++ b/blog/Volcano-1.7.0-release-en.md
@@ -33,7 +33,7 @@ Other enhanced plugins include those for TensorFlow, MPI, and PyTorch Jobs. They
Volcano also provides an extended development framework for you to tailor Job plugins to your needs.
Design Documentation: [Pytorch-plugin](https://github.com/volcano-sh/volcano/blob/master/docs/design/distributed-framework-plugins.md#pytorch-plugin)
-User Guide: [Pytorch Plugin User Guide](/docs/UserGuide/user_guide_how_to_use_pytorch_plugin)
+User Guide: [Pytorch Plugin User Guide](/docs/Controller/Plugins/pytorch)
Issue:[#2292](https://github.com/volcano-sh/volcano/issues/2292)
diff --git a/blog/Volcano-1.9.0-release.md b/blog/Volcano-1.9.0-release.md
index 15fda873..46813776 100644
--- a/blog/Volcano-1.9.0-release.md
+++ b/blog/Volcano-1.9.0-release.md
@@ -59,7 +59,7 @@ spec:
```
For a complete usage example of queue elastic capacity scheduling, please refer to:
-[How to use capacity plugin](/docs/UserGuide/user_guide_how_to_use_capacity_plugin).
+[How to use capacity plugin](/docs/Scheduler/Plugins/capacity).
For the elastic queue capacity design document, please refer to:
[Capacity scheduling Design](https://github.com/volcano-sh/volcano/blob/master/docs/design/capacity-scheduling.md).
diff --git a/docs/Concepts/Queue.md b/docs/Concepts/Queue.md
index 013626cd..03099e4e 100644
--- a/docs/Concepts/Queue.md
+++ b/docs/Concepts/Queue.md
@@ -51,7 +51,7 @@ deserved indicates the expected resource amount for all PodGroups in this queue.
> **Note**:
>
-> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](../UserGuide/user_guide_how_to_use_capacity_plugin.md)
+> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](../Scheduler/Plugins/capacity.md)
> 2. If the allocated resources of a queue exceed its configured deserved value, the queue cannot reclaim resources from other queues
* weight, *optional*
diff --git a/docs/Contribution/_category_.json b/docs/Contribution/_category_.json
index 6e7b3814..37feb0b4 100644
--- a/docs/Contribution/_category_.json
+++ b/docs/Contribution/_category_.json
@@ -1,4 +1,4 @@
{
"label": "Contribution",
- "position": 9
+ "position": 10
}
\ No newline at end of file
diff --git a/docs/Controller/Overview.md b/docs/Controller/Overview.md
new file mode 100644
index 00000000..ae48c502
--- /dev/null
+++ b/docs/Controller/Overview.md
@@ -0,0 +1,7 @@
+---
+title: Overview
+---
+
+## Introduction
+
+The Volcano Job Controller manages the lifecycle of jobs and dynamically configures job environments using plugins such as SSH, SVC, MPI, and PyTorch.
diff --git a/docs/Controller/Plugins/_category_.json b/docs/Controller/Plugins/_category_.json
new file mode 100644
index 00000000..f3accd2f
--- /dev/null
+++ b/docs/Controller/Plugins/_category_.json
@@ -0,0 +1,4 @@
+{
+ "label": "Plugins",
+ "key": "controller-plugins"
+}
diff --git a/docs/UserGuide/user_guide_how_to_use_env_plugin.md b/docs/Controller/Plugins/env.md
similarity index 98%
rename from docs/UserGuide/user_guide_how_to_use_env_plugin.md
rename to docs/Controller/Plugins/env.md
index 3b5d999f..99fab7ed 100644
--- a/docs/UserGuide/user_guide_how_to_use_env_plugin.md
+++ b/docs/Controller/Plugins/env.md
@@ -1,5 +1,5 @@
---
-title: "Volcano Job Plugin -- Env User Guide"
+title: Env
---
diff --git a/docs/Controller/Plugins/hcclrank.md b/docs/Controller/Plugins/hcclrank.md
new file mode 100644
index 00000000..ab556003
--- /dev/null
+++ b/docs/Controller/Plugins/hcclrank.md
@@ -0,0 +1,59 @@
+---
+title: HCCLRank
+---
+
+## Introduction
+
+In distributed AI training, particularly when using Ascend NPUs (Neural Processing Units) or MindSpore frameworks, the compute nodes need a deterministic rank or index to communicate over HCCL (Huawei Collective Communication Library).
+
+The **HCCLRank Plugin** is a Volcano Job plugin that automatically injects a `hccl/rankIndex` annotation into the Pods of a Volcano Job. It calculates a unique rank for each pod based on its task type (`master` or `worker`) and its replica index.
+
+## Mechanism
+
+During the Pod creation phase (`OnPodCreate`), the HCCLRank Plugin intercepts the pod and adds the `hccl/rankIndex` annotation to it.
+
+The calculation is as follows:
+- **Master Role**: Rank = Pod Index
+- **Worker Role**: Rank = (Total Master Replicas) + Pod Index
+
+If the Pod already has a `RANK` environment variable defined in its container specifications, the plugin will use that value instead and simply map it to the `hccl/rankIndex` annotation.
+
+## Configuration
+
+To enable the HCCLRank plugin, configure it within the Volcano job controller's configuration or add it to the `plugins` field of your `VolcanoJob` spec.
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: ascend-distributed-training
+spec:
+  minAvailable: 3
+  schedulerName: volcano
+  plugins:
+    hcclrank:
+      - --master=master
+      - --worker=worker
+  tasks:
+    - replicas: 1
+      name: master
+      template:
+        spec:
+          containers:
+            - name: master
+              image: my-ascend-image
+    - replicas: 2
+      name: worker
+      template:
+        spec:
+          containers:
+            - name: worker
+              image: my-ascend-image
+```
+
+### Arguments
+
+The HCCLRank plugin supports overriding the default task names used to identify master and worker roles:
+
+- **`--master`**: The name of the master role task in your Job spec. Default is `master`.
+- **`--worker`**: The name of the worker role task in your Job spec. Default is `worker`.
diff --git a/docs/UserGuide/user_guide_how_to_use_mpi_plugin.md b/docs/Controller/Plugins/mpi.md
similarity index 98%
rename from docs/UserGuide/user_guide_how_to_use_mpi_plugin.md
rename to docs/Controller/Plugins/mpi.md
index a46ece2f..77743da5 100644
--- a/docs/UserGuide/user_guide_how_to_use_mpi_plugin.md
+++ b/docs/Controller/Plugins/mpi.md
@@ -1,5 +1,5 @@
---
-title: "MPI Plugin User Guide"
+title: MPI
---
diff --git a/docs/UserGuide/user_guide_how_to_use_pytorch_plugin.md b/docs/Controller/Plugins/pytorch.md
similarity index 99%
rename from docs/UserGuide/user_guide_how_to_use_pytorch_plugin.md
rename to docs/Controller/Plugins/pytorch.md
index a4f604d6..9d7e1039 100644
--- a/docs/UserGuide/user_guide_how_to_use_pytorch_plugin.md
+++ b/docs/Controller/Plugins/pytorch.md
@@ -1,5 +1,5 @@
---
-title: "Pytorch Plugin User Guide"
+title: PyTorch
---
diff --git a/docs/UserGuide/user_guide_how_to_use_ray_plugin.md b/docs/Controller/Plugins/ray.md
similarity index 99%
rename from docs/UserGuide/user_guide_how_to_use_ray_plugin.md
rename to docs/Controller/Plugins/ray.md
index acd4c1d7..65c33136 100644
--- a/docs/UserGuide/user_guide_how_to_use_ray_plugin.md
+++ b/docs/Controller/Plugins/ray.md
@@ -1,5 +1,5 @@
---
-title: "Ray Plugin User Guide"
+title: Ray
---
diff --git a/docs/UserGuide/user_guide_how_to_use_ssh_plugin.md b/docs/Controller/Plugins/ssh.md
similarity index 99%
rename from docs/UserGuide/user_guide_how_to_use_ssh_plugin.md
rename to docs/Controller/Plugins/ssh.md
index 6bc4bd18..15c9e39e 100644
--- a/docs/UserGuide/user_guide_how_to_use_ssh_plugin.md
+++ b/docs/Controller/Plugins/ssh.md
@@ -1,5 +1,5 @@
---
-title: "Volcano Job Plugin -- SSH User Guide"
+title: SSH
---
diff --git a/docs/UserGuide/user_guide_how_to_use_svc_plugin.md b/docs/Controller/Plugins/svc.md
similarity index 99%
rename from docs/UserGuide/user_guide_how_to_use_svc_plugin.md
rename to docs/Controller/Plugins/svc.md
index 72a5a3b8..15145701 100644
--- a/docs/UserGuide/user_guide_how_to_use_svc_plugin.md
+++ b/docs/Controller/Plugins/svc.md
@@ -1,5 +1,5 @@
---
-title: "Volcano Job Plugin -- SVC User Guide"
+title: SVC
---
## Background
diff --git a/docs/Controller/_category_.json b/docs/Controller/_category_.json
new file mode 100644
index 00000000..51e5301d
--- /dev/null
+++ b/docs/Controller/_category_.json
@@ -0,0 +1,4 @@
+{
+ "label": "Controller",
+ "position": 6
+}
diff --git a/docs/Ecosystem/RayOnVolcano.md b/docs/Ecosystem/RayOnVolcano.md
index 034c86c2..be1900a4 100644
--- a/docs/Ecosystem/RayOnVolcano.md
+++ b/docs/Ecosystem/RayOnVolcano.md
@@ -203,4 +203,4 @@ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.ini
### Learn More
- For KubeRay integration details, visit the [KubeRay Volcano Scheduler Documentation](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano)
-- For Volcano Job Ray plugin details, see the [Volcano Ray Plugin Guide](/docs/UserGuide/user_guide_how_to_use_ray_plugin)
\ No newline at end of file
+- For Volcano Job Ray plugin details, see the [Volcano Ray Plugin Guide](/docs/Controller/Plugins/ray)
\ No newline at end of file
diff --git a/docs/Ecosystem/_category_.json b/docs/Ecosystem/_category_.json
index ed0ef521..aaa2c623 100644
--- a/docs/Ecosystem/_category_.json
+++ b/docs/Ecosystem/_category_.json
@@ -1,4 +1,4 @@
{
"label": "Ecosystem",
- "position": 6
+ "position": 9
}
\ No newline at end of file
diff --git a/docs/KeyFeatures/QueueResourceManagement.md b/docs/KeyFeatures/QueueResourceManagement.md
index d7919757..dca21c71 100644
--- a/docs/KeyFeatures/QueueResourceManagement.md
+++ b/docs/KeyFeatures/QueueResourceManagement.md
@@ -64,7 +64,7 @@ Queue scheduling in Volcano involves the following core actions:
Volcano provides two core queue scheduling plugins:
#### capacity plugin
-[capacity plugin](/docs/UserGuide/user_guide_how_to_use_capacity_plugin) supports setting queue's deserved resource amount through explicit configuration, as shown in this example:
+[capacity plugin](/docs/Scheduler/Plugins/capacity) supports setting queue's deserved resource amount through explicit configuration, as shown in this example:
```yaml
apiVersion: scheduling.volcano.sh/v1beta1
diff --git a/docs/Scheduler/Plugins/_category_.json b/docs/Scheduler/Plugins/_category_.json
new file mode 100644
index 00000000..d4eb446f
--- /dev/null
+++ b/docs/Scheduler/Plugins/_category_.json
@@ -0,0 +1,4 @@
+{
+ "label": "Plugins",
+ "key": "scheduler-plugins"
+}
diff --git a/docs/UserGuide/user_guide_how_to_use_capacity_plugin.md b/docs/Scheduler/Plugins/capacity.md
similarity index 99%
rename from docs/UserGuide/user_guide_how_to_use_capacity_plugin.md
rename to docs/Scheduler/Plugins/capacity.md
index 8843a7de..9bb70dbf 100644
--- a/docs/UserGuide/user_guide_how_to_use_capacity_plugin.md
+++ b/docs/Scheduler/Plugins/capacity.md
@@ -1,5 +1,5 @@
---
-title: "Capacity Plugin User Guide"
+title: Capacity
---
diff --git a/docs/UserGuide/user_guide_how_to_use_cdp_plugin.md b/docs/Scheduler/Plugins/cdp.md
similarity index 99%
rename from docs/UserGuide/user_guide_how_to_use_cdp_plugin.md
rename to docs/Scheduler/Plugins/cdp.md
index f09a8b64..2e0ea628 100644
--- a/docs/UserGuide/user_guide_how_to_use_cdp_plugin.md
+++ b/docs/Scheduler/Plugins/cdp.md
@@ -1,5 +1,5 @@
---
-title: "Cooldown Protection Plugin User Guide"
+title: CDP
---
diff --git a/docs/Scheduler/Plugins/deviceshare.md b/docs/Scheduler/Plugins/deviceshare.md
new file mode 100644
index 00000000..4448f6bc
--- /dev/null
+++ b/docs/Scheduler/Plugins/deviceshare.md
@@ -0,0 +1,41 @@
+---
+title: DeviceShare
+---
+
+## Introduction
+
+The **DeviceShare Plugin** is an advanced resource scheduling plugin in Volcano that provides a common framework for sharing specialized hardware devices (like GPUs, NPUs, FPGAs) across multiple pods.
+
+Rather than implementing fragmented logic for each new hardware accelerator, Volcano exposes a unified `Devices` interface. The `deviceshare` plugin leverages this interface to perform robust allocation, node filtering, and resource tracking for shared devices.
+
+## Mechanism
+
+The DeviceShare plugin works in conjunction with device-specific implementations. It exposes standard scheduling operations such as `Predicate` (filtering nodes based on available device capacity) and `Allocate`/`Release` (assigning portions of a device to specific pods).
+
+Currently, the `deviceshare` plugin serves as the underlying engine powering features like:
+- **GPU Sharing**: Allowing multiple pods to request fractions of a single physical GPU's memory.
+- **vGPU and vNPU**: Virtualizing accelerator slices.
+- **GPU Exclusive**: Restricting a pod to exclusively own a GPU to avoid contention.
+
+## Configuration and Usage
+
+The `deviceshare` plugin is typically enabled implicitly when you enable device sharing predicates in the Volcano scheduler config map. However, if you are developing custom device sharing logic or need to explicitly declare it, it can be configured in your `volcano-scheduler-configmap`:
+
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+  - plugins:
+    - name: priority
+    - name: gang
+    - name: conformance
+    - name: deviceshare # Enable the device share framework plugin
+  - plugins:
+    - name: overcommit
+    - name: drf
+    - name: predicates
+    - name: proportion
+    - name: nodeorder
+    - name: binpack
+```
+
+> **Note:** For specific guides on how to configure your workloads to request shared GPUs or NPUs, please refer to the dedicated guides for [GPU Sharing](../../UserGuide/user_guide_how_to_use_gpu_sharing) and [vNPU](../../UserGuide/user_guide_how_to_use_vnpu).
diff --git a/docs/UserGuide/user_guide_how_to_use_nodegroup_plugin.md b/docs/Scheduler/Plugins/nodegroup.md
similarity index 99%
rename from docs/UserGuide/user_guide_how_to_use_nodegroup_plugin.md
rename to docs/Scheduler/Plugins/nodegroup.md
index 6499f4f5..c9495b4d 100644
--- a/docs/UserGuide/user_guide_how_to_use_nodegroup_plugin.md
+++ b/docs/Scheduler/Plugins/nodegroup.md
@@ -1,5 +1,5 @@
---
-title: "Nodegroup Plugin User Guide"
+title: Nodegroup
---
diff --git a/docs/Scheduler/Plugins/overcommit.md b/docs/Scheduler/Plugins/overcommit.md
new file mode 100644
index 00000000..aad3d549
--- /dev/null
+++ b/docs/Scheduler/Plugins/overcommit.md
@@ -0,0 +1,44 @@
+---
+title: Overcommit
+---
+
+## Introduction
+
+In typical cluster environments, the scheduler calculates idle resources strictly as physical node capacity minus allocated resources. When the cluster is nearly fully utilized, many PodGroups are therefore rejected at the enqueue stage and never enter the scheduling pipeline. This is undesirable when you want the scheduler to tolerate a larger backlog of `pending` pods.
+
+The **Overcommit Plugin** allows the scheduler to artificially inflate the apparent "idle resources" of the cluster by applying an `overcommit-factor`. This permits more jobs to be enqueued and wait in the scheduling pipeline than the physical resources might typically allow.
+
+## Mechanism
+
+The Overcommit plugin evaluates whether a job can be enqueued based on the requested `MinResources` of the PodGroup and the expanded idle resources.
+
+Expanded idle resource is calculated as:
+`Idle Resource = (Total Resource * overcommit-factor) - Used Resource`
+
+If the job's minimal requested resources can fit into this expanded idle resource pool, the job is permitted to be enqueued.
+
+## Configuration
+
+To use the Overcommit Plugin, add it to a plugin tier in your `volcano-scheduler-configmap`, keep the `enqueue` action enabled, and provide an `overcommit-factor`.
+
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+  - plugins:
+    - name: overcommit # Enable the overcommit plugin
+      arguments:
+        overcommit-factor: 1.2 # The overcommit factor. Default is 1.2
+    - name: priority
+    - name: gang
+    - name: conformance
+  - plugins:
+    - name: drf
+    - name: predicates
+    - name: proportion
+    - name: nodeorder
+    - name: binpack
+```
+
+### Arguments
+
+- **`overcommit-factor`**: A float value greater than or equal to `1.0`. For example, `1.2` means the scheduler treats the cluster as having 20% more total resources when deciding whether to enqueue jobs into the pipeline. If a value less than `1.0` is provided, the plugin automatically falls back to the default value of `1.2`.
diff --git a/docs/Scheduler/Plugins/pdb.md b/docs/Scheduler/Plugins/pdb.md
new file mode 100644
index 00000000..f409308c
--- /dev/null
+++ b/docs/Scheduler/Plugins/pdb.md
@@ -0,0 +1,43 @@
+---
+title: PDB
+---
+
+## Introduction
+
+When users deploy highly available jobs or applications on Volcano, they often need to limit the number of pod replicas that can be evicted or destroyed simultaneously to avoid downtime. This constraint is managed via Kubernetes **PodDisruptionBudget (PDB)** resources.
+
+The **PDB Plugin** ensures that Volcano respects user-defined PDB constraints during the scheduling process, specifically during eviction actions like `reclaim`, `preempt`, and `shuffle`.
+
+## Prerequisites
+
+- Your Kubernetes version must be 1.21 or later.
+- You must have created valid `PodDisruptionBudget` resources for your workloads.
+
+## Mechanism
+
+The PDB Plugin registers several functions (`ReclaimableFn`, `PreemptableFn`, and `VictimTasksFn`) under the `reclaim`, `preempt`, and `shuffle` actions. It maintains a cache of PDBs using `v1.PodDisruptionBudgetLister`.
+
+During eviction scenarios, the plugin filters out tasks whose eviction would violate the configured PDB constraints. It calculates the `DisruptedPods` (pods whose eviction was processed but not yet observed by the PDB controller) and ensures the remaining available replicas satisfy the budget.
+
+## Configuration
+
+To enable the PDB Plugin, update the `volcano-scheduler-configmap` to include the `pdb` plugin in your configuration tiers.
+
+```yaml
+actions: "reclaim, preempt, shuffle"
+tiers:
+- plugins:
+  - name: pdb # Enable the PDB plugin
+  - name: priority
+  - name: gang
+  - name: conformance
+- plugins:
+  - name: overcommit
+  - name: drf
+  - name: predicates
+  - name: proportion
+  - name: nodeorder
+  - name: binpack
+```
+
+*Note: The PDB plugin will be actively invoked when actions like `reclaim`, `preempt`, or `shuffle` are executed in the scheduler workflow.*
diff --git a/docs/Scheduler/Plugins/rescheduling.md b/docs/Scheduler/Plugins/rescheduling.md
new file mode 100644
index 00000000..a4c2e541
--- /dev/null
+++ b/docs/Scheduler/Plugins/rescheduling.md
@@ -0,0 +1,72 @@
+---
+title: Rescheduling
+---
+
+## Introduction
+
+Unbalanced resource utilization across a Kubernetes cluster often results from suboptimal scheduling strategies, dynamic changes in job lifecycles, and node status changes (such as added or removed nodes, or taint and affinity modifications).
+
+The **Rescheduling** plugin addresses these issues by actively rebalancing the cluster's resource utilization among nodes. It accomplishes this by evaluating real resource utilization (via Prometheus metrics) instead of merely the requested resource amounts, and it periodically evicts pods based on custom configured rescheduling strategies.
+
+## Rescheduling Workflow
+
+1. **Resource Filter**: Selects the workloads that are eligible for eviction, based on queues or labels.
+2. **Strategy Evaluation**: Evaluates filtered workloads against the configured rescheduling strategies to determine which ones should be evicted.
+3. **Eviction**: Evicts the pods attached to the identified workloads.
+4. **Periodic Execution**: Repeats the above process at the configured interval.
+
+## Rescheduling Strategies
+
+Volcano's rescheduling plugin supports multiple strategies to select potential evictees:
+
+- **LowNodeUtilization**: Targets unbalanced nodes by evicting pods from highly utilized nodes so that they can be moved to underutilized nodes, based on the configured target thresholds.
+- **OfflineOnly (OLO)**: Only selects offline workloads (annotated with `preemptable: true`) for rescheduling.
+- **LowPriorityFirst (LPF)**: Sorts workloads by priority and evicts lower priority pods first.
+- **ShortLifeTimeFirst (SLTF)**: Sorts workloads by running time. Pods with the shortest running time are rescheduled first so that long-running workloads are not interrupted.
+- **BigObjectFirst (BOF)**: Selects workloads which request the most dominant resource and reschedules them first to improve system throughput and avoid starving small workloads.
+- **MoreReplicasFirst (MRF)**: Sorts workloads by replica number. Workloads with the most replicas are rescheduled first, making it friendly to `gang` scheduling by considering `minAvailable`.
+
+## Configuration
+
+To enable the Rescheduling plugin, you must configure the `volcano-scheduler-configmap` by adding the `shuffle` action and configuring the `rescheduling` plugin within the tiers.
+
+```yaml
+actions: "enqueue, allocate, backfill, shuffle" ## Add 'shuffle' action
+tiers:
+  - plugins:
+    - name: priority
+    - name: gang
+    - name: conformance
+    - name: rescheduling ## Rescheduling plugin
+      arguments:
+        interval: 5m ## Optional. Frequency at which the strategies are called. Default is 5m.
+        metricsPeriod: 5m ## Optional. The duration of metrics to consider. Default is 5m.
+        strategies: ## Required. Strategies to execute in order.
+          - name: offlineOnly
+          - name: lowPriorityFirst
+          - name: lowNodeUtilization
+            params:
+              thresholds:
+                "cpu": 20 ## Threshold below which a node is considered under-utilized
+                "memory": 20
+                "pods": 20
+              targetThresholds:
+                "cpu": 50 ## Target utilization to reach for balance
+                "memory": 50
+                "pods": 50
+        queueSelector: ## Optional. Select workloads in specified queues as potential evictees. All queues by default.
+          - default
+          - test-queue
+        labelSelector: ## Optional. Select workloads with specified labels as potential evictees. All labels by default.
+          business: offline
+          team: test
+  - plugins:
+    - name: overcommit
+    - name: drf
+    - name: predicates
+    - name: proportion
+    - name: nodeorder
+    - name: binpack
+```
+
+> **Note:** Rescheduling decisions rely on metrics collected from Prometheus, because the plugin evaluates real node resource utilization instead of requested resource amounts. Ensure your metrics configuration is set up correctly.
diff --git a/docs/UserGuide/user_guide_how_to_use_resource_strategy_fit_plugin.md b/docs/Scheduler/Plugins/resource_strategy_fit.md
similarity index 99%
rename from docs/UserGuide/user_guide_how_to_use_resource_strategy_fit_plugin.md
rename to docs/Scheduler/Plugins/resource_strategy_fit.md
index ec395a45..0eced3c6 100644
--- a/docs/UserGuide/user_guide_how_to_use_resource_strategy_fit_plugin.md
+++ b/docs/Scheduler/Plugins/resource_strategy_fit.md
@@ -1,5 +1,5 @@
---
-title: "Resource Strategy Fit Plugin User Guide"
+title: Resource Strategy Fit
---
diff --git a/docs/Scheduler/Plugins/resourcequota.md b/docs/Scheduler/Plugins/resourcequota.md
new file mode 100644
index 00000000..1a9f8411
--- /dev/null
+++ b/docs/Scheduler/Plugins/resourcequota.md
@@ -0,0 +1,40 @@
+---
+title: ResourceQuota
+---
+
+## Introduction
+
+In a multi-tenant cluster, administrators commonly use Kubernetes `ResourceQuota` to limit the total resource consumption per namespace. However, by default, the scheduler may continuously try to enqueue and schedule pod groups even if the namespace's `ResourceQuota` is insufficient, leading to pending jobs and scheduler overhead.
+
+The **ResourceQuota Plugin** solves this problem by ensuring that a PodGroup is only allowed to enqueue if there is sufficient resource capacity in the namespace to satisfy the minimal resource quota required by the PodGroup.
+
+## Mechanism
+
+Volcano implements the ResourceQuota plugin using the `AddJobEnqueueableFn` function.
+
+1. **Namespace Capacity Cache**: The plugin maintains an `RQStatus` map to cache all resource quotas for each namespace.
+2. **Evaluation**: It calculates `minQuotas`, the minimal resource quota required to run the PodGroup.
+3. **Enqueue Admission**: When evaluating pending PodGroups, the plugin allows them to enqueue *only* if the namespace's Kubernetes `ResourceQuota` has enough available capacity. It also considers PodGroups that have already been permitted in the current scheduling round to prevent race conditions that could exceed the namespace quota.
+
+## Configuration
+
+To enable the ResourceQuota Plugin, add it to your `volcano-scheduler-configmap`.
+
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+  - plugins:
+    - name: priority
+    - name: gang
+    - name: conformance
+    - name: resourcequota # Enable the ResourceQuota plugin
+  - plugins:
+    - name: overcommit
+    - name: drf
+    - name: predicates
+    - name: proportion
+    - name: nodeorder
+    - name: binpack
+```
+
+Once enabled, Volcano will respect the Kubernetes native `ResourceQuota` configuration and prevent jobs from enqueuing into the scheduling pipeline if they cannot possibly fit within their namespace quota.
diff --git a/docs/UserGuide/user_guide_how_to_use_task_topology_plugin.md b/docs/Scheduler/Plugins/task_topology.md
similarity index 97%
rename from docs/UserGuide/user_guide_how_to_use_task_topology_plugin.md
rename to docs/Scheduler/Plugins/task_topology.md
index e253691f..58f02186 100644
--- a/docs/UserGuide/user_guide_how_to_use_task_topology_plugin.md
+++ b/docs/Scheduler/Plugins/task_topology.md
@@ -1,5 +1,5 @@
---
-title: "Task Topology Plugin User Guide"
+title: Task Topology
---
diff --git a/docs/Scheduler/Plugins/usage.md b/docs/Scheduler/Plugins/usage.md
new file mode 100644
index 00000000..9b7c1cf9
--- /dev/null
+++ b/docs/Scheduler/Plugins/usage.md
@@ -0,0 +1,103 @@
+---
+title: Usage-based Scheduling
+---
+
+## Introduction
+
+Currently, Kubernetes pod scheduling is based on resource requests and node allocatable resources rather than the actual node usage. This can lead to unbalanced resource usage across compute nodes, where pods might be scheduled to nodes with higher actual usage and lower allocation rates.
+
+The **Usage-based scheduling** plugin allows Volcano to schedule pods based on real-time node usage. This helps to balance the cluster's actual resource usage and avoid scheduling pods on overloaded nodes.
+
+## Mechanism
+
+The Usage-based scheduling plugin primarily performs three functions:
+1. **Predicate**: Filters out nodes whose actual resource usage (e.g., CPU, Memory) is higher than the usage threshold defined by the user.
+2. **Prioritize (NodeOrder)**: Prioritizes nodes with lower real-time usage, ensuring that nodes with more idle resources get higher scores.
+3. **Preempt**: Pods on nodes with lower usage can preempt pods on nodes with higher usage to balance the cluster load.
+
+Volcano retrieves the node usage monitoring data from a metrics server (such as Prometheus, Prometheus Adaptor, or Elasticsearch).
+
+## Configuration
+
+To enable the Usage-based scheduling plugin, configure the `volcano-scheduler-configmap`.
+
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+  - plugins:
+    - name: priority
+    - name: gang
+    - name: conformance
+    - name: usage # usage based scheduling plugin
+      enablePredicate: false # If false, nodes that reach a threshold can still receive new pods. If true or unset, they cannot.
+      arguments:
+        usage.weight: 5
+        cpu.weight: 1
+        memory.weight: 1
+        thresholds:
+          cpu: 80 # The node cannot schedule new pods if its actual CPU load reaches 80%.
+          mem: 70 # The node cannot schedule new pods if its actual Memory load reaches 70%.
+  - plugins:
+    - name: overcommit
+    - name: drf
+    - name: predicates
+    - name: proportion
+    - name: nodeorder
+    - name: binpack
+
+metrics: # metrics server related configuration
+  type: prometheus # Optional. The metrics source type. Supports "prometheus", "prometheus_adaptor", and "elasticsearch". Defaults to "prometheus".
+  address: http://192.168.0.10:9090 # Mandatory. The metrics source address.
+  interval: 30s # Optional. How often the scheduler pulls metrics from the metrics source. Defaults to 30s.
+```
+
+## Integrating with Metrics Sources
+
+### 1. Prometheus Adaptor (Custom Metrics API)
+
+**Recommended approach**. Ensure the Prometheus Adaptor is installed and the custom metrics API is available. Add the following rules to the Prometheus Adaptor configuration:
+
+```yaml
+rules:
+  - seriesQuery: '{__name__=~"node_cpu_seconds_total"}'
+    resources:
+      overrides:
+        instance:
+          resource: node
+    name:
+      matches: "node_cpu_seconds_total"
+      as: "node_cpu_usage_avg"
+    metricsQuery: avg_over_time((1 - avg (irate(<<.Series>>{mode="idle"}[5m])) by (instance))[10m:30s])
+  - seriesQuery: '{__name__=~"node_memory_MemTotal_bytes"}'
+    resources:
+      overrides:
+        instance:
+          resource: node
+    name:
+      matches: "node_memory_MemTotal_bytes"
+      as: "node_memory_usage_avg"
+    metricsQuery: avg_over_time(((1-node_memory_MemAvailable_bytes/<<.Series>>))[10m:30s])
+```
+
+Set the metrics `type` in the scheduler configmap to `prometheus_adaptor`.
+
+### 2. Prometheus
+
+Set `metrics.type` to `prometheus` and point `metrics.address` directly at the Prometheus server, as shown in the configuration example.
+
+### 3. Elasticsearch
+
+Set the `metrics.type` to `elasticsearch` and provide the following configuration:
+```yaml
+metrics:
+  type: elasticsearch
+  address: http://192.168.0.10:9200
+  interval: 30s
+  tls:
+    insecureSkipVerify: "false"
+  elasticsearch:
+    index: "custom-index-name" # Optional. "metricbeat-*" by default
+    username: ""
+    password: ""
+    hostnameFieldName: "host.hostname" # Optional. "host.hostname" by default
+```
diff --git a/docs/Scheduler/_category_.json b/docs/Scheduler/_category_.json
index a59dd3a2..78364d08 100644
--- a/docs/Scheduler/_category_.json
+++ b/docs/Scheduler/_category_.json
@@ -1,4 +1,4 @@
{
"label": "Scheduler",
- "position": 7
+ "position": 5
}
\ No newline at end of file
diff --git a/docs/UserGuide/_category_.json b/docs/UserGuide/_category_.json
new file mode 100644
index 00000000..ee56cffa
--- /dev/null
+++ b/docs/UserGuide/_category_.json
@@ -0,0 +1,4 @@
+{
+ "label": "User Guide",
+ "position": 7
+}
diff --git a/docs/UserGuide/user_guide.md b/docs/UserGuide/user_guide.md
index 1d8ef16b..b02f8b51 100644
--- a/docs/UserGuide/user_guide.md
+++ b/docs/UserGuide/user_guide.md
@@ -6,8 +6,8 @@ sidebar_position: 5
This section contains the User Guide for Volcano.
* [Ascend vNPU User Guide](./user_guide_how_to_use_vnpu.md)
-* [Capacity Plugin User Guide](./user_guide_how_to_use_capacity_plugin.md)
-* [Cooldown Protection Plugin User Guide](./user_guide_how_to_use_cdp_plugin.md)
+* [Capacity Plugin User Guide](../Scheduler/Plugins/capacity.md)
+* [Cooldown Protection Plugin User Guide](../Scheduler/Plugins/cdp.md)
* [Extender User Guide](./user_guide_how_to_use_extender.md)
* [GPU Number User Guide](./user_guide_how_to_use_gpu_number.md)
* [GPU Sharing User Guide](./user_guide_how_to_use_gpu_sharing.md)
@@ -17,17 +17,18 @@ This section contains the User Guide for Volcano.
* [Dynamic Resource Allocation (DRA) User Guide](./user_guide_how_to_enable_dra.md)
* [How to Tune Volcano Performance in Large-Scale Scenarios](./user_guide_how_to_tune_volcano_performance.md)
* [How to configure priorityclass for job](./user_guide_how_to_configure_priorityclass_for_job.md)
-* [MPI Plugin User Guide](./user_guide_how_to_use_mpi_plugin.md)
+* [MPI Plugin User Guide](../Controller/Plugins/mpi.md)
* [NUMA Aware User Guide](./user_guide_how_to_use_numa_aware.md)
* [Network Topology Aware Scheduling User Guide](./user_guide_how_to_use_network_topology_aware_scheduling.md)
-* [Nodegroup Plugin User Guide](./user_guide_how_to_use_nodegroup_plugin.md)
-* [Pytorch Plugin User Guide](./user_guide_how_to_use_pytorch_plugin.md)
-* [Ray Plugin User Guide](./user_guide_how_to_use_ray_plugin.md)
-* [Resource Strategy Fit Plugin User Guide](./user_guide_how_to_use_resource_strategy_fit_plugin.md)
-* [Task Topology Plugin User Guide](./user_guide_how_to_use_task_topology_plugin.md)
+* [Nodegroup Plugin User Guide](../Scheduler/Plugins/nodegroup.md)
+* [Pytorch Plugin User Guide](../Controller/Plugins/pytorch.md)
+* [Ray Plugin User Guide](../Controller/Plugins/ray.md)
+* [Resource Strategy Fit Plugin User Guide](../Scheduler/Plugins/resource_strategy_fit.md)
+* [Task Topology Plugin User Guide](../Scheduler/Plugins/task_topology.md)
* [HyperNode Auto Discovery User Guide](./user_guide_how_to_use_hypernode_auto_discovery.md)
-* [Volcano Job Plugin -- Env User Guide](./user_guide_how_to_use_env_plugin.md)
-* [Volcano Job Plugin -- SSH User Guide](./user_guide_how_to_use_ssh_plugin.md)
-* [Volcano Job Plugin -- SVC User Guide](./user_guide_how_to_use_svc_plugin.md)
+* [Volcano Job Plugin -- Env User Guide](../Controller/Plugins/env.md)
+* [Volcano Job Plugin -- SSH User Guide](../Controller/Plugins/ssh.md)
+* [Volcano Job Plugin -- SVC User Guide](../Controller/Plugins/svc.md)
* [Volcano vGPU User Guide](./user_guide_how_to_use_volcano_vgpu.md)
* [Scheduling Gates Queue Admission User Guide](./user_guide_how_to_use_scheduling_gates_queue_admission.md)
+
diff --git a/docusaurus.config.js b/docusaurus.config.js
index c6fb4fef..3325431f 100644
--- a/docusaurus.config.js
+++ b/docusaurus.config.js
@@ -5,7 +5,7 @@ import { themes as prismThemes } from "prism-react-renderer";
const config = {
title: "Volcano",
tagline: "Cloud native batch scheduling system",
-
+ favicon: "img/favicons/favicon.ico",
future: {
v4: true,
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.10.0-release.md b/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.10.0-release.md
index 5da3c46f..68aa7002 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.10.0-release.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.10.0-release.md
@@ -65,7 +65,7 @@ Volcano released the elastic queue capacity scheduling feature in v1.9, and users can
For the design of elastic queue `capacity` scheduling, see [Capacity scheduling Design](https://github.com/volcano-sh/volcano/blob/master/docs/design/capacity-scheduling.md)
-For usage instructions, see [Capacity Plugin User Guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin).
+For usage instructions, see [Capacity Plugin User Guide](/docs/Scheduler/Plugins/capacity).
Example of configuring the deserved value for each resource dimension of a queue:
@@ -89,7 +89,7 @@ spec:
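**Note:**
-`capacity` is the default queue management plugin in v1.10.0. The `capacity` and `proportion` plugins conflict with each other, so after upgrading to v1.10.0 you must also set the `deserved` field for queues to ensure that they work correctly. For detailed instructions, see the [Capacity Plugin User Guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin).
+`capacity` is the default queue management plugin in v1.10.0. The `capacity` and `proportion` plugins conflict with each other, so after upgrading to v1.10.0 you must also set the `deserved` field for queues to ensure that they work correctly. For detailed instructions, see the [Capacity Plugin User Guide](/docs/Scheduler/Plugins/capacity).
The `capacity` plugin divides cluster resources according to the queue `deserved` values specified by the user, while the `proportion` plugin divides cluster resources dynamically according to queue weights. Users can choose either the `capacity` or the `proportion` plugin for queue management based on their scenario. For an introduction to the proportion plugin, see [proportion plugin](/docs/Scheduler/Plugins/proportion).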
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.13.0-release.md b/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.13.0-release.md
index 04838157..0b581a8e 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.13.0-release.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.13.0-release.md
@@ -141,7 +141,7 @@ labels:
Design document: [Ray Framework Plugin Design Doc](https://github.com/volcano-sh/volcano/blob/master/docs/design/distributed-framework-plugins.md).
-User guide: [Ray Plugin User Guide](/docs/UserGuide/user_guide_how_to_use_ray_plugin).
+User guide: [Ray Plugin User Guide](/docs/Controller/Plugins/ray).
## Added HCCL plugin support
@@ -188,7 +188,7 @@ tiers:
NodeGroup design document: [NodeGroup Design.](https://github.com/volcano-sh/volcano/blob/master/docs/design/node-group.md)
-NodeGroup user guide: [NodeGroup User Guide.](/docs/UserGuide/user_guide_how_to_use_nodegroup_plugin)
+NodeGroup user guide: [NodeGroup User Guide.](/docs/Scheduler/Plugins/nodegroup)
## Introducing the ResourceStrategyFit plugin
@@ -273,7 +273,7 @@ tiers:
Design document: [ResourceStrategyFit Design](https://github.com/volcano-sh/volcano/blob/master/docs/design/resource-strategy-fit-scheduling.md)
-User guide: [ResourceStrategyFit User Guide](/docs/UserGuide/user_guide_how_to_use_resource_strategy_fit_plugin)
+User guide: [ResourceStrategyFit User Guide](/docs/Scheduler/Plugins/resource_strategy_fit)
## Decoupling colocation from the operating system
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.7.0-release-en.md b/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.7.0-release-en.md
index 1eab4eef..e6a44fa0 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.7.0-release-en.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.7.0-release-en.md
@@ -32,7 +32,7 @@ The Volcano community currently provides Job enhancement plugins such as TensorFlow, MPI, and Pytorch
In addition, Volcano provides an extension framework for developing Job plugins, so that advanced users can customize Job plugins for complex scenarios.
Design document: [Pytorch-plugin](https://github.com/volcano-sh/volcano/blob/master/docs/design/distributed-framework-plugins.md#pytorch-plugin)
-User guide: [PyTorch Plugin User Guide](/docs/UserGuide/user_guide_how_to_use_pytorch_plugin)
+User guide: [PyTorch Plugin User Guide](/docs/Controller/Plugins/pytorch)
Issue: [#2292](https://github.com/volcano-sh/volcano/issues/2292)
#### 2. Ray on Volcano
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.9.0-release.md b/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.9.0-release.md
index 10d0c9a7..c6531127 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.9.0-release.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-blog/Volcano-1.9.0-release.md
@@ -59,7 +59,7 @@ spec:
nvidia.com/v100: 80
```
-For a complete example of elastic queue capacity scheduling, see [How to use capacity plugin](/docs/UserGuide/user_guide_how_to_use_capacity_plugin).
+For a complete example of elastic queue capacity scheduling, see [How to use capacity plugin](/docs/Scheduler/Plugins/capacity).
For the elastic queue capacity design document, see [Capacity scheduling Design](https://github.com/volcano-sh/volcano/blob/master/docs/design/capacity-scheduling.md).
@@ -94,7 +94,7 @@ metadata:
-
```
-The scheduler plugin for this feature is named nodegroup. For a complete example, see [How to use nodegroup plugin](/docs/UserGuide/user_guide_how_to_use_nodegroup_plugin).
+The scheduler plugin for this feature is named nodegroup. For a complete example, see [How to use nodegroup plugin](/docs/Scheduler/Plugins/nodegroup).
For the detailed design document, see [The nodegroup design](https://github.com/volcano-sh/volcano/blob/master/docs/design/node-group.md).
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Concepts/Queue.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Concepts/Queue.md
index 13831c77..00569dce 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Concepts/Queue.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Concepts/Queue.md
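@@ -52,7 +52,7 @@ deserved indicates the amount of resources that all podgroups in the queue are entitled to; if the queue's allocated
**Note**:
-1. This field can be configured as needed only when the capacity plugin is enabled, and it must be less than or equal to the capability value. The proportion plugin uses weight to calculate the queue's deserved value automatically. For the capacity plugin user guide, see [capacity plugin User Guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)
+1. This field can be configured as needed only when the capacity plugin is enabled, and it must be less than or equal to the capability value. The proportion plugin uses weight to calculate the queue's deserved value automatically. For the capacity plugin user guide, see [capacity plugin User Guide](../Scheduler/Plugins/capacity.md)
> 2. If the resources already allocated to the queue exceed its configured deserved value, the queue can no longer reclaim resources from other queues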
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Overview.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Overview.md
new file mode 100644
index 00000000..ae48c502
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Overview.md
@@ -0,0 +1,7 @@
+---
+title: Overview
+---
+
+## Introduction
+
+The Volcano Job Controller manages the lifecycle of jobs and dynamically configures job environments using plugins such as SSH, SVC, MPI, and PyTorch.
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/_category_.json b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/_category_.json
new file mode 100644
index 00000000..f3accd2f
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/_category_.json
@@ -0,0 +1,4 @@
+{
+ "label": "Plugins",
+ "key": "controller-plugins"
+}
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/env.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/env.md
new file mode 100644
index 00000000..99fab7ed
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/env.md
@@ -0,0 +1,104 @@
+---
+title: Env
+
+---
+
+
+## Background
+The **Env Plugin** is designed for workloads such as [MPI](https://www.open-mpi.org/) and [TensorFlow](https://tensorflow.google.cn/)
+in which each pod needs to know its index within its task. The indices are registered as **environment variables** automatically
+when the Volcano job is created. For example, a TensorFlow job consists of *1* ps and *2* workers, and each worker maps
+to a slice of the raw data. To let each worker find its target slice, the worker reads its index from the environment
+variables.
+
+## Key Points
+* The environment variables that hold the index are `VK_TASK_INDEX` and `VC_TASK_INDEX`; both have the same value.
+* The index is an integer in the range `0` to `length - 1`, where `length` is the number of replicas
+of the task. It is also the index of the pod in the task.
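
As an illustration of how a worker might consume the index, the following Python sketch reads `VC_TASK_INDEX` (falling back to `VK_TASK_INDEX`) and picks a slice of the input. The helper names `task_index` and `slice_for`, and the round-robin sharding rule, are assumptions made for this example and are not part of the plugin.

```python
import os


def task_index(environ=None):
    """Return the pod's index in its task, as set by the env plugin."""
    env = os.environ if environ is None else environ
    # Prefer VC_TASK_INDEX; VK_TASK_INDEX carries the same value.
    raw = env.get("VC_TASK_INDEX", env.get("VK_TASK_INDEX"))
    if raw is None:
        raise RuntimeError("neither VC_TASK_INDEX nor VK_TASK_INDEX is set")
    return int(raw)


def slice_for(index, replicas, items):
    """Hypothetical sharding rule: item i goes to the pod with index i % replicas."""
    return items[index::replicas]
```

For a task with two replicas, the pod with index `1` would process items 1, 3, 5, and so on, while the pod with index `0` takes the rest.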
+
+## Examples
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: tensorflow-dist-mnist
+spec:
+  minAvailable: 3
+  schedulerName: volcano
+  plugins:
+    env: [] ## Env plugin register, note that no values are needed in the array.
+    svc: []
+  policies:
+    - event: PodEvicted
+      action: RestartJob
+  queue: default
+  tasks:
+    - replicas: 1
+      name: ps
+      template:
+        spec:
+          containers:
+            - command:
+                - sh
+                - -c
+                - |
+                  PS_HOST=`cat /etc/volcano/ps.host | sed 's/$/&:2222/g' | sed 's/^/"/;s/$/"/' | tr "\n" ","`;
+                  WORKER_HOST=`cat /etc/volcano/worker.host | sed 's/$/&:2222/g' | sed 's/^/"/;s/$/"/' | tr "\n" ","`;
+                  export TF_CONFIG={\"cluster\":{\"ps\":[${PS_HOST}],\"worker\":[${WORKER_HOST}]},\"task\":{\"type\":\"ps\",\"index\":${VK_TASK_INDEX}},\"environment\":\"cloud\"}; ## Get the index from the environment variable and configure it in the TF job.
+                  python /var/tf_dist_mnist/dist_mnist.py
+              image: volcanosh/dist-mnist-tf-example:0.0.1
+              name: tensorflow
+              ports:
+                - containerPort: 2222
+                  name: tfjob-port
+              resources: {}
+          restartPolicy: Never
+    - replicas: 2
+      name: worker
+      policies:
+        - event: TaskCompleted
+          action: CompleteJob
+      template:
+        spec:
+          containers:
+            - command:
+                - sh
+                - -c
+                - |
+                  PS_HOST=`cat /etc/volcano/ps.host | sed 's/$/&:2222/g' | sed 's/^/"/;s/$/"/' | tr "\n" ","`;
+                  WORKER_HOST=`cat /etc/volcano/worker.host | sed 's/$/&:2222/g' | sed 's/^/"/;s/$/"/' | tr "\n" ","`;
+                  export TF_CONFIG={\"cluster\":{\"ps\":[${PS_HOST}],\"worker\":[${WORKER_HOST}]},\"task\":{\"type\":\"worker\",\"index\":${VK_TASK_INDEX}},\"environment\":\"cloud\"};
+                  python /var/tf_dist_mnist/dist_mnist.py
+              image: volcanosh/dist-mnist-tf-example:0.0.1
+              name: tensorflow
+              ports:
+                - containerPort: 2222
+                  name: tfjob-port
+              resources: {}
+          restartPolicy: Never
+```
+## Note:
+* When the env plugin is configured as in the TensorFlow job above, every pod gets two environment variables, `VK_TASK_INDEX` and
+`VC_TASK_INDEX`. The environment variables registered in the `ps` pod are as follows.
+```
+[root@tensorflow-dist-mnist-ps-0 /] env | grep TASK_INDEX
+VK_TASK_INDEX=0
+VC_TASK_INDEX=0
+```
+* The two workers are named `tensorflow-dist-mnist-worker-0` and
+`tensorflow-dist-mnist-worker-1`, and their index environment variables are `0` and `1` respectively.
+```
+[root@tensorflow-dist-mnist-worker-0 /] env | grep TASK_INDEX
+VK_TASK_INDEX=0
+VC_TASK_INDEX=0
+```
+```
+[root@tensorflow-dist-mnist-worker-1 /] env | grep TASK_INDEX
+VK_TASK_INDEX=1
+VC_TASK_INDEX=1
+```
+
+## Note:
+* For historical reasons, both `VK_TASK_INDEX` and `VC_TASK_INDEX` exist. `VK_TASK_INDEX` will
+be **deprecated** in a future release.
+* No values are needed when registering the env plugin in a Volcano job.
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/hcclrank.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/hcclrank.md
new file mode 100644
index 00000000..ab556003
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/hcclrank.md
@@ -0,0 +1,59 @@
+---
+title: HCCLRank
+---
+
+## Introduction
+
+In distributed AI training, particularly when using Ascend NPUs (Neural Processing Units) or MindSpore frameworks, the compute nodes need a deterministic rank or index to communicate over HCCL (Huawei Collective Communication Library).
+
+The **HCCLRank Plugin** is a Volcano Job plugin that automatically injects a `hccl/rankIndex` annotation into the Pods of a Volcano Job. It calculates a unique rank for each pod based on its task type (`master` or `worker`) and its replica index.
+
+## Mechanism
+
+During the Pod creation phase (`OnPodCreate`), the HCCLRank Plugin intercepts the pod and adds the `hccl/rankIndex` annotation to it.
+
+The calculation is as follows:
+- **Master Role**: Rank = Pod Index
+- **Worker Role**: Rank = (Total Master Replicas) + Pod Index
+
+If the Pod already has a `RANK` environment variable defined in its container specifications, the plugin will use that value instead and simply map it to the `hccl/rankIndex` annotation.
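
The rank rule above can be written out as a small Python sketch. This is an illustration of the arithmetic only, not the plugin's source code: the function name `hccl_rank` and its parameters are hypothetical, and returning `None` for other task names is an assumption, since the text above does not say what happens in that case.

```python
def hccl_rank(task_name, pod_index, master_replicas,
              master="master", worker="worker", env_rank=None):
    """Compute the value that would go into the hccl/rankIndex annotation."""
    # A RANK environment variable on the container takes precedence.
    if env_rank is not None:
        return int(env_rank)
    # Master role: rank equals the pod index.
    if task_name == master:
        return pod_index
    # Worker role: ranks continue after all master replicas.
    if task_name == worker:
        return master_replicas + pod_index
    # Assumption: tasks with other names get no rank in this sketch.
    return None
```

For a job with one master and two workers, this yields rank 0 for the master pod and ranks 1 and 2 for the worker pods.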
+
+## Configuration
+
+To enable the HCCLRank plugin, configure it within the Volcano job controller's configuration or add it to the `plugins` field of your `VolcanoJob` spec.
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: ascend-distributed-training
+spec:
+  minAvailable: 3
+  schedulerName: volcano
+  plugins:
+    hcclrank:
+      - --master=master
+      - --worker=worker
+  tasks:
+    - replicas: 1
+      name: master
+      template:
+        spec:
+          containers:
+            - name: master
+              image: my-ascend-image
+    - replicas: 2
+      name: worker
+      template:
+        spec:
+          containers:
+            - name: worker
+              image: my-ascend-image
+```
+
+### Arguments
+
+The HCCLRank plugin supports overriding the default task names used to identify master and worker roles:
+
+- **`--master`**: The name of the master role task in your Job spec. Default is `master`.
+- **`--worker`**: The name of the worker role task in your Job spec. Default is `worker`.
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/mpi.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/mpi.md
new file mode 100644
index 00000000..77743da5
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/mpi.md
@@ -0,0 +1,80 @@
+---
+title: MPI
+
+---
+
+
+## Introduction
+
+The **MPI plugin** is designed to improve the user experience of running MPI jobs. It lets users write less YAML and ensures that MPI jobs run correctly.
+
+## How the MPI Plugin Works
+
+The MPI plugin will do three things:
+
+* Open ports used by MPI for all containers of the job
+* Force-enable the `ssh` and `svc` plugins
+* Add the `MPI_HOST` environment variable to the master pod. It contains the workers' domain names and is used by the `--host` parameter of `mpiexec`
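
As a rough illustration of how the master pod can use `MPI_HOST`, the Python sketch below builds an `mpiexec` argument list from it. It assumes that `MPI_HOST` is a comma-separated list of worker host names and that one process per worker is wanted; both are assumptions for this example, and the helper name `mpiexec_command` is hypothetical.

```python
import os


def mpiexec_command(program, environ=None):
    """Build an mpiexec argument list from the MPI_HOST environment variable."""
    env = os.environ if environ is None else environ
    # Assumption: MPI_HOST holds comma-separated worker host names.
    hosts = [h for h in env.get("MPI_HOST", "").split(",") if h]
    if not hosts:
        raise RuntimeError("MPI_HOST is empty or not set")
    return [
        "mpiexec", "--allow-run-as-root",
        "--host", ",".join(hosts),
        "-np", str(len(hosts)),  # one process per worker in this sketch
        program,
    ]
```

The resulting list can be passed to `subprocess.run` in the master container, which mirrors the `mpiexec --host ${MPI_HOST}` shell command used in the example job.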
+
+## Parameters of the MPI Plugin
+
+### Key Points
+
+* If `master` or `worker` is configured, ensure that tasks with those names exist and that their roles match the meaning of the parameters.
+* If `port` is configured, make `sshd` listen on the same port.
+* If the `gang` plugin is enabled, make sure that the value of `minAvailable` is **equal** to the number of worker `replicas`.
+
+### Arguments
+
+| ID | Name | Type | Default Value | Required | Description | Example |
+| ---- | ------ | ------ | ------------- | -------- | ---------------------------------- | ------------------ |
+| 1 | master | string | master | No | Name of MPI master | --master=mpimaster |
+| 2 | worker | string | worker | No | Name of MPI worker | --worker=mpiworker |
+| 3 | port | string | 22 | No | The port to open for the container | --port=5000 |
+
+## Examples
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: lm-mpi-job
+spec:
+  minAvailable: 1
+  schedulerName: volcano
+  plugins:
+    mpi: ["--master=mpimaster","--worker=mpiworker","--port=22"] ## MPI plugin register
+  tasks:
+    - replicas: 1
+      name: mpimaster
+      policies:
+        - event: TaskCompleted
+          action: CompleteJob
+      template:
+        spec:
+          containers:
+            - command:
+                - /bin/sh
+                - -c
+                - |
+                  mkdir -p /var/run/sshd; /usr/sbin/sshd;
+                  mpiexec --allow-run-as-root --host ${MPI_HOST} -np 2 mpi_hello_world;
+              image: volcanosh/example-mpi:0.0.3
+              name: mpimaster
+              workingDir: /home
+          restartPolicy: OnFailure
+    - replicas: 2
+      name: mpiworker
+      template:
+        spec:
+          containers:
+            - command:
+                - /bin/sh
+                - -c
+                - |
+                  mkdir -p /var/run/sshd; /usr/sbin/sshd -D;
+              image: volcanosh/example-mpi:0.0.3
+              name: mpiworker
+              workingDir: /home
+          restartPolicy: OnFailure
+```
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/pytorch.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/pytorch.md
new file mode 100644
index 00000000..9d7e1039
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/pytorch.md
@@ -0,0 +1,95 @@
+---
+title: Pytorch
+---
+
+
+## Introduction
+
+The **Pytorch plugin** is designed to improve the user experience of running PyTorch jobs. It lets users write less YAML and ensures that PyTorch jobs run correctly.
+
+## How the Pytorch Plugin Works
+
+The Pytorch Plugin will do the following:
+
+* Open ports used by Pytorch for all containers of the job
+* Force-enable the `svc` plugin
+* Automatically add the environment variables that PyTorch distributed training needs, such as `MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE`, and `RANK`, to the containers
+* Add an init container to worker pods to wait for the master node to be ready before starting (ensures master starts first)
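
To show how a training script might consume the injected variables, here is a minimal Python sketch that collects them into keyword arguments. The helper name `dist_settings` is hypothetical, and the sketch assumes that all four variables are present in the container.

```python
import os


def dist_settings(environ=None):
    """Collect the variables injected by the plugin into init arguments."""
    env = os.environ if environ is None else environ
    return {
        # TCP rendezvous address built from the master's address and port.
        "init_method": f"tcp://{env['MASTER_ADDR']}:{env['MASTER_PORT']}",
        "world_size": int(env["WORLD_SIZE"]),
        "rank": int(env["RANK"]),
    }
```

In a real training script, the returned dictionary could be passed on as `torch.distributed.init_process_group(backend="gloo", **dist_settings())`. The choice of the `gloo` backend here is only an example.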
+
+## Parameters of the Pytorch Plugin
+
+### Arguments
+
+| ID | Name | Type | Default Value | Required | Description | Example |
+| ---- | -------------------- | ------ | ------------------ | -------- | ------------------------------------------------------------------------------------ | -------------------------------------- |
+| 1 | master | string | master | No | Name of Pytorch master | --master=master |
+| 2 | worker | string | worker | No | Name of Pytorch worker | --worker=worker |
+| 3 | port | int | 23456 | No | The port to open for the container | --port=23456 |
+| 4 | wait-master-enabled | bool | false | No | Enable init container to wait for master | --wait-master-enabled=true |
+| 5 | wait-master-timeout | int | 300 | No | Timeout in seconds for waiting master (only effective when wait-master-enabled=true) | --wait-master-timeout=600 |
+| 6 | wait-master-image | string | busybox:1.36.1 | No | Image for wait-for-master init container (only effective when wait-master-enabled=true) | --wait-master-image=busybox:latest |
+
+## Examples
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: pytorch-job
+spec:
+  minAvailable: 1
+  schedulerName: volcano
+  plugins:
+    pytorch: [
+      "--master=master",
+      "--worker=worker",
+      "--port=23456",
+      "--wait-master-enabled=true", # Enable init container to wait for master (optional, default: false)
+      "--wait-master-timeout=600", # Timeout in seconds (optional, default: 300)
+      "--wait-master-image=busybox:1.36.1" # Init container image (optional, default: busybox:1.36.1)
+    ]
+  tasks:
+    - replicas: 1
+      name: master
+      policies:
+        - event: TaskCompleted
+          action: CompleteJob
+      template:
+        spec:
+          containers:
+            - image: gcr.io/kubeflow-ci/pytorch-dist-sendrecv-test:1.0
+              imagePullPolicy: IfNotPresent
+              name: master
+          restartPolicy: OnFailure
+    - replicas: 2
+      name: worker
+      template:
+        spec:
+          containers:
+            - image: gcr.io/kubeflow-ci/pytorch-dist-sendrecv-test:1.0
+              imagePullPolicy: IfNotPresent
+              name: worker
+              workingDir: /home
+          restartPolicy: OnFailure
+```
+
+## Notes
+
+* The `wait-for-master` init container feature is **disabled by default**. Enable it by setting `--wait-master-enabled=true`
+* When enabled, an init container will be added to worker pods to ensure the master is ready before starting workers
+* Default init container image is `busybox:1.36.1`, can be customized via `--wait-master-image`
+* Workers will wait for the master to become ready with a configurable timeout (default 300 seconds / 5 minutes)
+* If the master doesn't become ready within the timeout, the worker pod will fail with an error message
+* The init container checks the master's port connectivity using multiple fallback methods:
+  1. `nc -z` (netcat) if available
+  2. `/dev/tcp` with timeout command if available
+  3. `/dev/tcp` direct connection as fallback
+* **Note**: The parameters `--wait-master-timeout` and `--wait-master-image` are only effective when `--wait-master-enabled=true`
+* **Image Requirements**: The custom image should have at least one of the following:
+  * `nc` (netcat) command - recommended, available in busybox, alpine
+  * `/dev/tcp` support in shell - available in bash/sh
+  * Recommended images: `busybox:1.36.1`, `alpine:latest`, `bash:latest`
+* Customization examples:
+  * Enable feature: `--wait-master-enabled=true`
+  * Custom timeout: `--wait-master-enabled=true --wait-master-timeout=600` (10 minutes)
+  * Custom image: `--wait-master-enabled=true --wait-master-image=busybox:latest`
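
The wait-for-master behavior described in these notes amounts to polling a TCP port until it accepts a connection or a timeout expires. The Python sketch below shows that idea; it is not the init container's actual script (which uses `nc` or `/dev/tcp`), and the function name `wait_for_port` and the `interval` parameter are assumptions for illustration.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the master's port is reachable.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A worker would call this with the master's address and port (`MASTER_ADDR` and `MASTER_PORT`) and exit with an error if it returns `False`, which corresponds to the worker pod failing when the master does not become ready in time.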
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/ray.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/ray.md
new file mode 100644
index 00000000..65c33136
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/ray.md
@@ -0,0 +1,147 @@
+---
+title: Ray
+---
+
+
+## Introduction
+
+The **Ray plugin** is designed to improve the user experience of deploying a Ray cluster. It lets users write less YAML and configures the cluster's head and worker nodes for them.
+
+## How the Ray Plugin Works
+
+The Ray Plugin will do three things:
+
+* Configure the commands of the head and worker nodes in the Ray cluster.
+* Open the three ports used by the Ray head node (GCS, Ray dashboard, and client server).
+* Create a service mapped to the Ray head node's container ports, for example to submit a Ray job or to access the Ray dashboard and client server.
+
+> *Note*
+> - This plugin is based on the Ray CLI (command-line interface), and this guide uses the [official ray docker image](https://hub.docker.com/r/rayproject/ray).
+> - **The svc plugin is required** when you use the ray plugin.
+
+## Parameters of the Ray Plugin
+
+### Arguments
+
+| ID | Name | Type | Default Value | Required | Description | Example |
+| --- | ---------------- | ------ | ------------- | -------- | --------------------------------------- | ------------------------ |
+| 1 | head | string | head | No | Name of Head Task in Volcano Job | --head=head |
+| 2 | worker | string | worker | No | Name of Worker Task in Volcano Job | --worker=worker |
+| 3 | headContainer | string | head | No | Name of Main Container in a head task | --headContainer=head |
+| 4 | workerContainer | string | worker | No | Name of Main Container in a worker task | --workerContainer=worker |
+| 5 | port | string | 6379 | No | The port to open for the GCS | --port=6379 |
+| 6 | dashboardPort | string | 8265 | No | The port to open for the Ray dashboard | --dashboardPort=8265 |
+| 7 | clientServerPort | string | 10001 | No | The port to open for the client server | --clientServerPort=10001 |
+
+## Examples
+> This guide is based on the instructions provided in the [RayCluster Quick Start.](https://docs.ray.io/en/master/cluster/kubernetes/getting-started/raycluster-quick-start.html#step-4-run-an-application-on-a-raycluster)
+
+First, create a Ray cluster using the YAML manifest shown below.
+- For more details about Ray clusters, see the [Ray Cluster Key Concepts documentation](https://docs.ray.io/en/latest/cluster/key-concepts.html).
+- For more details about how to compose a Ray cluster, see [Launching an On-Premise Cluster](https://docs.ray.io/en/latest/cluster/vms/user-guides/launching-clusters/on-premises.html#on-prem).
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: ray-cluster-job
+spec:
+  minAvailable: 3
+  schedulerName: volcano
+  plugins:
+    ray: []
+    svc: []
+  policies:
+    - event: PodEvicted
+      action: RestartJob
+  queue: default
+  tasks:
+    - replicas: 1
+      name: head
+      template:
+        spec:
+          containers:
+            - name: head
+              image: rayproject/ray:latest-py311-cpu
+              resources: {}
+          restartPolicy: OnFailure
+    - replicas: 2
+      name: worker
+      template:
+        spec:
+          containers:
+            - name: worker
+              image: rayproject/ray:latest-py311-cpu
+              resources: {}
+          restartPolicy: OnFailure
+
+```
+
+Once applied, a Ray cluster consisting of a `head node` and one or more `worker nodes` will be provisioned.
+
+```sh
+kubectl get pod
+```
+
+```sh
+NAME                       READY   STATUS    RESTARTS   AGE
+ray-cluster-job-head-0     1/1     Running   0          106s
+ray-cluster-job-worker-0   1/1     Running   0          106s
+ray-cluster-job-worker-1   1/1     Running   0          106s
+```
+
+Along with the cluster, a `ray-cluster-job-head-svc` [Kubernetes service](https://kubernetes.io/docs/concepts/services-networking/service/) resource is also created.
+(The `ray-cluster-job` service is created by the `svc` plugin.)
+```sh
+kubectl get service
+```
+
+```bash
+NAME                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                       AGE
+ray-cluster-job            ClusterIP   None           <none>        <none>                        3s
+ray-cluster-job-head-svc   ClusterIP   10.96.184.65   <none>        6379/TCP,8265/TCP,10001/TCP   3s
+```
+
+Now that the service name is available, use port-forwarding to access the Ray Dashboard port which is 8265 by default.
+
+```sh
+# Execute this in a separate shell.
+kubectl port-forward service/ray-cluster-job-head-svc 8265:8265 > /dev/null &
+```
+
+Now that the Dashboard port is accessible, submit jobs to the RayCluster:
+```sh
+# The following job's logs will show the Ray cluster's total resource capacity.
+ray job submit --address http://localhost:8265 -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
+```
+
+```sh
+Job submission server address: http://localhost:8265
+
+-------------------------------------------------------
+Job 'raysubmit_W8nYZjW4HEFG6Mqa' submitted successfully
+-------------------------------------------------------
+
+Next steps
+ Query the logs of the job:
+ ray job logs raysubmit_W8nYZjW4HEFG6Mqa
+ Query the status of the job:
+ ray job status raysubmit_W8nYZjW4HEFG6Mqa
+ Request the job to be stopped:
+ ray job stop raysubmit_W8nYZjW4HEFG6Mqa
+
+Tailing logs until the job exits (disable with --no-wait):
+2025-09-23 14:58:49,442 INFO job_manager.py:531 -- Runtime env is setting up.
+2025-09-23 14:59:00,106 INFO worker.py:1630 -- Using address 10.244.2.42:6379 set in the environment variable RAY_ADDRESS
+2025-09-23 14:59:00,144 INFO worker.py:1771 -- Connecting to existing Ray cluster at address: 10.244.2.42:6379...
+2025-09-23 14:59:00,161 INFO worker.py:1942 -- Connected to Ray cluster. View the dashboard at http://10.244.2.42:8265
+{'memory': 16277940225.0, 'node:10.244.4.41': 1.0, 'object_store_memory': 6976260095.0, 'CPU': 30.0, 'node:10.244.3.42': 1.0, 'node:10.244.2.42': 1.0, 'node:__internal_head__': 1.0}
+
+------------------------------------------
+Job 'raysubmit_W8nYZjW4HEFG6Mqa' succeeded
+------------------------------------------
+
+```
+
+Visit `${YOUR_IP}:8265` in your browser to open the Dashboard, for example `127.0.0.1:8265`. The job you submitted above appears in the Recent jobs pane.
+
+
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/ssh.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/ssh.md
new file mode 100644
index 00000000..15c9e39e
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/ssh.md
@@ -0,0 +1,125 @@
+---
+title: SSH
+
+---
+
+
+## Background
+The **SSH Plugin** provides password-free login between the pods of a Volcano job, which is necessary for workloads
+such as [MPI](https://www.open-mpi.org/). It often works together with the `SVC` plugin.
+
+## Key Points
+* If `ssh-key-file-path` is configured, ensure that the private and public keys exist under the target directory.
+Keeping the default value is recommended in most scenarios.
+* If `ssh-private-key` or `ssh-public-key` is configured, ensure that the value is correct. Keeping the default
+keys is recommended in most scenarios.
+* Once the `SSH` plugin is configured, a secret named after the job name with the suffix `-ssh` is created. It contains
+`authorized_keys`, `id_rsa`, `config`, and `id_rsa.pub`, and is mounted at the given path as a volume for all containers
+(including initContainers) within the job.
+* By default, all hostnames within the job are listed in `/root/.ssh/config`. This file contains the pairs of hostname
+and subdomain.
+* If the `SSH` plugin is configured, you can log in to any other pod in the same job with `ssh hostname`, without a password.
+
+## Arguments
+| ID | Name | Type | Default Value | Required | Description | Example |
+|-----|----------------------|--------|---------------------|----------|-----------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| 1 | `ssh-key-file-path` | String | `/root/.ssh` | N | The path used to store ssh private and public keys. | ssh: ["--ssh-key-file-path=/home/user/.ssh"] |
+| 2 | `ssh-private-key` | String | DEFAULT_PRIVATE_KEY | N | The input string of the private key. | ssh: ["--ssh-private-key=-----BEGIN RSA PRIVATE KEY-----\nMIIEpAIBAAKCAQEAyeyZjWDx5Na9bw1f61M4s+QlLT/kyrB37AR2j5Sb/A9hvJak\nLNQQpNC+KVfYNl4jePG+6lwHqye//pcC9+0SWsHWwgaahjMLnAthR2k8JAakNA9x\nV/wHz0YU99OKEetaOuxXpWZPXCHX0zuQO87YbdKzRbgxACirM3Phkwr7XLtQtWZk\nyXG34CQXZQWgBIS1Fl+PlGOpVpOPnWoZPMpbAK74i/Tz4sP8Zhqc6dya1hrbUwY3\nYfMZNYXpaAw7wWVjq8grfs0+Fl3SxHrzTXge2m+eZAZ6iPJ8cX4uYKxi0ZmxpM/a\ngI6Mmjq0MU75Vxpq22LaUvHIpOfX5UxhkrsxlwIDAQABAoIBAQDGOuIb6zpNn4rl\nBMpPqamW4LimjX08hrWUHGWQWyIu96LJk1GlOKMGSm8FA1odNZm5WApG5QYaPrG7\na+DcJ/7G3ljIrdbxPBd/n6RmiKcj7ukwuqBY8fFwyKo5CZEYOmagRfldRO1P02Gf\n22+jZ1MNrbWVElf4gfRgVLj0s+lEhFkzhi+QGMmMpjEJnnG98xxVGEvWMw1rnKJm\n3Gi771Gltbg3GuEPs3IeoBgba3EaHmSxJnBivAL4zsO8UUCAXB13cUiXx8qO7y1e\nCSWSenRmK2ugbL6v0co12O0n0pxF9xlJ6fALdRWzpJsFlN3ttkY9N5GrQc/pVjOa\nvqa172RRAoGBAOSAIMNLT6QjgYDk5Z7ZxjNnxH/lMso+cx6bxk9YMKRrw0fDQh8m\ncBAihXhuntCPDGhrzQ+Anqx4jJVDFqac0xBck90a8LmmzD0q72eDTCYPouDWe6DL\nJQAc/HDmIC13sADEXmGW3c0Qn4hjBnMd89ouYj7ZajU2sED2irPPc/HLAoGBAOI5\nruL4Q0FarGrP3a9z9EDrVJsK2OfSTaJ7rhZ+uvB838svbHU+4mEYPhx4PCwvrYyi\nFn4hyau003ZmLc1qTABjmwcO/PPiYyoRHJDUIIhiIyIL+id/G53uG2eTzqYtU6uS\nnAIB2rKwwhU8ek+zbJBLu5uxuxlf4mdZITdkwtXlAoGBALH3RQ02A9JgQQYFwP2G\nucLhx/6goX05RGoLg1na4w+8Sr0Cy+X9BvzaFkAlUBY5w700cOLpFyxXO48pUGP1\n8sFkiVmFGQZPbfUaEpn5ff6K4R3ijyk97xR2fvrjkR44gOEoECZL3XZQwx/zmFti\nccF1rNksdnb5oC8IliDTq4cfAoGANyy6asECJj5nLuXju5ccS3kZ+XZ70I6KQMbJ\nftMJ5P2P146JdU8RB31SKL9qbZxzR4mA0uKKvUYtDQN+yErUnoOsm9wb9Z+RcAEc\nZnZWOO02hGdHa7qkkbAxHuH91KnZbk8jnZm2LT7PFz7Y1fd80vSlnSOL7nRkU7B5\nWXlJy8ECgYA4g0wc0Jq8c1Q0FulMkOQqYRDXaDo34987L+mZ70i/RtdkKjK/IKJ9\n18UDCyEaDPD0BWBJGPejZkY8UD6FBG/5k7wNIbT7hHLRSRlw4iRmVX2hRVXrXzD8\nvc86Qyg2iG0JqkMAvRdH40amPKp5bW4VcfcvQo4TSsI972u12rgwtg==\n-----END RSA PRIVATE KEY-----\n"] |
+| 3 | `ssh-public-key` | String | DEFAULT_PUBLIC_KEY | N | The input string of the public key. | ssh: ["--ssh-public-key=ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDJ7JmNYPHk1r1vDV/rUziz5CUtP+TKsHfsBHaPlJv8D2G8lqQs1BCk0L4pV9g2XiN48b7qXAerJ7/+lwL37RJawdbCBpqGMwucC2FHaTwkBqQ0D3FX/AfPRhT304oR61o67FelZk9cIdfTO5A7ztht0rNFuDEAKKszc+GTCvtcu1C1ZmTJcbfgJBdlBaAEhLUWX4+UY6lWk4+dahk8ylsArviL9PPiw/xmGpzp3JrWGttTBjdh8xk1heloDDvBZWOryCt+zT4WXdLEevNNeB7ab55kBnqI8nxxfi5grGLRmbGkz9qAjoyaOrQxTvlXGmrbYtpS8cik59flTGGSuzGX root@aiplatform"] |
+
+## Note:
+* `DEFAULT_PRIVATE_KEY` and `DEFAULT_PUBLIC_KEY` are not listed in full because they are too long. See the examples
+below for usage.
+* Volcano does not validate `ssh-key-file-path`, so make sure it is correct yourself.
+* In most scenarios, leave the arguments blank and use the default values, as in the example below. In that case,
+Volcano generates a key pair and completes all the configuration by default.
+
+## Examples
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: mpi-job
+spec:
+  minAvailable: 3
+  schedulerName: volcano
+  plugins:
+    ssh: []   ## SSH plugin register
+    svc: []
+  tasks:
+    - replicas: 1
+      name: mpimaster
+      template:
+        spec:
+          containers:
+            - command:
+                - /bin/bash
+                - -c
+                - |
+                  mkdir -p /var/run/sshd; /usr/sbin/sshd;
+                  MPI_HOST=`cat /etc/volcano/mpiworker.host | tr "\n" ","`;
+                  sleep 10;
+                  mpiexec --allow-run-as-root --host ${MPI_HOST} -np 2 --prefix /usr/local/openmpi-3.1.5 python /tmp/gpu-test.py;
+                  sleep 3600;
+              image: lyd911/mindspore-gpu-example:0.2.0
+              name: mpimaster
+              ports:
+                - containerPort: 22
+                  name: mpijob-port
+              workingDir: /home
+          restartPolicy: OnFailure
+    - replicas: 2
+      name: mpiworker
+      template:
+        spec:
+          containers:
+            - command:
+                - /bin/bash
+                - -c
+                - |
+                  mkdir -p /var/run/sshd; /usr/sbin/sshd -D;
+              image: lyd911/mindspore-gpu-example:0.2.0
+              name: mpiworker
+              resources:
+                limits:
+                  nvidia.com/gpu: "1"
+              ports:
+                - containerPort: 22
+                  name: mpijob-port
+              workingDir: /home
+          restartPolicy: OnFailure
+```
+## Note:
+* This example creates an MPI job with 1 `master` and 2 `workers`.
+* Because the `SVC` plugin is enabled, you can get all the hosts in any pod from environment variables. You can also find
+the hosts in `/root/.ssh/config` if you use the default SSH configuration.
+```
+[root@mpi-job-master-0 /]# cat /root/.ssh/config
+StrictHostKeyChecking no
+UserKnownHostsFile /dev/null
+Host mpi-job-mpimaster-0
+ HostName mpi-job-mpimaster-0.mpi-job
+Host mpi-job-mpiworker-0
+ HostName mpi-job-mpiworker-0.mpi-job
+Host mpi-job-mpiworker-1
+ HostName mpi-job-mpiworker-1.mpi-job
+```
+* You can log in to other hosts from the `master` pod as follows.
+```
+[root@mpi-job-master-0 /]# ssh mpi-job-mpiworker-0
+Warning: Permanently added 'mpi-job-mpiworker-0.mpi-job,X.X.X.X' (ECDSA) to the list of known hosts.
+Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 3.10.0-1160.36.2.el7.x86_64 x86_64)
+
+ * Documentation: https://help.ubuntu.com
+ * Management: https://landscape.canonical.com
+ * Support: https://ubuntu.com/advantage
+
+This system has been minimized by removing packages and content that are
+not required on a system that users do not log into.
+
+To restore this content, you can run the 'unminimize' command.
+Last login: Thu Apr 14 07:19:05 2022 from 10.244.0.67
+root@mpi-job-mpiworker-0:~#
+```
+## Note
+* Ensure that the `sshd` service is available in all containers.
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/svc.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/svc.md
new file mode 100644
index 00000000..15145701
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/Plugins/svc.md
@@ -0,0 +1,356 @@
+---
+title: SVC
+---
+
+## Background
+
+The **SVC Plugin** enables communication between pods within a Volcano job, which is essential for workloads such as
+[TensorFlow](https://tensorflow.google.cn/) and [MPI](https://www.open-mpi.org/). For example, the `ps` and `worker`
+pods of a TensorFlow job need to contact each other. The Volcano job plugin `svc` enables pods within a job
+to reach each other by domain name.
+
+## Key Points
+
+- Once the `svc` plugin is configured, the `hostname` field under `spec` is automatically set to **the pod name** for
+  all pods under the job. That is, `pod.spec.hostname` equals `pod.metadata.name`.
+- Once the `svc` plugin is configured, the `subdomain` field under `spec` is automatically set to **the job name** for
+  all pods under the job. That is, `pod.spec.subdomain` equals `job.metadata.name`.
+- Once the `svc` plugin is configured, environment variables `VC_%s_NUM` are automatically registered in all containers (including
+  initContainers) under the job. `%s` is replaced by the **task name**, and the value
+  of each variable is the number of **task replicas**. One variable is registered per task,
+  so there are usually `2`, because most AI and big data jobs contain 2 roles. For example, a Spark job contains `driver` and `executor`.
+- Once the `svc` plugin is configured, environment variables `VC_%s_HOSTS` are automatically registered in all containers (including
+  initContainers) under the job. `%s` is replaced by the **task name**, and the value
+  of each variable lists the domains of all the pods under that task. One variable is registered per task,
+  so there are usually `2`, because most AI and big data jobs contain 2 roles. For example, a TensorFlow job
+  contains `ps` and `worker`.
+- A configmap named `<job-name>-svc` is created automatically. It contains the replicas of all
+  tasks and the domains of all pods under each task. It is mounted as a volume in all pods under the job and provides the
+  host files under the directory `/etc/volcano/`.
+- A headless service with the same name as the job is created.
+- If `disable-network-policy` is set to `false`, a `NetworkPolicy` object of type `Ingress` is created for
+  the job.
+
+## Arguments
+
+| ID  | Name                          | Value          | Default Value | Required | Description                                             | Example                                     |
+| --- | ----------------------------- | -------------- | ------------- | -------- | ------------------------------------------------------- | ------------------------------------------- |
+| 1   | `publish-not-ready-addresses` | `true`/`false` | `false`       | N        | Whether to publish the pod address when it is not ready | svc: ["--publish-not-ready-addresses=true"] |
+| 2   | `disable-network-policy`      | `true`/`false` | `false`       | N        | Whether to disable the network policy for the job       | svc: ["--disable-network-policy=true"]      |
+
+## Examples
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: tensorflow-dist-mnist
+spec:
+  minAvailable: 3
+  schedulerName: volcano
+  plugins:
+    env: []
+    svc: [
+        "--publish-not-ready-addresses=false",
+        "--disable-network-policy=false",
+      ] ## SVC plugin register
+  policies:
+    - event: PodEvicted
+      action: RestartJob
+  queue: default
+  tasks:
+    - replicas: 1
+      name: ps
+      template:
+        spec:
+          containers:
+            - command:
+                - sh
+                - -c
+                - |
+                  PS_HOST=`cat /etc/volcano/ps.host | sed 's/$/&:2222/g' | sed 's/^/"/;s/$/"/' | tr "\n" ","`; ## Get host domain from host files generated
+                  WORKER_HOST=`cat /etc/volcano/worker.host | sed 's/$/&:2222/g' | sed 's/^/"/;s/$/"/' | tr "\n" ","`;
+                  export TF_CONFIG={\"cluster\":{\"ps\":[${PS_HOST}],\"worker\":[${WORKER_HOST}]},\"task\":{\"type\":\"ps\",\"index\":${VK_TASK_INDEX}},\"environment\":\"cloud\"};
+                  python /var/tf_dist_mnist/dist_mnist.py
+              image: volcanosh/dist-mnist-tf-example:0.0.1
+              name: tensorflow
+              ports:
+                - containerPort: 2222
+                  name: tfjob-port
+              resources: {}
+          restartPolicy: Never
+    - replicas: 2
+      name: worker
+      policies:
+        - event: TaskCompleted
+          action: CompleteJob
+      template:
+        spec:
+          containers:
+            - command:
+                - sh
+                - -c
+                - |
+                  PS_HOST=`cat /etc/volcano/ps.host | sed 's/$/&:2222/g' | sed 's/^/"/;s/$/"/' | tr "\n" ","`;
+                  WORKER_HOST=`cat /etc/volcano/worker.host | sed 's/$/&:2222/g' | sed 's/^/"/;s/$/"/' | tr "\n" ","`;
+                  export TF_CONFIG={\"cluster\":{\"ps\":[${PS_HOST}],\"worker\":[${WORKER_HOST}]},\"task\":{\"type\":\"worker\",\"index\":${VK_TASK_INDEX}},\"environment\":\"cloud\"};
+                  python /var/tf_dist_mnist/dist_mnist.py
+              image: volcanosh/dist-mnist-tf-example:0.0.1
+              name: tensorflow
+              ports:
+                - containerPort: 2222
+                  name: tfjob-port
+              resources: {}
+          restartPolicy: Never
+```
+
+## Note:
+
+- The fields `hostname` and `subdomain` have been added to the pods under the job `tensorflow-dist-mnist`. The following is part
+  of the YAML of the `ps` pod.
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  annotations:
+    scheduling.k8s.io/group-name: tensorflow-dist-mnist
+    volcano.sh/job-name: tensorflow-dist-mnist
+    volcano.sh/job-version: "0"
+    volcano.sh/queue-name: default
+    volcano.sh/task-spec: ps
+    volcano.sh/template-uid: tensorflow-dist-mnist-ps
+  labels:
+    volcano.sh/job-name: tensorflow-dist-mnist
+    volcano.sh/job-namespace: default
+    volcano.sh/queue-name: default
+    volcano.sh/task-spec: ps
+  name: tensorflow-dist-mnist-ps-0
+  namespace: default
+  ownerReferences:
+    - apiVersion: batch.volcano.sh/v1alpha1
+      blockOwnerDeletion: true
+      controller: true
+      kind: Job
+      name: tensorflow-dist-mnist
+      uid: 52c98cc2-4791-490f-8572-22df2c16ef8f
+  resourceVersion: "855403"
+  uid: 1b9e834b-de7e-4760-9b23-2a673d38e5d9
+spec:
+  containers:
+    - command:
+        - sh
+        - -c
+        - |
+          PS_HOST=`cat /etc/volcano/ps.host | sed 's/$/&:2222/g' | sed 's/^/"/;s/$/"/' | tr "\n" ","`; ## Get host domain from host files generated
+          WORKER_HOST=`cat /etc/volcano/worker.host | sed 's/$/&:2222/g' | sed 's/^/"/;s/$/"/' | tr "\n" ","`;
+          export TF_CONFIG={\"cluster\":{\"ps\":[${PS_HOST}],\"worker\":[${WORKER_HOST}]},\"task\":{\"type\":\"ps\",\"index\":${VK_TASK_INDEX}},\"environment\":\"cloud\"};
+          python /var/tf_dist_mnist/dist_mnist.py
+      env:
+        - name: VK_TASK_INDEX
+          value: "0"
+        - name: VC_TASK_INDEX
+          value: "0"
+        - name: VC_PS_HOSTS ## Environment variable `VC_PS_HOSTS` contains the domains of all the `ps` hosts.
+          valueFrom:
+            configMapKeyRef:
+              key: VC_PS_HOSTS
+              name: tensorflow-dist-mnist-svc
+        - name: VC_PS_NUM ## Environment variable `VC_PS_NUM` contains the number of `ps` hosts.
+          valueFrom:
+            configMapKeyRef:
+              key: VC_PS_NUM
+              name: tensorflow-dist-mnist-svc
+        - name: VC_WORKER_HOSTS ## Environment variable `VC_WORKER_HOSTS` contains the domains of all the `worker` hosts.
+          valueFrom:
+            configMapKeyRef:
+              key: VC_WORKER_HOSTS
+              name: tensorflow-dist-mnist-svc
+        - name: VC_WORKER_NUM ## Environment variable `VC_WORKER_NUM` contains the number of `worker` hosts.
+          valueFrom:
+            configMapKeyRef:
+              key: VC_WORKER_NUM
+              name: tensorflow-dist-mnist-svc
+      image: volcanosh/dist-mnist-tf-example:0.0.1
+      name: tensorflow
+      ports:
+        - containerPort: 2222
+          name: tfjob-port
+          protocol: TCP
+      resources: {}
+      volumeMounts: ## Mount the configmap generated for the job under `/etc/volcano`, which contains all host files.
+        - mountPath: /etc/volcano
+          name: tensorflow-dist-mnist-svc
+        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
+          name: kube-api-access-wflz5
+          readOnly: true
+  dnsPolicy: ClusterFirst
+  enableServiceLinks: true
+  hostname: tensorflow-dist-mnist-ps-0 ## Add `hostname` field
+  nodeName: volcano-control-plane
+  restartPolicy: Never
+  schedulerName: volcano
+  subdomain: tensorflow-dist-mnist ## Add `subdomain` field
+  tolerations:
+    - effect: NoExecute
+      key: node.kubernetes.io/not-ready
+      operator: Exists
+      tolerationSeconds: 300
+    - effect: NoExecute
+      key: node.kubernetes.io/unreachable
+      operator: Exists
+      tolerationSeconds: 300
+  volumes:
+    - configMap: ## Configmap generated for the job
+        defaultMode: 420
+        name: tensorflow-dist-mnist-svc
+      name: tensorflow-dist-mnist-svc
+status:
+  conditions:
+    - lastProbeTime: null
+      lastTransitionTime: "2022-04-13T02:08:17Z"
+      status: "True"
+      type: Initialized
+    - lastProbeTime: null
+      lastTransitionTime: "2022-04-13T02:08:18Z"
+      status: "True"
+      type: Ready
+    - lastProbeTime: null
+      lastTransitionTime: "2022-04-13T02:08:18Z"
+      status: "True"
+      type: ContainersReady
+    - lastProbeTime: null
+      lastTransitionTime: "2022-04-13T02:08:17Z"
+      status: "True"
+      type: PodScheduled
+  hostIP: x.x.x.x
+  phase: Running
+  podIP: x.x.x.x
+  podIPs:
+    - ip: x.x.x.x
+  qosClass: BestEffort
+  startTime: "2022-04-13T02:08:17Z"
+```
+
+- Host information is registered in all pods under the job. The following environment variables are registered for the `ps` pod.
+
+```
+[root@tensorflow-dist-mnist-ps-0 /]# env | grep VC
+VC_PS_NUM=1
+VC_PS_HOSTS=tensorflow-dist-mnist-ps-0.tensorflow-dist-mnist ## ps pod domain
+VC_WORKER_NUM=2
+VC_WORKER_HOSTS=tensorflow-dist-mnist-worker-0.tensorflow-dist-mnist,tensorflow-dist-mnist-worker-1.tensorflow-dist-mnist ## worker pods domains
+```
+
+- The host files added under `/etc/volcano` are as follows.
+
+```
+[root@tensorflow-dist-mnist-ps-0 /]# ls /etc/volcano/
+VC_PS_HOSTS VC_PS_NUM VC_WORKER_HOSTS VC_WORKER_NUM ps.host worker.host
+[root@tensorflow-dist-mnist-ps-0 /]# cat /etc/volcano/ps.host
+tensorflow-dist-mnist-ps-0.tensorflow-dist-mnist
+[root@tensorflow-dist-mnist-ps-0 /]# cat /etc/volcano/worker.host
+tensorflow-dist-mnist-worker-0.tensorflow-dist-mnist
+tensorflow-dist-mnist-worker-1.tensorflow-dist-mnist
+```
+
+- The headless service `tensorflow-dist-mnist` generated for the job is as follows.
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  creationTimestamp: "2022-04-13T02:08:15Z"
+  name: tensorflow-dist-mnist
+  namespace: default
+  ownerReferences:
+    - apiVersion: batch.volcano.sh/v1alpha1
+      blockOwnerDeletion: true
+      controller: true
+      kind: Job
+      name: tensorflow-dist-mnist
+      uid: 52c98cc2-4791-490f-8572-22df2c16ef8f
+  resourceVersion: "855341"
+  uid: a77cb081-72ae-442f-96da-e36974dfed48
+spec:
+  clusterIP: None
+  clusterIPs:
+    - None
+  ipFamilies:
+    - IPv4
+  ipFamilyPolicy: SingleStack
+  selector:
+    volcano.sh/job-name: tensorflow-dist-mnist
+    volcano.sh/job-namespace: default
+  sessionAffinity: None
+  type: ClusterIP
+status:
+  loadBalancer: {}
+```
+
+- The configmap `tensorflow-dist-mnist-svc` generated for the job is as follows.
+
+```yaml
+apiVersion: v1
+data:
+  VC_PS_HOSTS: tensorflow-dist-mnist-ps-0.tensorflow-dist-mnist
+  VC_PS_NUM: "1"
+  VC_WORKER_HOSTS: tensorflow-dist-mnist-worker-0.tensorflow-dist-mnist,tensorflow-dist-mnist-worker-1.tensorflow-dist-mnist
+  VC_WORKER_NUM: "2"
+  ps.host: tensorflow-dist-mnist-ps-0.tensorflow-dist-mnist
+  worker.host: |-
+    tensorflow-dist-mnist-worker-0.tensorflow-dist-mnist
+    tensorflow-dist-mnist-worker-1.tensorflow-dist-mnist
+kind: ConfigMap
+metadata:
+  creationTimestamp: "2022-04-13T02:08:15Z"
+  name: tensorflow-dist-mnist-svc
+  namespace: default
+  ownerReferences:
+    - apiVersion: batch.volcano.sh/v1alpha1
+      blockOwnerDeletion: true
+      controller: true
+      kind: Job
+      name: tensorflow-dist-mnist
+      uid: 52c98cc2-4791-490f-8572-22df2c16ef8f
+  resourceVersion: "855340"
+  uid: c4f3db21-6857-451f-b8b8-bbd5aa8b06ec
+```
+
+- The networkpolicy `tensorflow-dist-mnist` generated for the job is as follows.
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  creationTimestamp: "2022-04-13T02:08:15Z"
+  name: tensorflow-dist-mnist
+  namespace: default
+  ownerReferences:
+    - apiVersion: batch.volcano.sh/v1alpha1
+      blockOwnerDeletion: true
+      controller: true
+      kind: Job
+      name: tensorflow-dist-mnist
+      uid: 52c98cc2-4791-490f-8572-22df2c16ef8f
+  resourceVersion: "855343"
+  uid: ddf8aada-51d7-47c1-99a0-5e0d8a913a4d
+spec:
+  ingress:
+    - from:
+        - podSelector:
+            matchLabels:
+              volcano.sh/job-name: tensorflow-dist-mnist
+              volcano.sh/job-namespace: default
+  podSelector:
+    matchLabels:
+      volcano.sh/job-name: tensorflow-dist-mnist
+      volcano.sh/job-namespace: default
+  policyTypes:
+    - Ingress
+```
+
+## Note
+
+- A DNS plugin such as `coredns` is required in your Kubernetes cluster.
+- Kubernetes v1.14 or later is required.
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/_category_.json b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/_category_.json
new file mode 100644
index 00000000..51e5301d
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Controller/_category_.json
@@ -0,0 +1,4 @@
+{
+ "label": "Controller",
+ "position": 6
+}
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Ecosystem/RayOnVolcano.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Ecosystem/RayOnVolcano.md
index daf7fe82..16cb3ecf 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Ecosystem/RayOnVolcano.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Ecosystem/RayOnVolcano.md
@@ -200,4 +200,4 @@ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.ini
### Learn More
- For details on the KubeRay integration, see the [KubeRay Volcano scheduler documentation](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano)
-- For details on the Volcano Job Ray plugin, see the [Volcano Ray Plugin Guide](/docs/UserGuide/user_guide_how_to_use_ray_plugin)
\ No newline at end of file
+- For details on the Volcano Job Ray plugin, see the [Volcano Ray Plugin Guide](/docs/Controller/Plugins/ray)
\ No newline at end of file
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/KeyFeatures/QueueResourceManagement.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/KeyFeatures/QueueResourceManagement.md
index b16fb496..eea677e2 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/KeyFeatures/QueueResourceManagement.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/KeyFeatures/QueueResourceManagement.md
@@ -64,7 +64,7 @@ Volcano中的队列调度涉及以下核心action:
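Volcano provides two core queue scheduling plugins:
#### capacity plugin
-The [capacity plugin](/docs/UserGuide/user_guide_how_to_use_capacity_plugin) lets you set a queue's deserved resources by explicitly configuring the `deserved` value, as in the following queue configuration example:
+The [capacity plugin](/docs/Scheduler/Plugins/capacity) lets you set a queue's deserved resources by explicitly configuring the `deserved` value, as in the following queue configuration example: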
```yaml
apiVersion: scheduling.volcano.sh/v1beta1
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/_category_.json b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/_category_.json
new file mode 100644
index 00000000..d4eb446f
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/_category_.json
@@ -0,0 +1,4 @@
+{
+ "label": "Plugins",
+ "key": "scheduler-plugins"
+}
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/capacity.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/capacity.md
new file mode 100644
index 00000000..9bb70dbf
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/capacity.md
@@ -0,0 +1,197 @@
+---
+title: Capacity
+
+---
+
+
+## Introduction
+
+The capacity plugin is a replacement for the proportion plugin. Instead of dividing the queues' deserved resources by weight, it implements elastic queue capacity management, i.e., a resource borrowing and lending mechanism between queues, by specifying the amount of deserved resources for each resource dimension of a queue.
+
+A queue can use the idle resources of other queues. When those queues submit jobs, they can reclaim the resources they have lent, up to the amount of their own deserved resources. For more details, see the [Capacity scheduling design](https://github.com/volcano-sh/volcano/blob/master/docs/design/capacity-scheduling.md).
+
+## Environment setup
+
+### Install Volcano
+
+Refer to the [Install Guide](https://github.com/volcano-sh/volcano/blob/master/installer/README.md) to install Volcano.
+
+After installation, update the scheduler configuration:
+
+```shell
+kubectl edit cm -n volcano-system volcano-scheduler-configmap
+```
+
+Make sure that:
+
+- the reclaim action is enabled.
+- the capacity plugin is enabled and the proportion plugin is removed.
+
+Note: the capacity and proportion plugins conflict with each other and cannot be used together.
+
+```yaml
+kind: ConfigMap
+apiVersion: v1
+metadata:
+  name: volcano-scheduler-configmap
+  namespace: volcano-system
+data:
+  volcano-scheduler.conf: |
+    actions: "enqueue, allocate, backfill, reclaim" # add reclaim action.
+    tiers:
+    - plugins:
+      - name: priority
+      - name: gang
+        enablePreemptable: false
+      - name: conformance
+    - plugins:
+      - name: drf
+        enablePreemptable: false
+      - name: predicates
+      - name: capacity # add this field and remove proportion plugin.
+      - name: nodeorder
+      - name: binpack
+```
+
+## Configure the queue's deserved resources
+
+Assume your Kubernetes cluster has two nodes and two queues named queue1 and queue2, and each node has 4 CPUs and 16Gi of memory, so the cluster has 8 CPUs and 32Gi of memory in total.
+
+```yaml
+allocatable:
+  cpu: "4"
+  memory: 16Gi
+  pods: "110"
+```
+
+Configure queue1's `deserved` field with 2 CPUs and 8Gi of memory.
+
+```yaml
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: Queue
+metadata:
+  name: queue1
+spec:
+  reclaimable: true
+  deserved: # set the deserved field.
+    cpu: 2
+    memory: 8Gi
+```
+
+Configure queue2's `deserved` field with 6 CPUs and 24Gi of memory.
+
+```yaml
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: Queue
+metadata:
+  name: queue2
+spec:
+  reclaimable: true
+  deserved: # set the deserved field.
+    cpu: 6
+    memory: 24Gi
+```
+
+## Submit pods to each queue
+
+First, submit a deployment named demo-1 to queue1 with replicas=8, where each pod requests 1 CPU and 4Gi of memory. Because queue2 is idle, queue1 can use the whole cluster's resources, and all 8 pods reach the Running state.
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: demo-1
+spec:
+  selector:
+    matchLabels:
+      app: demo-1
+  replicas: 8
+  template:
+    metadata:
+      labels:
+        app: demo-1
+      annotations:
+        scheduling.volcano.sh/queue-name: "queue1" # set the queue
+    spec:
+      schedulerName: volcano
+      containers:
+      - name: nginx
+        image: nginx:1.14.2
+        resources:
+          requests:
+            cpu: 1
+            memory: 4Gi
+        ports:
+        - containerPort: 80
+```
+
+Expected result:
+
+```shell
+$ kubectl get po
+NAME                      READY   STATUS    RESTARTS   AGE
+demo-1-7bc649f544-2wjg7   1/1     Running   0          5s
+demo-1-7bc649f544-cvsmr   1/1     Running   0          5s
+demo-1-7bc649f544-j5lzp   1/1     Running   0          5s
+demo-1-7bc649f544-jvlbx   1/1     Running   0          5s
+demo-1-7bc649f544-mzgg2   1/1     Running   0          5s
+demo-1-7bc649f544-ntrs2   1/1     Running   0          5s
+demo-1-7bc649f544-nv424   1/1     Running   0          5s
+demo-1-7bc649f544-zd6d9   1/1     Running   0          5s
+```
+
+Then submit a deployment named demo-2 to queue2 with replicas=8, where each pod requests 1 CPU and 4Gi of memory.
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: demo-2
+spec:
+  selector:
+    matchLabels:
+      app: demo-2
+  replicas: 8
+  template:
+    metadata:
+      labels:
+        app: demo-2
+      annotations:
+        scheduling.volcano.sh/queue-name: "queue2" # set the queue
+    spec:
+      schedulerName: volcano
+      containers:
+      - name: nginx
+        image: nginx:1.14.2
+        resources:
+          requests:
+            cpu: 1
+            memory: 4Gi
+        ports:
+        - containerPort: 80
+```
+
+Because queue1 is occupying queue2's resources, queue2 reclaims its deserved resources of 6 CPUs and 24Gi of memory. Each pod of demo-2 requests 1 CPU and 4Gi of memory, so 6 pods of demo-2 reach the Running state and the corresponding demo-1 pods are evicted.
+
+Finally, you can see 2 Running pods in demo-1 (which belongs to queue1) and 6 Running pods in demo-2 (which belongs to queue2), matching each queue's deserved resources.
+
+```shell
+$ kubectl get po
+NAME                      READY   STATUS    RESTARTS   AGE
+demo-1-7bc649f544-4vvdv   0/1     Pending   0          37s
+demo-1-7bc649f544-c6mds   0/1     Pending   0          37s
+demo-1-7bc649f544-j5lzp   1/1     Running   0          14m
+demo-1-7bc649f544-mzgg2   1/1     Running   0          14m
+demo-1-7bc649f544-pqdgk   0/1     Pending   0          37s
+demo-1-7bc649f544-tx6wp   0/1     Pending   0          37s
+demo-1-7bc649f544-wmshq   0/1     Pending   0          37s
+demo-1-7bc649f544-wrhrr   0/1     Pending   0          37s
+demo-2-6dfb86c49b-2jvgm   0/1     Pending   0          37s
+demo-2-6dfb86c49b-dnjzv   1/1     Running   0          37s
+demo-2-6dfb86c49b-fzvmp   1/1     Running   0          37s
+demo-2-6dfb86c49b-jlf69   1/1     Running   0          37s
+demo-2-6dfb86c49b-k62f7   1/1     Running   0          37s
+demo-2-6dfb86c49b-k9b9v   1/1     Running   0          37s
+demo-2-6dfb86c49b-rpzvg   0/1     Pending   0          37s
+demo-2-6dfb86c49b-zch7w   1/1     Running   0          37s
+```
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/cdp.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/cdp.md
new file mode 100644
index 00000000..2e0ea628
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/cdp.md
@@ -0,0 +1,205 @@
+---
+title: CDP
+
+---
+
+
+## Background
+With elastic training or serving enabled, the pods of a preemptible job can be preempted and returned to running repeatedly. Without cooldown protection, these pods can be preempted again shortly after they start, which can reduce service stability.
+The `cdp` plugin therefore ensures that the pods of a preemptible job can run for at least a period of time set by the user.
+
+## Environment setup
+
+### Install Volcano
+
+Refer to the [Install Guide](https://github.com/volcano-sh/volcano/blob/master/installer/README.md) to install Volcano.
+
+### Update scheduler configmap
+
+After installation, update the scheduler configuration:
+
+```shell
+kubectl edit configmap -n volcano-system volcano-scheduler-configmap
+```
+
+Register the `cdp` plugin in the configmap and enable the `preempt` action:
+
+```yaml
+kind: ConfigMap
+apiVersion: v1
+metadata:
+  name: volcano-scheduler-configmap
+  namespace: volcano-system
+data:
+  volcano-scheduler.conf: |
+    actions: "enqueue, allocate, preempt, backfill"
+    tiers:
+    - plugins:
+      - name: priority
+      - name: gang
+      - name: conformance
+      - name: cdp
+    - plugins:
+      - name: drf
+      - name: predicates
+      - name: task-topology
+        arguments:
+          task-topology.weight: 10
+      - name: proportion
+      - name: nodeorder
+      - name: binpack
+```
+
+### Running Jobs
+
+Take a simple Volcano job as an example.
+
+The original job YAML is shown below. It has a "ps" task and a "worker" task.
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: test-job
+spec:
+  minAvailable: 3
+  schedulerName: volcano
+  priorityClassName: high-priority
+  plugins:
+    ssh: []
+    env: []
+    svc: []
+  maxRetry: 5
+  queue: default
+  volumes:
+    - mountPath: "/myinput"
+    - mountPath: "/myoutput"
+      volumeClaimName: "testvolumeclaimname"
+      volumeClaim:
+        accessModes: [ "ReadWriteOnce" ]
+        storageClassName: "my-storage-class"
+        resources:
+          requests:
+            storage: 1Gi
+  tasks:
+    - replicas: 6
+      name: "worker"
+      template:
+        metadata:
+          name: worker
+        spec:
+          containers:
+            - image: nginx
+              imagePullPolicy: IfNotPresent
+              name: nginx
+              resources:
+                requests:
+                  cpu: "1"
+          restartPolicy: OnFailure
+    - replicas: 2
+      name: "ps"
+      template:
+        metadata:
+          name: ps
+        spec:
+          containers:
+            - image: nginx
+              imagePullPolicy: IfNotPresent
+              name: nginx
+              resources:
+                requests:
+                  cpu: "1"
+          restartPolicy: OnFailure
+
+```
+
+#### Edit the YAML of the vcjob
+
+1. Add annotations to the Volcano job in the format below.
+   1. The `volcano.sh/preemptable` annotation indicates that the job or task is preemptable.
+   2. The `volcano.sh/cooldown-time` annotation sets the cooldown time for the entire job or a dedicated task. Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", and "h".
+
+   ```yaml
+   volcano.sh/preemptable: "true"
+   volcano.sh/cooldown-time: "600s"
+   ```
+
+**Example 1**
+
+Add the annotations to the entire job. Both the "ps" and "worker" tasks can then be preempted, and both get cooldown time protection.
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: test-job
+  annotations:
+    volcano.sh/preemptable: "true"
+    volcano.sh/cooldown-time: "600s"
+spec:
+  ... # below keep the same
+```
+
+**Example 2**
+
+Add the annotations to a specific task. As shown below, only "worker" can be preempted and gets cooldown-time protection.
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: test-job
+spec:
+  minAvailable: 3
+  schedulerName: volcano
+  priorityClassName: high-priority
+  plugins:
+    ssh: []
+    env: []
+    svc: []
+  maxRetry: 5
+  queue: default
+  volumes:
+    - mountPath: "/myinput"
+    - mountPath: "/myoutput"
+      volumeClaimName: "testvolumeclaimname"
+      volumeClaim:
+        accessModes: [ "ReadWriteOnce" ]
+        storageClassName: "my-storage-class"
+        resources:
+          requests:
+            storage: 1Gi
+  tasks:
+    - replicas: 6
+      name: "worker"
+      template:
+        metadata:
+          name: worker
+          annotations: # add annotation in tasks
+            volcano.sh/preemptable: "true"
+            volcano.sh/cooldown-time: "600s"
+        spec:
+          containers:
+            - image: nginx
+              imagePullPolicy: IfNotPresent
+              name: nginx
+              resources:
+                requests:
+                  cpu: "1"
+          restartPolicy: OnFailure
+    - replicas: 2
+      name: "ps"
+      template:
+        metadata:
+          name: ps
+        spec:
+          containers:
+            - image: nginx
+              imagePullPolicy: IfNotPresent
+              name: nginx
+              resources:
+                requests:
+                  cpu: "1"
+          restartPolicy: OnFailure
+
+```
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/deviceshare.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/deviceshare.md
new file mode 100644
index 00000000..4448f6bc
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/deviceshare.md
@@ -0,0 +1,41 @@
+---
+title: DeviceShare
+---
+
+## Introduction
+
+The **DeviceShare Plugin** is an advanced resource scheduling plugin in Volcano that provides a common framework for sharing specialized hardware devices (like GPUs, NPUs, FPGAs) across multiple pods.
+
+Rather than implementing fragmented logic for each new hardware accelerator, Volcano exposes a unified `Devices` interface. The `deviceshare` plugin leverages this interface to perform robust allocation, node filtering, and resource tracking for shared devices.
+
+## Mechanism
+
+The DeviceShare plugin works in conjunction with device-specific implementations. It exposes standard scheduling operations such as `Predicate` (filtering nodes based on available device capacity) and `Allocate`/`Release` (assigning portions of a device to specific pods).
+
+Currently, the `deviceshare` plugin serves as the underlying engine powering features like:
+- **GPU Sharing**: Allowing multiple pods to request fractions of a single physical GPU's memory.
+- **vGPU and vNPU**: Virtualizing accelerator slices.
+- **GPU Exclusive**: Restricting a pod to exclusively own a GPU to avoid contention.
+
+## Configuration and Usage
+
+The `deviceshare` plugin is typically enabled implicitly when you enable device sharing predicates in the Volcano scheduler config map. However, if you are developing custom device sharing logic or need to explicitly declare it, it can be configured in your `volcano-scheduler-configmap`:
+
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+  - plugins:
+    - name: priority
+    - name: gang
+    - name: conformance
+    - name: deviceshare # Enable the device share framework plugin
+  - plugins:
+    - name: overcommit
+    - name: drf
+    - name: predicates
+    - name: proportion
+    - name: nodeorder
+    - name: binpack
+```
+
+> **Note:** For specific guides on how to configure your workloads to request shared GPUs or NPUs, please refer to the dedicated guides for [GPU Sharing](../../UserGuide/user_guide_how_to_use_gpu_sharing) and [vNPU](../../UserGuide/user_guide_how_to_use_vnpu).
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/nodegroup.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/nodegroup.md
new file mode 100644
index 00000000..c9495b4d
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/nodegroup.md
@@ -0,0 +1,444 @@
+---
+title: Nodegroup
+
+---
+
+## Introduction
+
+The **Nodegroup plugin** isolates resources by assigning labels to nodes and setting node-label affinity on queues.
+
+## Usage
+
+### Assign label to node
+
+Assign a label to the node. The label key is `volcano.sh/nodegroup-name`.
+
+```shell script
+kubectl label nodes volcano.sh/nodegroup-name=
+```
+
+### Configure queue
+
+Create a queue and bind a nodegroup to it.
+
+```yaml
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: Queue
+metadata:
+  name: default
+spec:
+  reclaimable: true
+  weight: 1
+  affinity: # added field
+    nodeGroupAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        -
+      preferredDuringSchedulingIgnoredDuringExecution:
+        -
+    nodeGroupAntiAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        -
+      preferredDuringSchedulingIgnoredDuringExecution:
+        -
+```
+
+### Submit a vcjob
+
+Submit the vcjob `job-1` to the `default` queue.
+
+```shell script
+$ cat < -o yaml
+# Verify parent hierarchy and affinity inheritance
+```
+
+2. Override restrictive inheritance:
+
+```yaml
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: Queue
+metadata:
+  name: flexible-team
+spec:
+  parent: restrictive-parent
+  affinity:
+    nodeGroupAffinity:
+      requiredDuringSchedulingIgnoredDuringExecution:
+        - flexible-nodes
+```
+
+### Issue: Unexpected affinity behavior
+
+**Symptoms:**
+
+- Tasks scheduled on unexpected nodegroups
+- Inheritance not working as expected
+
+**Debugging Steps:**
+
+1. Enable detailed logging:
+
+```bash
+# Check scheduler logs with higher verbosity
+kubectl logs -n volcano-system volcano-scheduler-xxx --tail=100 | grep -i nodegroup
+```
+
+2. Verify queue configuration:
+
+```bash
+# List all queues with their hierarchy
+kubectl get queues -o custom-columns=NAME:.metadata.name,PARENT:.spec.parent,WEIGHT:.spec.weight
+```
+
+### Issue: Circular queue dependencies
+
+**Symptoms:**
+
+- Queues not building ancestry correctly
+- Warning messages about cycle detection
+
+**Solutions:**
+
+1. Review queue hierarchy:
+
+```bash
+# Check for circular references
+kubectl get queues -o custom-columns=NAME:.metadata.name,PARENT:.spec.parent
+```
+
+2. Fix circular dependencies by updating parent references:
+
+```yaml
+# Remove or correct circular parent relationships
+apiVersion: scheduling.volcano.sh/v1beta1
+kind: Queue
+metadata:
+  name: problematic-queue
+spec:
+  parent: "" # Remove circular reference
+  weight: 1
+```
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/overcommit.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/overcommit.md
new file mode 100644
index 00000000..aad3d549
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/overcommit.md
@@ -0,0 +1,44 @@
+---
+title: Overcommit
+---
+
+## Introduction
+
+In typical cluster environments, the scheduler calculates available idle resources strictly based on physical node capacity minus allocated resources. However, when cluster resources are nearly fully utilized, many PodGroups are rejected from entering the scheduling pipeline and are left completely un-enqueued, which might not be desirable for scenarios where you want the scheduler to tolerate a larger backlog of `pending` pods.
+
+The **Overcommit Plugin** allows the scheduler to artificially inflate the apparent "idle resources" of the cluster by applying an `overcommit-factor`. This permits more jobs to be enqueued and wait in the scheduling pipeline than the physical resources might typically allow.
+
+## Mechanism
+
+The Overcommit plugin evaluates whether a job can be enqueued based on the requested `MinResources` of the PodGroup and the expanded idle resources.
+
+Expanded idle resource is calculated as:
+`Idle Resource = (Total Resource * overcommit-factor) - Used Resource`
+
+If the job's minimal requested resources can fit into this expanded idle resource pool, the job is permitted to be enqueued.
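+
+As a worked example of the formula above (the numbers are hypothetical and chosen only for illustration), consider a cluster with 100 CPU cores in total, 90 of them in use, and an `overcommit-factor` of `1.2`:
+
+```text
+Idle Resource = (100 * 1.2) - 90 = 30 CPU
+
+PodGroup A: MinResources = 25 CPU  ->  25 <= 30, may be enqueued
+PodGroup B: MinResources = 40 CPU  ->  40 >  30, is not enqueued
+```
+
+With a factor of `1.0`, the same comparison would use `100 - 90 = 10 CPU`, so neither PodGroup would fit.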
+
+## Configuration
+
+To use the Overcommit Plugin, add it to a plugin tier in your `volcano-scheduler-configmap`, keep the `enqueue` action in the `actions` list, and provide an `overcommit-factor`.
+
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+  - plugins:
+    - name: overcommit # Enable the overcommit plugin
+      arguments:
+        overcommit-factor: 1.2 # The overcommit factor. Default is 1.2
+    - name: priority
+    - name: gang
+    - name: conformance
+  - plugins:
+    - name: drf
+    - name: predicates
+    - name: proportion
+    - name: nodeorder
+    - name: binpack
+```
+
+### Arguments
+
+- **`overcommit-factor`**: A float value greater than or equal to `1.0`. For example, `1.2` means the scheduler treats the cluster as having 20% more total resources when deciding whether to enqueue jobs into the pipeline. If a value less than `1.0` is provided, the plugin automatically falls back to the default value of `1.2`.
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/pdb.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/pdb.md
new file mode 100644
index 00000000..f409308c
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/pdb.md
@@ -0,0 +1,43 @@
+---
+title: PDB
+---
+
+## Introduction
+
+When users deploy highly available jobs or applications on Volcano, they often need to limit the number of pod replicas that can be evicted or destroyed simultaneously to avoid downtime. This constraint is managed via Kubernetes **PodDisruptionBudget (PDB)** resources.
+
+The **PDB Plugin** ensures that Volcano respects user-defined PDB constraints during the scheduling process, specifically during eviction actions like `reclaim`, `preempt`, and `shuffle`.
+
+## Prerequisites
+
+- Your Kubernetes version must be 1.21 or later.
+- You must have created valid `PodDisruptionBudget` resources for your workloads.
+
+## Mechanism
+
+The PDB Plugin registers several functions (`ReclaimableFn`, `PreemptableFn`, and `VictimTasksFn`) under the `reclaim`, `preempt`, and `shuffle` actions. It maintains a cache of PDBs using `v1.PodDisruptionBudgetLister`.
+
+During eviction scenarios, the plugin filters out tasks whose eviction would violate the configured PDB constraints. It calculates the `DisruptedPods` (pods whose eviction was processed but not yet observed by the PDB controller) and ensures the remaining available replicas satisfy the budget.
+
+## Configuration
+
+To enable the PDB Plugin, update the `volcano-scheduler-configmap` to include the `pdb` plugin in your configuration tiers.
+
+```yaml
+actions: "reclaim, preempt, shuffle"
+tiers:
+- plugins:
+  - name: pdb # Enable the PDB plugin
+  - name: priority
+  - name: gang
+  - name: conformance
+- plugins:
+  - name: overcommit
+  - name: drf
+  - name: predicates
+  - name: proportion
+  - name: nodeorder
+  - name: binpack
+```
+
+*Note: The PDB plugin will be actively invoked when actions like `reclaim`, `preempt`, or `shuffle` are executed in the scheduler workflow.*
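+
+For reference, a minimal `PodDisruptionBudget` that the plugin would take into account could look like the sketch below. The resource name, the `app: my-ha-app` label, and the `minAvailable` value are hypothetical; adjust the selector so that it matches the labels on your own pods.
+
+```yaml
+apiVersion: policy/v1
+kind: PodDisruptionBudget
+metadata:
+  name: my-ha-app-pdb # hypothetical name
+spec:
+  minAvailable: 2 # keep at least 2 matching pods available during evictions
+  selector:
+    matchLabels:
+      app: my-ha-app # hypothetical label; must match the pods to protect
+```
+
+With such a budget in place, an eviction that would leave fewer than two matching pods available is expected to be filtered out, as described in the Mechanism section.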
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/rescheduling.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/rescheduling.md
new file mode 100644
index 00000000..a4c2e541
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/rescheduling.md
@@ -0,0 +1,72 @@
+---
+title: Rescheduling
+---
+
+## Introduction
+
+Unbalanced resource utilization across a Kubernetes cluster often occurs due to unreasonable scheduling strategies, dynamic changes in job lifecycles, and node status changes (such as added/removed nodes or taint/affinity modifications).
+
+The **Rescheduling** plugin addresses these issues by actively rebalancing the cluster's resource utilization among nodes. It accomplishes this by evaluating real resource utilization (via Prometheus metrics) instead of merely the requested resource amounts, and it periodically evicts pods based on custom configured rescheduling strategies.
+
+## Rescheduling Workflow
+
+1. **Resource Filter**: Filters workloads which are eligible to be evicted based on queues or labels.
+2. **Strategy Evaluation**: Evaluates filtered workloads against the configured rescheduling strategies to determine which ones should be evicted.
+3. **Eviction**: Evicts the pods attached to the identified workloads.
+4. **Periodical Execution**: Executes the above process periodically.
+
+## Rescheduling Strategies
+
+Volcano's rescheduling plugin supports multiple strategies to select potential evictees:
+
+- **LowNodeUtilization**: Targets unbalanced nodes by evicting pods from highly utilized nodes so that they can be rescheduled onto under-utilized nodes, based on the configured thresholds and target thresholds.
+- **OfflineOnly (OLO)**: Only selects offline workloads (annotated with `preemptable: true`) for rescheduling.
+- **LowPriorityFirst (LPF)**: Sorts workloads by priority and evicts lower priority pods first.
+- **ShortLifeTimeFirst (SLTF)**: Sorts workloads by running time. Pods with the shortest life time will be rescheduled first to ensure long-running workloads are not interrupted.
+- **BigObjectFirst (BOF)**: Selects the workloads that request the most of the dominant resource and reschedules them first, to improve system throughput and avoid starvation of small workloads.
+- **MoreReplicasFirst (MRF)**: Sorts workloads by replica number. Workloads with the most replicas are rescheduled first, making it friendly to `gang` scheduling by considering `minAvailable`.
+
+## Configuration
+
+To enable the Rescheduling plugin, you must configure the `volcano-scheduler-configmap` by adding the `shuffle` action and configuring the `rescheduling` plugin within the tiers.
+
+```yaml
+actions: "enqueue, allocate, backfill, shuffle" ## Add 'shuffle' action
+tiers:
+  - plugins:
+    - name: priority
+    - name: gang
+    - name: conformance
+    - name: rescheduling ## Rescheduling plugin
+      arguments:
+        interval: 5m ## Optional. Frequency at which the strategies are called. Default is 5m.
+        metricsPeriod: 5m ## Optional. The duration of metrics to consider. Default is 5m.
+        strategies: ## Required. Strategies to execute in order.
+          - name: offlineOnly
+          - name: lowPriorityFirst
+          - name: lowNodeUtilization
+            params:
+              thresholds:
+                "cpu" : 20 ## Threshold below which a node is considered under-utilized
+                "memory": 20
+                "pods": 20
+              targetThresholds:
+                "cpu" : 50 ## Target utilization to reach for balance
+                "memory": 50
+                "pods": 50
+        queueSelector: ## Optional. Select workloads in specified queues as potential evictees. All queues by default.
+          - default
+          - test-queue
+        labelSelector: ## Optional. Select workloads with specified labels as potential evictees. All labels by default.
+          business: offline
+          team: test
+  - plugins:
+    - name: overcommit
+    - name: drf
+    - name: predicates
+    - name: proportion
+    - name: nodeorder
+    - name: binpack
+```
+
+> **Note:** The rescheduling decisions consider metrics collected from Prometheus. Ensure your metrics configuration is correctly set up as it evaluates real node resource utilization instead of requested resource amounts.
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/resource_strategy_fit.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/resource_strategy_fit.md
new file mode 100644
index 00000000..0eced3c6
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/resource_strategy_fit.md
@@ -0,0 +1,376 @@
+---
+title: Resource Strategy Fit
+
+---
+
+
+## Introduction
+
+The **Resource Strategy Fit Plugin** is a Volcano scheduler plugin that provides intelligent resource allocation strategies for pod scheduling. It supports both global configuration and pod-level annotations to optimize resource utilization across different workloads.
+
+## Key Features
+
+- **Multiple Scoring Strategies**: Supports `LeastAllocated` and `MostAllocated` strategies
+- **Resource-Specific Configuration**: Configure different strategies for different resource types (CPU, Memory, GPU, etc.)
+- **Pod-Level Override**: Allow individual pods to override global configuration via annotations
+- **Weighted Scoring**: Fine-tune resource importance with configurable weights
+- **Wildcard Support**: Use wildcard patterns for resource matching
+
+## Installation
+
+### 1. Install Volcano
+
+Refer to [Install Guide](https://github.com/volcano-sh/volcano/blob/master/installer/README.md) to install Volcano.
+
+### 2. Configure the Plugin
+
+Update the Volcano scheduler configuration:
+
+```bash
+kubectl edit cm -n volcano-system volcano-scheduler-configmap
+```
+
+Add the `resource-strategy-fit` plugin to your configuration:
+
+```yaml
+kind: ConfigMap
+apiVersion: v1
+metadata:
+ name: volcano-scheduler-configmap
+ namespace: volcano-system
+data:
+ volcano-scheduler.conf: |
+ actions: "reclaim, allocate, backfill, preempt"
+ tiers:
+ - plugins:
+ - name: priority
+ - name: gang
+ - name: conformance
+ - plugins:
+ - name: drf
+ - name: predicates
+ - name: resource-strategy-fit
+ arguments:
+ resourceStrategyFitWeight: 10
+ resources:
+ cpu:
+ type: "LeastAllocated"
+ weight: 1
+ memory:
+ type: "LeastAllocated"
+ weight: 1
+ nvidia.com/gpu:
+ type: "MostAllocated"
+ weight: 2
+```
+
+## Global Configuration
+
+### Basic Configuration
+
+The plugin supports two main scoring strategies:
+
+| Strategy | Description | Use Case |
+|----------|-------------|----------|
+| `LeastAllocated` | Prefers nodes with more available resources | General workloads, load balancing |
+| `MostAllocated` | Prefers nodes with higher resource utilization | GPU workloads, resource consolidation |
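+
+To make the difference between the two strategies concrete, here is a small worked example. It assumes that per-resource scores have the same shape as the Kubernetes `NodeResourcesFit` strategies of the same names; this is an assumption, since the exact formula used by the plugin is not stated here. The node numbers are hypothetical.
+
+```text
+Node with 10 allocatable CPU, 6 CPU requested after placing the pod:
+
+LeastAllocated score = (10 - 6) / 10 * 100 = 40
+MostAllocated  score = 6 / 10 * 100        = 60
+```
+
+Under this assumption, a node with more free CPU scores higher with `LeastAllocated`, while a fuller node scores higher with `MostAllocated`.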
+
+### Configuration Parameters
+
+```yaml
+arguments:
+ resourceStrategyFitWeight: 10 # Plugin weight (default: 10)
+ resources: # Resource-specific configuration
+ cpu: # Resource name
+ type: "LeastAllocated" # Scoring strategy
+ weight: 1 # Resource weight
+ memory:
+ type: "LeastAllocated"
+ weight: 1
+ nvidia.com/gpu:
+ type: "MostAllocated"
+ weight: 2
+```
+
+### Advanced Configuration Examples
+
+#### 1. GPU-Optimized Configuration
+
+```yaml
+arguments:
+ resourceStrategyFitWeight: 20
+ resources:
+ cpu:
+ type: "LeastAllocated"
+ weight: 1
+ memory:
+ type: "LeastAllocated"
+ weight: 1
+ nvidia.com/gpu:
+ type: "MostAllocated"
+ weight: 5
+ nvidia.com/gpu/*: # Wildcard for all GPU types
+ type: "MostAllocated"
+ weight: 3
+```
+
+#### 2. Mixed Strategy Configuration
+
+```yaml
+arguments:
+ resourceStrategyFitWeight: 15
+ resources:
+ cpu:
+ type: "LeastAllocated"
+ weight: 3
+ memory:
+ type: "MostAllocated"
+ weight: 1
+ example.com/custom-resource:
+ type: "LeastAllocated"
+ weight: 2
+```
+
+## Pod-Level Configuration
+
+### Pod Annotations
+
+Individual pods can override the global configuration using annotations:
+
+| Annotation Key | Description | Example |
+|----------------|-------------|---------|
+| `volcano.sh/resource-strategy-scoring-type` | Override scoring strategy | `"LeastAllocated"` or `"MostAllocated"` |
+| `volcano.sh/resource-strategy-weight` | Override resource weights | `{"cpu": 2, "memory": 1, "nvidia.com/gpu": 3}` |
+
+### Pod-Level Examples
+
+#### 1. Override Strategy for Specific Pod
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+ name: gpu-workload
+ annotations:
+ volcano.sh/resource-strategy-scoring-type: "MostAllocated"
+ volcano.sh/resource-strategy-weight: '{"nvidia.com/gpu": 5, "cpu": 1}'
+spec:
+ containers:
+ - name: gpu-container
+ image: nvidia/cuda:11.0-runtime
+ resources:
+ requests:
+ nvidia.com/gpu: 1
+ cpu: "2"
+ memory: "4Gi"
+ limits:
+ nvidia.com/gpu: 1
+ cpu: "2"
+ memory: "4Gi"
+ schedulerName: volcano
+```
+
+#### 2. Custom Resource Weights
+
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+ name: custom-resource-pod
+ annotations:
+ volcano.sh/resource-strategy-scoring-type: "LeastAllocated"
+ volcano.sh/resource-strategy-weight: '{"cpu": 3, "memory": 2, "example.com/custom": 5}'
+spec:
+ containers:
+ - name: app
+ image: my-app:latest
+ resources:
+ requests:
+ cpu: "1"
+ memory: "2Gi"
+ example.com/custom: "1"
+ schedulerName: volcano
+```
+
+## Volcano Job Integration
+
+### Basic Volcano Job
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+ name: resource-strategy-job
+spec:
+ minAvailable: 2
+ schedulerName: volcano
+ plugins:
+ env: []
+ svc: []
+ tasks:
+ - replicas: 2
+ name: worker
+ template:
+ metadata:
+ annotations:
+ volcano.sh/resource-strategy-scoring-type: "LeastAllocated"
+ volcano.sh/resource-strategy-weight: '{"cpu": 2, "memory": 1}'
+ spec:
+ containers:
+ - name: worker
+ image: my-worker:latest
+ resources:
+ requests:
+ cpu: "2"
+ memory: "4Gi"
+ limits:
+ cpu: "2"
+ memory: "4Gi"
+ restartPolicy: Never
+```
+
+### Multi-Task Job with Different Strategies
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+ name: mixed-strategy-job
+spec:
+ minAvailable: 3
+ schedulerName: volcano
+ plugins:
+ env: []
+ svc: []
+ tasks:
+ - replicas: 1
+ name: gpu-task
+ template:
+ metadata:
+ annotations:
+ volcano.sh/resource-strategy-scoring-type: "MostAllocated"
+ volcano.sh/resource-strategy-weight: '{"nvidia.com/gpu": 5, "cpu": 1}'
+ spec:
+ containers:
+ - name: gpu-worker
+ image: gpu-app:latest
+ resources:
+ requests:
+ nvidia.com/gpu: 1
+ cpu: "1"
+ memory: "2Gi"
+ - replicas: 2
+ name: cpu-task
+ template:
+ metadata:
+ annotations:
+ volcano.sh/resource-strategy-scoring-type: "LeastAllocated"
+ volcano.sh/resource-strategy-weight: '{"cpu": 3, "memory": 2}'
+ spec:
+ containers:
+ - name: cpu-worker
+ image: cpu-app:latest
+ resources:
+ requests:
+ cpu: "2"
+ memory: "4Gi"
+```
+
+## Use Cases
+
+### 1. GPU Workload Optimization
+
+For GPU-intensive workloads, use `MostAllocated` strategy to consolidate GPU usage:
+
+```yaml
+# Global configuration
+arguments:
+ resourceStrategyFitWeight: 20
+ resources:
+ nvidia.com/gpu:
+ type: "MostAllocated"
+ weight: 5
+ cpu:
+ type: "LeastAllocated"
+ weight: 1
+```
+
+### 2. Load Balancing
+
+For general workloads, use `LeastAllocated` strategy to distribute load evenly:
+
+```yaml
+# Global configuration
+arguments:
+ resourceStrategyFitWeight: 10
+ resources:
+ cpu:
+ type: "LeastAllocated"
+ weight: 2
+ memory:
+ type: "LeastAllocated"
+ weight: 1
+```
+
+### 3. Mixed Workloads
+
+Combine different strategies for different resource types:
+
+```yaml
+# Global configuration
+arguments:
+ resourceStrategyFitWeight: 15
+ resources:
+ cpu:
+ type: "LeastAllocated"
+ weight: 3
+ memory:
+ type: "LeastAllocated"
+ weight: 2
+ nvidia.com/gpu:
+ type: "MostAllocated"
+ weight: 5
+```
+
+## Troubleshooting
+
+### Verify Plugin Configuration
+
+Check the scheduler logs to ensure the plugin is loaded correctly:
+
+```bash
+kubectl logs -n volcano-system deployment/volcano-scheduler | grep "resource-strategy-fit"
+```
+
+Expected output:
+```
+Initialize resource-strategy-fit plugin with configuration: {resourceStrategyFitWeight: 10, resources: {...}}
+```
+
+### Common Issues
+
+1. **Plugin not loaded**: Ensure the plugin is included in the scheduler configuration
+2. **Invalid annotations**: Check JSON format for pod-level weight annotations
+3. **Resource not found**: Verify resource names match exactly (case-sensitive)
+4. **Scoring not working**: Check plugin weight and resource weights are properly configured
+
+### Debug Information
+
+Enable debug logging to see scoring decisions:
+
+```yaml
+# Add to scheduler configuration
+arguments:
+ resourceStrategyFitWeight: 10
+ # ... other configuration
+ logLevel: 4 # Enable debug logging
+```
+
+## Best Practices
+
+1. **Start with global configuration** for consistent behavior across all workloads
+2. **Use pod-level annotations sparingly** for specific workload requirements
+3. **Test different strategies** to find optimal configuration for your use case
+4. **Monitor resource utilization** after applying the plugin
+5. **Use appropriate weights** to balance different resource types
+6. **Consider workload characteristics** when choosing between `LeastAllocated` and `MostAllocated`
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/resourcequota.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/resourcequota.md
new file mode 100644
index 00000000..1a9f8411
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/resourcequota.md
@@ -0,0 +1,40 @@
+---
+title: ResourceQuota
+---
+
+## Introduction
+
+In a multi-tenant cluster, administrators commonly use Kubernetes `ResourceQuota` to limit the total resource consumption per namespace. However, by default, the scheduler may continuously try to enqueue and schedule pod groups even if the namespace's `ResourceQuota` is insufficient, leading to pending jobs and scheduler overhead.
+
+The **ResourceQuota Plugin** solves this problem by ensuring that a PodGroup is only allowed to enqueue if there is sufficient resource capacity in the namespace to satisfy the minimal resource quota required by the PodGroup.
+
+## Mechanism
+
+Volcano implements the ResourceQuota plugin using the `AddJobEnqueueableFn` function.
+
+1. **Namespace Capacity Cache**: The plugin maintains an `RQStatus` map to cache all resource quotas for each namespace.
+2. **Evaluation**: It calculates the `minQuotas`—which defines the minimal resource quota required to run the pod group.
+3. **Enqueue Admission**: When evaluating pending PodGroups, the plugin allows them to enqueue *only* if the namespace's Kubernetes `ResourceQuota` has enough available capacity. It also considers PodGroups that have already been permitted in the current scheduling round to prevent race conditions that could exceed the namespace quota.
+
+## Configuration
+
+To enable the ResourceQuota Plugin, add it to your `volcano-scheduler-configmap`.
+
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+  - plugins:
+    - name: priority
+    - name: gang
+    - name: conformance
+    - name: resourcequota # Enable the ResourceQuota plugin
+  - plugins:
+    - name: overcommit
+    - name: drf
+    - name: predicates
+    - name: proportion
+    - name: nodeorder
+    - name: binpack
+```
+
+Once enabled, Volcano will respect the Kubernetes native `ResourceQuota` configuration and prevent jobs from enqueuing into the scheduling pipeline if they cannot possibly fit within their namespace quota.
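+
+As an illustration, the following is a sketch of a namespace `ResourceQuota` that the plugin would check against. The namespace `team-a`, the quota name, and the limits are hypothetical.
+
+```yaml
+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  name: team-a-quota # hypothetical name
+  namespace: team-a # hypothetical namespace
+spec:
+  hard:
+    requests.cpu: "8"
+    requests.memory: 16Gi
+```
+
+Under this quota, if workloads in the namespace already account for 6 CPU of requests, a PodGroup whose minimal resources require another 4 CPU would be expected to stay out of the queue, because 6 + 4 exceeds the hard limit of 8.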
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/task_topology.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/task_topology.md
new file mode 100644
index 00000000..58f02186
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/task_topology.md
@@ -0,0 +1,66 @@
+---
+title: Task Topology
+---
+
+
+## Environment setup
+
+### Install Volcano
+
+Refer to the [Install Guide](https://github.com/volcano-sh/volcano/blob/master/installer/README.md) to install Volcano.
+
+### Update scheduler configmap
+
+After installation, update the scheduler configuration:
+
+```shell
+kubectl edit configmap -n volcano-system volcano-scheduler-configmap
+```
+
+Register the `task-topology` plugin in the configmap:
+
+```yaml
+kind: ConfigMap
+apiVersion: v1
+metadata:
+ name: volcano-scheduler-configmap
+ namespace: volcano-system
+data:
+  volcano-scheduler.conf: |
+    actions: "enqueue, allocate, backfill"
+    tiers:
+      - plugins:
+        - name: priority
+        - name: gang
+        - name: conformance
+      - plugins:
+        - name: drf
+        - name: predicates
+        - name: task-topology
+          arguments:
+            task-topology.weight: 10
+        - name: proportion
+        - name: nodeorder
+        - name: binpack
+```
+
+### Running Jobs
+
+Take a TensorFlow job as an example.
+
+#### Install kubeflow/tf-operator
+
+Refer to the [Install Guide](https://www.kubeflow.org/docs/started/getting-started/) to install Kubeflow, which includes tf-operator.
+
+#### Edit yaml of tfjob
+
+1. Add annotations to the Volcano job or TensorFlow job in the format below.
+   1. The `affinity` annotation indicates that the listed tasks have connections to each other, so they should be placed on the same nodes;
+   2. The `anti-affinity` annotation indicates that the listed tasks have no connections to each other, so they should be placed on different nodes;
+   3. The `task-order` annotation specifies the order in which tasks are allocated. For example, `ps,worker` means the scheduler schedules `ps` tasks first and starts scheduling `worker` tasks only after all `ps` tasks have been allocated. **This annotation is optional.**
+
+ ```yaml
+ volcano.sh/task-topology-affinity: "ps,worker;ps,evaluator"
+ volcano.sh/task-topology-anti-affinity: "ps;worker,chief;chief,evaluator"
+ volcano.sh/task-topology-task-order: "ps,worker,chief,evaluator"
+ ```
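+
+As a sketch of where these annotations go, the fragment below attaches them to the metadata of a Volcano job. The job name is hypothetical, and the task names (`ps`, `worker`, `chief`, `evaluator`) are assumed to match the `name` of the tasks defined in the job spec.
+
+```yaml
+apiVersion: batch.volcano.sh/v1alpha1
+kind: Job
+metadata:
+  name: tf-topology-job # hypothetical name
+  annotations:
+    volcano.sh/task-topology-affinity: "ps,worker;ps,evaluator"
+    volcano.sh/task-topology-anti-affinity: "ps;worker,chief;chief,evaluator"
+    volcano.sh/task-topology-task-order: "ps,worker,chief,evaluator"
+spec:
+  ... # tasks named ps, worker, chief, and evaluator
+```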
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/usage.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/usage.md
new file mode 100644
index 00000000..9b7c1cf9
--- /dev/null
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/Scheduler/Plugins/usage.md
@@ -0,0 +1,103 @@
+---
+title: Usage-based Scheduling
+---
+
+## Introduction
+
+Currently, Kubernetes pod scheduling is based on resource requests and node allocatable resources rather than the actual node usage. This can lead to unbalanced resource usage across compute nodes, where pods might be scheduled to nodes with higher actual usage and lower allocation rates.
+
+The **Usage-based scheduling** plugin allows Volcano to schedule pods based on real-time node usage. This helps to balance the cluster's actual resource usage and avoid scheduling pods on overloaded nodes.
+
+## Mechanism
+
+The Usage-based scheduling plugin primarily performs three functions:
+1. **Predicate**: Filters nodes whose actual resource usage (e.g., CPU, Memory) is higher than the usage threshold defined by the user.
+2. **Prioritize (NodeOrder)**: Prioritizes nodes with lower real-time usage, ensuring that nodes with more idle resources get higher scores.
+3. **Preempt**: Pods on nodes with lower usage can preempt pods on nodes with higher usage to balance the cluster load.
+
+Volcano retrieves the node usage monitoring data from a metrics server (such as Prometheus, Prometheus Adaptor, or Elasticsearch).
+
+## Configuration
+
+To enable the Usage-based scheduling plugin, configure the `volcano-scheduler-configmap`.
+
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+  - plugins:
+    - name: priority
+    - name: gang
+    - name: conformance
+    - name: usage # usage based scheduling plugin
+      enablePredicate: false # If false, new pod scheduling is not disabled when node load reaches the threshold. If true or empty, new pod scheduling is disabled.
+      arguments:
+        usage.weight: 5
+        cpu.weight: 1
+        memory.weight: 1
+        thresholds:
+          cpu: 80 # The node cannot schedule new pods if its actual CPU load reaches 80%.
+          mem: 70 # The node cannot schedule new pods if its actual Memory load reaches 70%.
+  - plugins:
+    - name: overcommit
+    - name: drf
+    - name: predicates
+    - name: proportion
+    - name: nodeorder
+    - name: binpack
+
+metrics: # metrics server related configuration
+  type: prometheus # Optional. The metrics source type. Supports "prometheus", "prometheus_adaptor", and "elasticsearch". prometheus by default.
+  address: http://192.168.0.10:9090 # Mandatory. The metrics source address.
+  interval: 30s # Optional. The scheduler pulls metrics from the metrics source at this interval, 30s by default.
+```
+
+## Integrating with Metrics Sources
+
+### 1. Prometheus Adaptor (Custom Metrics API)
+
+**Recommended approach**. Ensure the Prometheus Adaptor is installed and custom metrics API is available. Add the following rules to Prometheus Adaptor configuration:
+
+```yaml
+rules:
+  - seriesQuery: '{__name__=~"node_cpu_seconds_total"}'
+    resources:
+      overrides:
+        instance:
+          resource: node
+    name:
+      matches: "node_cpu_seconds_total"
+      as: "node_cpu_usage_avg"
+    metricsQuery: avg_over_time((1 - avg (irate(<<.Series>>{mode="idle"}[5m])) by (instance))[10m:30s])
+  - seriesQuery: '{__name__=~"node_memory_MemTotal_bytes"}'
+    resources:
+      overrides:
+        instance:
+          resource: node
+    name:
+      matches: "node_memory_MemTotal_bytes"
+      as: "node_memory_usage_avg"
+    metricsQuery: avg_over_time(((1-node_memory_MemAvailable_bytes/<<.Series>>))[10m:30s])
+```
+
+Set `metrics.type` in the scheduler ConfigMap to `prometheus_adaptor`.
+
+### 2. Prometheus
+
+Set `metrics.type` to `prometheus` and point `metrics.address` directly at the Prometheus server, as shown in the configuration example.
+
+### 3. Elasticsearch
+
+Set `metrics.type` to `elasticsearch` and provide the following configuration:
+```yaml
+metrics:
+  type: elasticsearch
+  address: http://192.168.0.10:9200
+  interval: 30s
+  tls:
+    insecureSkipVerify: "false"
+  elasticsearch:
+    index: "custom-index-name" # Optional. "metricbeat-*" by default
+    username: ""
+    password: ""
+    hostnameFieldName: "host.hostname" # Optional. "host.hostname" by default
+```
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/UserGuide/user_guide.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/UserGuide/user_guide.md
index 20fef946..1c76558a 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/current/UserGuide/user_guide.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/current/UserGuide/user_guide.md
@@ -6,8 +6,8 @@ sidebar_position: 5
本节为 Volcano 用户指南。
* [Ascend vNPU 用户指南](./user_guide_how_to_use_vnpu.md)
-* [Capacity 插件用户指南](./user_guide_how_to_use_capacity_plugin.md)
-* [Cooldown Protection(CDP)插件用户指南](./user_guide_how_to_use_cdp_plugin.md)
+* [Capacity 插件用户指南](../Scheduler/Plugins/capacity.md)
+* [Cooldown Protection(CDP)插件用户指南](../Scheduler/Plugins/cdp.md)
* [Extender 用户指南](./user_guide_how_to_use_extender.md)
* [GPU Number 用户指南](./user_guide_how_to_use_gpu_number.md)
* [GPU Sharing 用户指南](./user_guide_how_to_use_gpu_sharing.md)
@@ -17,17 +17,17 @@ sidebar_position: 5
* [动态资源分配(DRA)用户指南](./user_guide_how_to_enable_dra.md)
* [大规模场景下调优 Volcano 性能](./user_guide_how_to_tune_volcano_performance.md)
* [如何为 Job 配置 PriorityClass](./user_guide_how_to_configure_priorityclass_for_job.md)
-* [MPI 插件用户指南](./user_guide_how_to_use_mpi_plugin.md)
+* [MPI 插件用户指南](../Controller/Plugins/mpi.md)
* [NUMA 感知调度用户指南](./user_guide_how_to_use_numa_aware.md)
* [网络拓扑感知调度用户指南](./user_guide_how_to_use_network_topology_aware_scheduling.md)
-* [NodeGroup 插件用户指南](./user_guide_how_to_use_nodegroup_plugin.md)
-* [PyTorch 插件用户指南](./user_guide_how_to_use_pytorch_plugin.md)
-* [Ray 插件用户指南](./user_guide_how_to_use_ray_plugin.md)
-* [Resource Strategy Fit 插件用户指南](./user_guide_how_to_use_resource_strategy_fit_plugin.md)
-* [Task Topology 插件用户指南](./user_guide_how_to_use_task_topology_plugin.md)
+* [NodeGroup 插件用户指南](../Scheduler/Plugins/nodegroup.md)
+* [PyTorch 插件用户指南](../Controller/Plugins/pytorch.md)
+* [Ray 插件用户指南](../Controller/Plugins/ray.md)
+* [Resource Strategy Fit 插件用户指南](../Scheduler/Plugins/resource_strategy_fit.md)
+* [Task Topology 插件用户指南](../Scheduler/Plugins/task_topology.md)
* [HyperNode 自动发现用户指南](./user_guide_how_to_use_hypernode_auto_discovery.md)
-* [Volcano Job 插件 -- Env 用户指南](./user_guide_how_to_use_env_plugin.md)
-* [Volcano Job 插件 -- SSH 用户指南](./user_guide_how_to_use_ssh_plugin.md)
-* [Volcano Job 插件 -- SVC 用户指南](./user_guide_how_to_use_svc_plugin.md)
+* [Volcano Job 插件 -- Env 用户指南](../Controller/Plugins/env.md)
+* [Volcano Job 插件 -- SSH 用户指南](../Controller/Plugins/ssh.md)
+* [Volcano Job 插件 -- SVC 用户指南](../Controller/Plugins/svc.md)
* [Volcano vGPU 用户指南](./user_guide_how_to_use_volcano_vgpu.md)
* [Scheduling Gates Queue Admission 用户指南](./user_guide_how_to_use_scheduling_gates_queue_admission.md)
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.10.0/Concepts/Queue.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.10.0/Concepts/Queue.md
index 13831c77..58afc08d 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.10.0/Concepts/Queue.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.10.0/Concepts/Queue.md
@@ -52,7 +52,7 @@ deserved表示该queue内所有podgroup的资源应得量,若该queue已分配
**注意**:
-1. 该字段只有在capacity插件开启时可按需配置,需要小于等于capability值,proportion插件使用weight来自动计算queue的deserved值。capacity插件使用文档详见:[capacity plugin User Guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)
+1. 该字段只有在capacity插件开启时可按需配置,需要小于等于capability值,proportion插件使用weight来自动计算queue的deserved值。capacity插件使用文档详见:[capacity plugin User Guide](/docs/Scheduler/Plugins/capacity)
> 2. 若queue中已分配的资源量超过了自己配置的deserved值,则queue不可再回收其他队列中的资源
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.10.0/Ecosystem/RayOnVolcano.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.10.0/Ecosystem/RayOnVolcano.md
index daf7fe82..16cb3ecf 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.10.0/Ecosystem/RayOnVolcano.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.10.0/Ecosystem/RayOnVolcano.md
@@ -200,4 +200,4 @@ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.ini
### 了解更多
- 有关KubeRay集成的详细信息,请访问[KubeRay Volcano调度器文档](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano)
-- 有关Volcano Job Ray插件的详细信息,请参阅[Volcano Ray插件指南](/docs/UserGuide/user_guide_how_to_use_ray_plugin)
\ No newline at end of file
+- 有关Volcano Job Ray插件的详细信息,请参阅[Volcano Ray插件指南](/docs/Controller/Plugins/ray)
\ No newline at end of file
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.11.0/Concepts/Queue.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.11.0/Concepts/Queue.md
index 13831c77..58afc08d 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.11.0/Concepts/Queue.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.11.0/Concepts/Queue.md
@@ -52,7 +52,7 @@ deserved表示该queue内所有podgroup的资源应得量,若该queue已分配
**注意**:
-1. 该字段只有在capacity插件开启时可按需配置,需要小于等于capability值,proportion插件使用weight来自动计算queue的deserved值。capacity插件使用文档详见:[capacity plugin User Guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)
+1. 该字段只有在capacity插件开启时可按需配置,需要小于等于capability值,proportion插件使用weight来自动计算queue的deserved值。capacity插件使用文档详见:[capacity plugin User Guide](/docs/Scheduler/Plugins/capacity)
> 2. 若queue中已分配的资源量超过了自己配置的deserved值,则queue不可再回收其他队列中的资源
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.11.0/Ecosystem/RayOnVolcano.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.11.0/Ecosystem/RayOnVolcano.md
index daf7fe82..16cb3ecf 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.11.0/Ecosystem/RayOnVolcano.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.11.0/Ecosystem/RayOnVolcano.md
@@ -200,4 +200,4 @@ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.ini
### 了解更多
- 有关KubeRay集成的详细信息,请访问[KubeRay Volcano调度器文档](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano)
-- 有关Volcano Job Ray插件的详细信息,请参阅[Volcano Ray插件指南](/docs/UserGuide/user_guide_how_to_use_ray_plugin)
\ No newline at end of file
+- 有关Volcano Job Ray插件的详细信息,请参阅[Volcano Ray插件指南](/docs/Controller/Plugins/ray)
\ No newline at end of file
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.11.0/KeyFeatures/QueueResourceManagement.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.11.0/KeyFeatures/QueueResourceManagement.md
index b16fb496..eea677e2 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.11.0/KeyFeatures/QueueResourceManagement.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.11.0/KeyFeatures/QueueResourceManagement.md
@@ -64,7 +64,7 @@ Volcano中的队列调度涉及以下核心action:
Volcano提供了两个核心的队列调度插件:
#### capacity插件
-[capacity插件](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)支持通过显式配置deserverd值来设置队列资源应得量,如以下队列配置示例:
+[capacity插件](/docs/Scheduler/Plugins/capacity)支持通过显式配置deserverd值来设置队列资源应得量,如以下队列配置示例:
```yaml
apiVersion: scheduling.volcano.sh/v1beta1
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.12.0/Concepts/Queue.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.12.0/Concepts/Queue.md
index 13831c77..58afc08d 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.12.0/Concepts/Queue.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.12.0/Concepts/Queue.md
@@ -52,7 +52,7 @@ deserved表示该queue内所有podgroup的资源应得量,若该queue已分配
**注意**:
-1. 该字段只有在capacity插件开启时可按需配置,需要小于等于capability值,proportion插件使用weight来自动计算queue的deserved值。capacity插件使用文档详见:[capacity plugin User Guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)
+1. 该字段只有在capacity插件开启时可按需配置,需要小于等于capability值,proportion插件使用weight来自动计算queue的deserved值。capacity插件使用文档详见:[capacity plugin User Guide](/docs/Scheduler/Plugins/capacity)
> 2. 若queue中已分配的资源量超过了自己配置的deserved值,则queue不可再回收其他队列中的资源
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.12.0/Ecosystem/RayOnVolcano.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.12.0/Ecosystem/RayOnVolcano.md
index daf7fe82..16cb3ecf 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.12.0/Ecosystem/RayOnVolcano.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.12.0/Ecosystem/RayOnVolcano.md
@@ -200,4 +200,4 @@ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.ini
### 了解更多
- 有关KubeRay集成的详细信息,请访问[KubeRay Volcano调度器文档](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano)
-- 有关Volcano Job Ray插件的详细信息,请参阅[Volcano Ray插件指南](/docs/UserGuide/user_guide_how_to_use_ray_plugin)
\ No newline at end of file
+- 有关Volcano Job Ray插件的详细信息,请参阅[Volcano Ray插件指南](/docs/Controller/Plugins/ray)
\ No newline at end of file
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.12.0/KeyFeatures/QueueResourceManagement.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.12.0/KeyFeatures/QueueResourceManagement.md
index b16fb496..eea677e2 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.12.0/KeyFeatures/QueueResourceManagement.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.12.0/KeyFeatures/QueueResourceManagement.md
@@ -64,7 +64,7 @@ Volcano中的队列调度涉及以下核心action:
Volcano提供了两个核心的队列调度插件:
#### capacity插件
-[capacity插件](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)支持通过显式配置deserverd值来设置队列资源应得量,如以下队列配置示例:
+[capacity插件](/docs/Scheduler/Plugins/capacity)支持通过显式配置deserverd值来设置队列资源应得量,如以下队列配置示例:
```yaml
apiVersion: scheduling.volcano.sh/v1beta1
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.7.0/Concepts/Queue.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.7.0/Concepts/Queue.md
index 13831c77..58afc08d 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.7.0/Concepts/Queue.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.7.0/Concepts/Queue.md
@@ -52,7 +52,7 @@ deserved表示该queue内所有podgroup的资源应得量,若该queue已分配
**注意**:
-1. 该字段只有在capacity插件开启时可按需配置,需要小于等于capability值,proportion插件使用weight来自动计算queue的deserved值。capacity插件使用文档详见:[capacity plugin User Guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)
+1. 该字段只有在capacity插件开启时可按需配置,需要小于等于capability值,proportion插件使用weight来自动计算queue的deserved值。capacity插件使用文档详见:[capacity plugin User Guide](/docs/Scheduler/Plugins/capacity)
> 2. 若queue中已分配的资源量超过了自己配置的deserved值,则queue不可再回收其他队列中的资源
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.7.0/Ecosystem/RayOnVolcano.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.7.0/Ecosystem/RayOnVolcano.md
index daf7fe82..16cb3ecf 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.7.0/Ecosystem/RayOnVolcano.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.7.0/Ecosystem/RayOnVolcano.md
@@ -200,4 +200,4 @@ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.ini
### 了解更多
- 有关KubeRay集成的详细信息,请访问[KubeRay Volcano调度器文档](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano)
-- 有关Volcano Job Ray插件的详细信息,请参阅[Volcano Ray插件指南](/docs/UserGuide/user_guide_how_to_use_ray_plugin)
\ No newline at end of file
+- 有关Volcano Job Ray插件的详细信息,请参阅[Volcano Ray插件指南](/docs/Controller/Plugins/ray)
\ No newline at end of file
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.8.2/Concepts/Queue.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.8.2/Concepts/Queue.md
index 13831c77..58afc08d 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.8.2/Concepts/Queue.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.8.2/Concepts/Queue.md
@@ -52,7 +52,7 @@ deserved表示该queue内所有podgroup的资源应得量,若该queue已分配
**注意**:
-1. 该字段只有在capacity插件开启时可按需配置,需要小于等于capability值,proportion插件使用weight来自动计算queue的deserved值。capacity插件使用文档详见:[capacity plugin User Guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)
+1. 该字段只有在capacity插件开启时可按需配置,需要小于等于capability值,proportion插件使用weight来自动计算queue的deserved值。capacity插件使用文档详见:[capacity plugin User Guide](/docs/Scheduler/Plugins/capacity)
> 2. 若queue中已分配的资源量超过了自己配置的deserved值,则queue不可再回收其他队列中的资源
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.8.2/Ecosystem/RayOnVolcano.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.8.2/Ecosystem/RayOnVolcano.md
index daf7fe82..16cb3ecf 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.8.2/Ecosystem/RayOnVolcano.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.8.2/Ecosystem/RayOnVolcano.md
@@ -200,4 +200,4 @@ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.ini
### 了解更多
- 有关KubeRay集成的详细信息,请访问[KubeRay Volcano调度器文档](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano)
-- 有关Volcano Job Ray插件的详细信息,请参阅[Volcano Ray插件指南](/docs/UserGuide/user_guide_how_to_use_ray_plugin)
\ No newline at end of file
+- 有关Volcano Job Ray插件的详细信息,请参阅[Volcano Ray插件指南](/docs/Controller/Plugins/ray)
\ No newline at end of file
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.9.0/Concepts/Queue.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.9.0/Concepts/Queue.md
index 13831c77..58afc08d 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.9.0/Concepts/Queue.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.9.0/Concepts/Queue.md
@@ -52,7 +52,7 @@ deserved表示该queue内所有podgroup的资源应得量,若该queue已分配
**注意**:
-1. 该字段只有在capacity插件开启时可按需配置,需要小于等于capability值,proportion插件使用weight来自动计算queue的deserved值。capacity插件使用文档详见:[capacity plugin User Guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)
+1. 该字段只有在capacity插件开启时可按需配置,需要小于等于capability值,proportion插件使用weight来自动计算queue的deserved值。capacity插件使用文档详见:[capacity plugin User Guide](/docs/Scheduler/Plugins/capacity)
> 2. 若queue中已分配的资源量超过了自己配置的deserved值,则queue不可再回收其他队列中的资源
diff --git a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.9.0/Ecosystem/RayOnVolcano.md b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.9.0/Ecosystem/RayOnVolcano.md
index daf7fe82..16cb3ecf 100644
--- a/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.9.0/Ecosystem/RayOnVolcano.md
+++ b/i18n/zh-Hans/docusaurus-plugin-content-docs/version-v1.9.0/Ecosystem/RayOnVolcano.md
@@ -200,4 +200,4 @@ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.ini
### 了解更多
- 有关KubeRay集成的详细信息,请访问[KubeRay Volcano调度器文档](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano)
-- 有关Volcano Job Ray插件的详细信息,请参阅[Volcano Ray插件指南](/docs/UserGuide/user_guide_how_to_use_ray_plugin)
\ No newline at end of file
+- 有关Volcano Job Ray插件的详细信息,请参阅[Volcano Ray插件指南](/docs/Controller/Plugins/ray)
\ No newline at end of file
diff --git a/static/img/doc/descheduler-CN.svg b/static/img/doc/descheduler-CN.svg
index f433cfa2..846e1bc1 100644
--- a/static/img/doc/descheduler-CN.svg
+++ b/static/img/doc/descheduler-CN.svg
@@ -1,4 +1 @@
-
-
-
-
\ No newline at end of file
+
\ No newline at end of file
diff --git a/static/img/doc/descheduler_EN.svg b/static/img/doc/descheduler_EN.svg
index bcc4d388..044474e0 100644
--- a/static/img/doc/descheduler_EN.svg
+++ b/static/img/doc/descheduler_EN.svg
@@ -1,4 +1 @@
-
-
-
-
\ No newline at end of file
+
\ No newline at end of file
diff --git a/static/img/doc/volcano_global_design.svg b/static/img/doc/volcano_global_design.svg
index 83bd2baf..45925161 100644
--- a/static/img/doc/volcano_global_design.svg
+++ b/static/img/doc/volcano_global_design.svg
@@ -1,4 +1 @@
-
-
-
-
\ No newline at end of file
+[SVG markup not shown: Volcano global design diagram. A User submits Deployments, Volcano Jobs, Pods, and a PropagationPolicy to Karmada (APIServer, Scheduler); the Volcano global Webhook suspends each ResourceBinding, the Controller sorts ResourceBindings by Queue priority, applies Queue capacity management, and resumes scheduling on admission; workloads are then scheduled to Cluster A, Cluster B, and other clusters.]
\ No newline at end of file
diff --git a/versioned_docs/version-v1.10.0/Concepts/Queue.md b/versioned_docs/version-v1.10.0/Concepts/Queue.md
index 72a62e7e..7aad21f8 100644
--- a/versioned_docs/version-v1.10.0/Concepts/Queue.md
+++ b/versioned_docs/version-v1.10.0/Concepts/Queue.md
@@ -51,7 +51,7 @@ deserved indicates the expected resource amount for all PodGroups in this queue.
> **Note**:
>
-> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)
+> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/Scheduler/Plugins/capacity)
> 2. If the allocated resources of a queue exceed its configured deserved value, the queue cannot reclaim resources from other queues
* weight, *optional*
diff --git a/versioned_docs/version-v1.10.0/Ecosystem/RayOnVolcano.md b/versioned_docs/version-v1.10.0/Ecosystem/RayOnVolcano.md
index 034c86c2..be1900a4 100644
--- a/versioned_docs/version-v1.10.0/Ecosystem/RayOnVolcano.md
+++ b/versioned_docs/version-v1.10.0/Ecosystem/RayOnVolcano.md
@@ -203,4 +203,4 @@ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.ini
### Learn More
- For KubeRay integration details, visit the [KubeRay Volcano Scheduler Documentation](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano)
-- For Volcano Job Ray plugin details, see the [Volcano Ray Plugin Guide](/docs/UserGuide/user_guide_how_to_use_ray_plugin)
\ No newline at end of file
+- For Volcano Job Ray plugin details, see the [Volcano Ray Plugin Guide](/docs/Controller/Plugins/ray)
\ No newline at end of file
diff --git a/versioned_docs/version-v1.10.0/Scheduler/Plugins.md b/versioned_docs/version-v1.10.0/Scheduler/Plugins.md
index f8e84469..461da705 100644
--- a/versioned_docs/version-v1.10.0/Scheduler/Plugins.md
+++ b/versioned_docs/version-v1.10.0/Scheduler/Plugins.md
@@ -143,3 +143,51 @@ The Numa-Aware Plugin aims to address these limitations.
#### Scenario
Common scenarios for NUMA-Aware are computation-intensive jobs that are sensitive to CPU parameters, scheduling delays. Such as scientific calculation, video decoding, animation rendering, big data offline processing and other specific scenes.
+
+### Usage
+
+#### Overview
+The Usage-based scheduling plugin makes scheduling decisions from real-time resource utilization (e.g., CPU and memory) collected from a monitoring system such as Prometheus, rather than from requested resources alone. It keeps new pods off overloaded nodes and actively balances the cluster workload.
+
+#### Scenario
+Useful in clusters with unbalanced node resource consumption, where some nodes are overloaded while others sit idle even though their requested resources are similar.
+
+### Rescheduling
+
+#### Overview
+The Rescheduling plugin periodically rebalances the cluster by evaluating real resource utilization. It actively evicts pods from heavily utilized nodes and shuffles them to under-utilized nodes based on configured target thresholds and strategies like LowNodeUtilization or OfflineOnly.
+
+#### Scenario
+Perfect for long-running clusters where dynamic workload lifecycles lead to fragmentation and resource imbalances over time.
+
+### ResourceQuota
+
+#### Overview
+The ResourceQuota plugin interfaces with Kubernetes' native `ResourceQuota` objects to ensure that a PodGroup is only enqueued if there is sufficient resource capacity in its namespace.
+
+#### Scenario
+Highly beneficial in multi-tenant environments to prevent jobs from entering the scheduling pipeline and clogging the queue when they have no chance of running due to namespace quota restrictions.
+
+### Pod Disruption Budget (PDB)
+
+#### Overview
+The PDB Plugin ensures that Volcano respects user-defined Kubernetes PodDisruptionBudget (PDB) constraints during any eviction-based scheduling actions, such as `reclaim`, `preempt`, and `shuffle`.
+
+#### Scenario
+Crucial for highly available workloads where simultaneous eviction of multiple replicas could result in service disruption.
+
+### Overcommit
+
+#### Overview
+The Overcommit Plugin lets the scheduler inflate the cluster's apparent idle resources by a configurable factor (e.g., 1.2), so that more jobs can be enqueued in the scheduling pipeline than the physical capacity alone would admit.
+
+#### Scenario
+Useful when administrators want the scheduler to tolerate a larger backlog of `pending` pods waiting for resources without rejecting them outright during peak loads.
+
+### DeviceShare
+
+#### Overview
+The DeviceShare Plugin provides a unified framework for sharing specialized hardware devices such as GPUs, NPUs, and FPGAs across multiple pods.
+
+#### Scenario
+Ideal for advanced AI/ML environments that need fine-grained hardware sharing, such as vGPU, vNPU, and GPU-exclusive deployments.
diff --git a/versioned_docs/version-v1.11.0/Concepts/Queue.md b/versioned_docs/version-v1.11.0/Concepts/Queue.md
index 72a62e7e..7aad21f8 100644
--- a/versioned_docs/version-v1.11.0/Concepts/Queue.md
+++ b/versioned_docs/version-v1.11.0/Concepts/Queue.md
@@ -51,7 +51,7 @@ deserved indicates the expected resource amount for all PodGroups in this queue.
> **Note**:
>
-> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)
+> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/Scheduler/Plugins/capacity)
> 2. If the allocated resources of a queue exceed its configured deserved value, the queue cannot reclaim resources from other queues
* weight, *optional*
diff --git a/versioned_docs/version-v1.11.0/Ecosystem/RayOnVolcano.md b/versioned_docs/version-v1.11.0/Ecosystem/RayOnVolcano.md
index 034c86c2..be1900a4 100644
--- a/versioned_docs/version-v1.11.0/Ecosystem/RayOnVolcano.md
+++ b/versioned_docs/version-v1.11.0/Ecosystem/RayOnVolcano.md
@@ -203,4 +203,4 @@ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.ini
### Learn More
- For KubeRay integration details, visit the [KubeRay Volcano Scheduler Documentation](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano)
-- For Volcano Job Ray plugin details, see the [Volcano Ray Plugin Guide](/docs/UserGuide/user_guide_how_to_use_ray_plugin)
\ No newline at end of file
+- For Volcano Job Ray plugin details, see the [Volcano Ray Plugin Guide](/docs/Controller/Plugins/ray)
\ No newline at end of file
diff --git a/versioned_docs/version-v1.11.0/KeyFeatures/QueueResourceManagement.md b/versioned_docs/version-v1.11.0/KeyFeatures/QueueResourceManagement.md
index a0918d2b..ba21ba1e 100644
--- a/versioned_docs/version-v1.11.0/KeyFeatures/QueueResourceManagement.md
+++ b/versioned_docs/version-v1.11.0/KeyFeatures/QueueResourceManagement.md
@@ -50,7 +50,7 @@ deserved indicates the expected resource amount for all PodGroups in this queue.
> **Note**:
>
-> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)
+> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/Scheduler/Plugins/capacity)
> 2. If the allocated resources of a queue exceed its configured deserved value, the queue cannot reclaim resources from other queues
* weight, *optional*
diff --git a/versioned_docs/version-v1.11.0/Scheduler/Plugins.md b/versioned_docs/version-v1.11.0/Scheduler/Plugins.md
index f8e84469..71bed0f2 100644
--- a/versioned_docs/version-v1.11.0/Scheduler/Plugins.md
+++ b/versioned_docs/version-v1.11.0/Scheduler/Plugins.md
@@ -143,3 +143,50 @@ The Numa-Aware Plugin aims to address these limitations.
#### Scenario
Common scenarios for NUMA-Aware are computation-intensive jobs that are sensitive to CPU parameters, scheduling delays. Such as scientific calculation, video decoding, animation rendering, big data offline processing and other specific scenes.
+### Usage
+
+#### Overview
+The Usage-based scheduling plugin makes scheduling decisions from real-time resource utilization (e.g., CPU and memory) collected from a monitoring system such as Prometheus, rather than from requested resources alone. It keeps new pods off overloaded nodes and actively balances the cluster workload.
+
+#### Scenario
+Useful in clusters with unbalanced node resource consumption, where some nodes are overloaded while others sit idle even though their requested resources are similar.
+
+### Rescheduling
+
+#### Overview
+The Rescheduling plugin periodically rebalances the cluster by evaluating real resource utilization. It actively evicts pods from heavily utilized nodes and shuffles them to under-utilized nodes based on configured target thresholds and strategies like LowNodeUtilization or OfflineOnly.
+
+#### Scenario
+Perfect for long-running clusters where dynamic workload lifecycles lead to fragmentation and resource imbalances over time.
+
+### ResourceQuota
+
+#### Overview
+The ResourceQuota plugin interfaces with Kubernetes' native `ResourceQuota` objects to ensure that a PodGroup is only enqueued if there is sufficient resource capacity in its namespace.
+
+#### Scenario
+Highly beneficial in multi-tenant environments to prevent jobs from entering the scheduling pipeline and clogging the queue when they have no chance of running due to namespace quota restrictions.
+
+### Pod Disruption Budget (PDB)
+
+#### Overview
+The PDB Plugin ensures that Volcano respects user-defined Kubernetes PodDisruptionBudget (PDB) constraints during any eviction-based scheduling actions, such as `reclaim`, `preempt`, and `shuffle`.
+
+#### Scenario
+Crucial for highly available workloads where simultaneous eviction of multiple replicas could result in service disruption.
+
+### Overcommit
+
+#### Overview
+The Overcommit Plugin lets the scheduler inflate the cluster's apparent idle resources by a configurable factor (e.g., 1.2), so that more jobs can be enqueued in the scheduling pipeline than the physical capacity alone would admit.
+
+#### Scenario
+Useful when administrators want the scheduler to tolerate a larger backlog of `pending` pods waiting for resources without rejecting them outright during peak loads.
+
+### DeviceShare
+
+#### Overview
+The DeviceShare Plugin provides a unified framework for sharing specialized hardware devices such as GPUs, NPUs, and FPGAs across multiple pods.
+
+#### Scenario
+Ideal for advanced AI/ML environments that need fine-grained hardware sharing, such as vGPU, vNPU, and GPU-exclusive deployments.
diff --git a/versioned_docs/version-v1.12.0/Concepts/Queue.md b/versioned_docs/version-v1.12.0/Concepts/Queue.md
index 72a62e7e..7aad21f8 100644
--- a/versioned_docs/version-v1.12.0/Concepts/Queue.md
+++ b/versioned_docs/version-v1.12.0/Concepts/Queue.md
@@ -51,7 +51,7 @@ deserved indicates the expected resource amount for all PodGroups in this queue.
> **Note**:
>
-> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)
+> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/Scheduler/Plugins/capacity)
> 2. If the allocated resources of a queue exceed its configured deserved value, the queue cannot reclaim resources from other queues
* weight, *optional*
diff --git a/versioned_docs/version-v1.12.0/Ecosystem/RayOnVolcano.md b/versioned_docs/version-v1.12.0/Ecosystem/RayOnVolcano.md
index 034c86c2..be1900a4 100644
--- a/versioned_docs/version-v1.12.0/Ecosystem/RayOnVolcano.md
+++ b/versioned_docs/version-v1.12.0/Ecosystem/RayOnVolcano.md
@@ -203,4 +203,4 @@ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.ini
### Learn More
- For KubeRay integration details, visit the [KubeRay Volcano Scheduler Documentation](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano)
-- For Volcano Job Ray plugin details, see the [Volcano Ray Plugin Guide](/docs/UserGuide/user_guide_how_to_use_ray_plugin)
\ No newline at end of file
+- For Volcano Job Ray plugin details, see the [Volcano Ray Plugin Guide](/docs/Controller/Plugins/ray)
\ No newline at end of file
diff --git a/versioned_docs/version-v1.12.0/KeyFeatures/QueueResourceManagement.md b/versioned_docs/version-v1.12.0/KeyFeatures/QueueResourceManagement.md
index a0918d2b..ba21ba1e 100644
--- a/versioned_docs/version-v1.12.0/KeyFeatures/QueueResourceManagement.md
+++ b/versioned_docs/version-v1.12.0/KeyFeatures/QueueResourceManagement.md
@@ -50,7 +50,7 @@ deserved indicates the expected resource amount for all PodGroups in this queue.
> **Note**:
>
-> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)
+> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/Scheduler/Plugins/capacity)
> 2. If the allocated resources of a queue exceed its configured deserved value, the queue cannot reclaim resources from other queues
* weight, *optional*
diff --git a/versioned_docs/version-v1.7.0/Concepts/Queue.md b/versioned_docs/version-v1.7.0/Concepts/Queue.md
index 72a62e7e..7aad21f8 100644
--- a/versioned_docs/version-v1.7.0/Concepts/Queue.md
+++ b/versioned_docs/version-v1.7.0/Concepts/Queue.md
@@ -51,7 +51,7 @@ deserved indicates the expected resource amount for all PodGroups in this queue.
> **Note**:
>
-> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)
+> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/Scheduler/Plugins/capacity)
> 2. If the allocated resources of a queue exceed its configured deserved value, the queue cannot reclaim resources from other queues
* weight, *optional*
diff --git a/versioned_docs/version-v1.7.0/Ecosystem/RayOnVolcano.md b/versioned_docs/version-v1.7.0/Ecosystem/RayOnVolcano.md
index 034c86c2..be1900a4 100644
--- a/versioned_docs/version-v1.7.0/Ecosystem/RayOnVolcano.md
+++ b/versioned_docs/version-v1.7.0/Ecosystem/RayOnVolcano.md
@@ -203,4 +203,4 @@ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.ini
### Learn More
- For KubeRay integration details, visit the [KubeRay Volcano Scheduler Documentation](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano)
-- For Volcano Job Ray plugin details, see the [Volcano Ray Plugin Guide](/docs/UserGuide/user_guide_how_to_use_ray_plugin)
\ No newline at end of file
+- For Volcano Job Ray plugin details, see the [Volcano Ray Plugin Guide](/docs/Controller/Plugins/ray)
\ No newline at end of file
diff --git a/versioned_docs/version-v1.8.2/Concepts/Queue.md b/versioned_docs/version-v1.8.2/Concepts/Queue.md
index 72a62e7e..7aad21f8 100644
--- a/versioned_docs/version-v1.8.2/Concepts/Queue.md
+++ b/versioned_docs/version-v1.8.2/Concepts/Queue.md
@@ -51,7 +51,7 @@ deserved indicates the expected resource amount for all PodGroups in this queue.
> **Note**:
>
-> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)
+> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/Scheduler/Plugins/capacity)
> 2. If the allocated resources of a queue exceed its configured deserved value, the queue cannot reclaim resources from other queues
* weight, *optional*
diff --git a/versioned_docs/version-v1.8.2/Ecosystem/RayOnVolcano.md b/versioned_docs/version-v1.8.2/Ecosystem/RayOnVolcano.md
index 034c86c2..be1900a4 100644
--- a/versioned_docs/version-v1.8.2/Ecosystem/RayOnVolcano.md
+++ b/versioned_docs/version-v1.8.2/Ecosystem/RayOnVolcano.md
@@ -203,4 +203,4 @@ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.ini
### Learn More
- For KubeRay integration details, visit the [KubeRay Volcano Scheduler Documentation](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano)
-- For Volcano Job Ray plugin details, see the [Volcano Ray Plugin Guide](/docs/UserGuide/user_guide_how_to_use_ray_plugin)
\ No newline at end of file
+- For Volcano Job Ray plugin details, see the [Volcano Ray Plugin Guide](/docs/Controller/Plugins/ray)
\ No newline at end of file
diff --git a/versioned_docs/version-v1.9.0/Concepts/Queue.md b/versioned_docs/version-v1.9.0/Concepts/Queue.md
index 72a62e7e..7aad21f8 100644
--- a/versioned_docs/version-v1.9.0/Concepts/Queue.md
+++ b/versioned_docs/version-v1.9.0/Concepts/Queue.md
@@ -51,7 +51,7 @@ deserved indicates the expected resource amount for all PodGroups in this queue.
> **Note**:
>
-> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/UserGuide/user_guide_how_to_use_capacity_plugin)
+> 1. This field can only be configured when the capacity plugin is enabled, and must be less than or equal to the capability value. The proportion plugin uses weight to automatically calculate the queue's deserved value. For more information on using the capacity plugin, see: [capacity plugin user guide](/docs/Scheduler/Plugins/capacity)
> 2. If the allocated resources of a queue exceed its configured deserved value, the queue cannot reclaim resources from other queues
* weight, *optional*
diff --git a/versioned_docs/version-v1.9.0/Ecosystem/RayOnVolcano.md b/versioned_docs/version-v1.9.0/Ecosystem/RayOnVolcano.md
index 034c86c2..be1900a4 100644
--- a/versioned_docs/version-v1.9.0/Ecosystem/RayOnVolcano.md
+++ b/versioned_docs/version-v1.9.0/Ecosystem/RayOnVolcano.md
@@ -203,4 +203,4 @@ ray job submit --address http://localhost:8265 -- python -c "import ray; ray.ini
### Learn More
- For KubeRay integration details, visit the [KubeRay Volcano Scheduler Documentation](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano)
-- For Volcano Job Ray plugin details, see the [Volcano Ray Plugin Guide](/docs/UserGuide/user_guide_how_to_use_ray_plugin)
\ No newline at end of file
+- For Volcano Job Ray plugin details, see the [Volcano Ray Plugin Guide](/docs/Controller/Plugins/ray)
\ No newline at end of file