Add unified PTO runtime package (L2/L3) by hw-native-sys-bot · Pull Request #355 · hw-native-sys/simpler

hw-native-sys-bot · 2026-03-24T12:14:48Z

Summary

Adds a python/pto/ package with a unified Runtime API that routes by level ("chip" → L2, "host" → L3)
L2Runtime wraps the existing bindings.py ctypes interface behind register()/run()/close()
L3Runtime manages per-chip worker processes (one process per device, because DeviceRunner is a singleton), a handle-based task DAG with eager dispatch, and Python orchestration functions
Includes pto.compile() with SHA256-based caching for compiled artifacts
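To make the caching idea concrete, here is a minimal sketch of SHA256-keyed artifact caching. It assumes the key is derived from the platform string plus the contents of the orchestration and kernel sources. The helpers `compile_cache_key` and `cached_compile`, the `.bin` artifact naming, and the `build` callback are hypothetical and are not taken from compiler.py.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def compile_cache_key(platform: str, orch: str, kernels: list[str]) -> str:
    """Hypothetical helper: derive a cache key from the compile inputs."""
    h = hashlib.sha256()
    # Platform and the ordered source paths go in as structured JSON, so
    # swapping which file is the orchestrator also changes the key.
    h.update(json.dumps({"platform": platform, "orch": orch, "kernels": kernels}).encode())
    # Fold in a fixed-length digest per file so file boundaries stay unambiguous.
    for path in [orch, *kernels]:
        h.update(hashlib.sha256(Path(path).read_bytes()).digest())
    return h.hexdigest()


def cached_compile(platform: str, orch: str, kernels: list[str],
                   cache_dir: Path,
                   build: Callable[[str, str, list[str]], bytes]) -> bytes:
    """Return cached artifact bytes, calling `build` only on a cache miss."""
    key = compile_cache_key(platform, orch, kernels)
    artifact = cache_dir / f"{key}.bin"
    if artifact.exists():
        return artifact.read_bytes()
    data = build(platform, orch, kernels)  # hypothetical stand-in for the real compile
    cache_dir.mkdir(parents=True, exist_ok=True)
    artifact.write_bytes(data)
    return data
```

Editing any source file or switching platform produces a new key and therefore a rebuild; a real implementation would likely also fold compiler flags and toolchain versions into the key.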

File layout

python/pto/
  __init__.py       — Package exports: Runtime, Arg, compile
  types.py          — Arg, TensorHandle, CompiledPackage, ParamType
  compiler.py       — pto.compile() + caching
  runtime.py        — Unified Runtime entry (routes by level)
  l2_runtime.py     — Single-chip runtime (wraps bindings.py)
  l3_runtime.py     — Multi-chip runtime (workers + DAG + orch)
  l3_context.py     — L3OrchestratorContext (ctx for Python orch)
  l3_worker.py      — ChipWorker subprocess management
  dag.py            — TaskDAG with handle-based dependency inference
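The handle-based dependency inference in dag.py, together with the eager dispatch mentioned in the summary, might look roughly like the following sketch. It assumes each task names the tensor handles it reads and writes (plain strings here). `Task`, `TaskDAG.submit`, `TaskDAG.complete`, and the `dispatch` callback are illustrative names, not the package's actual API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Task:
    tid: int
    name: str
    deps: set[int] = field(default_factory=set)        # unfinished predecessors
    dependents: set[int] = field(default_factory=set)  # successors to notify
    done: bool = False


class TaskDAG:
    """Hypothetical sketch: infer edges from the handles each task reads/writes."""

    def __init__(self, dispatch: Callable[[Task], None]):
        self.tasks: dict[int, Task] = {}
        self.last_writer: dict[str, int] = {}
        self.readers: dict[str, set[int]] = defaultdict(set)
        self.dispatch = dispatch  # invoked the moment a task has no pending deps

    def submit(self, name: str, inputs: Iterable[str] = (),
               outputs: Iterable[str] = ()) -> int:
        inputs, outputs = list(inputs), list(outputs)
        tid = len(self.tasks)
        task = Task(tid, name)
        for h in inputs:  # read-after-write: wait for the latest producer
            if h in self.last_writer:
                task.deps.add(self.last_writer[h])
        for h in outputs:  # write-after-write and write-after-read hazards
            if h in self.last_writer:
                task.deps.add(self.last_writer[h])
            task.deps |= self.readers[h]
        # Only predecessors that have not finished still block this task.
        task.deps = {d for d in task.deps if not self.tasks[d].done}
        for d in task.deps:
            self.tasks[d].dependents.add(tid)
        # Record this task's accesses for later submissions.
        for h in inputs:
            self.readers[h].add(tid)
        for h in outputs:
            self.last_writer[h] = tid
            self.readers[h] = set()
        self.tasks[tid] = task
        if not task.deps:  # eager dispatch: no global barrier
            self.dispatch(task)
        return tid

    def complete(self, tid: int) -> None:
        """Mark a task finished and dispatch any successor that became ready."""
        task = self.tasks[tid]
        task.done = True
        for s in sorted(task.dependents):
            succ = self.tasks[s]
            succ.deps.discard(tid)
            if not succ.deps:
                self.dispatch(succ)
```

Submitting `load0` and `load1` (each writing one handle) dispatches both immediately; an `add` task that reads both handles is dispatched only once `complete()` has been called for each. A later task that overwrites `x0` waits on `add`, because `add` still reads it.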

Usage

import pto

# L2 — single chip
rt = pto.Runtime(level="chip", platform="a2a3", device=0)
rt.register("vector_add", orch="orch.cpp", kernels=[...])
rt.run("vector_add", args=[pto.Arg.input(x), pto.Arg.output(y)])
rt.close()

# L3 — multi chip
rt = pto.Runtime(level="host", platform="a2a3", devices=[0, 1, 2, 3])
pkg = pto.compile(platform="a2a3", orch="orch.cpp", kernels=[...])
rt.register("pipeline", orch=my_orch_func, kernels={"compute": pkg})
rt.run("pipeline", args={"input": data})
rt.close()
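The level-based routing behind pto.Runtime can be pictured as a small lookup table. In this sketch, `make_runtime` and the stub `L2Runtime`/`L3Runtime` classes are hypothetical stand-ins; the real Runtime in runtime.py may wrap the backend object rather than return it directly.

```python
from typing import Any


class L2Runtime:
    """Hypothetical stub for the single-chip (L2) runtime."""

    def __init__(self, platform: str, device: int = 0):
        self.platform = platform
        self.device = device


class L3Runtime:
    """Hypothetical stub for the multi-chip (L3) runtime."""

    def __init__(self, platform: str, devices: list[int]):
        self.platform = platform
        self.devices = list(devices)


# Map each public level name to the backend class that implements it.
_BACKENDS = {"chip": L2Runtime, "host": L3Runtime}


def make_runtime(level: str, **kwargs: Any):
    """Select the backend for `level` and forward the remaining keyword arguments."""
    try:
        backend = _BACKENDS[level]
    except KeyError:
        raise ValueError(
            f"unknown level {level!r}; expected one of {sorted(_BACKENDS)}"
        ) from None
    return backend(**kwargs)
```

Keeping the mapping in one table means a new level only needs a new entry, and an unknown level fails early with a clear error instead of deep inside a backend.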

Test plan

Verify L2 path: pto.Runtime(level="chip") runs existing single-chip examples
Verify L3 path: multi-chip DAG dispatch with sim platform
Unit tests for TaskDAG dependency inference
Unit tests for L3OrchestratorContext

🤖 Generated with Claude Code

Integrate AscendC operators into PTO runtime via single-TU compile + link:
- Add AscendCToolchain to toolchain.py with --cce-aicore-lang dialect, SDK include paths from bisheng_intf.cmake, and auto-sync flags
- Add ascendc_compiler.py with a single-TU approach: merge the kernel_entry wrapper and user source into one translation unit, compile with AscendC flags, and link with ld.lld to resolve block-local relocations
- Dispatch compiler='ascendc' kernels in code_runner.py
- Add ascendc_vector_example device test (z=x+y via AscendC, w=z*z via PTO) under tests/device_tests/ to avoid sim CI discovery
- Add unit tests for wrapper generation, merged source generation, artifact extraction, and toolchain flags

- Remove source compilation path from ascendc_compiler.py (no more _compile_kernel_source, use_ascendc_compiler, ascendc_kernel_source)
- compile_ascendc_kernel() now requires pre-compiled .o bytes directly
- Update code_runner.py to reject non-.o sources for ascendc kernels
- kernel_config.py now references add_custom.o (externally compiled)
- add_custom.cpp is reference-only documentation (wrapped in #if 0)
- Update tests to match simplified API
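A minimal sketch of the new guard in code_runner.py, assuming the kernel source is given as a file path whose bytes are then handed to compile_ascendc_kernel(). `load_ascendc_object` is a hypothetical name, and the ELF magic-number check is an extra assumption (that the externally compiled objects are ELF relocatable files) beyond the commit's stated `.o` requirement.

```python
from pathlib import Path

# Assumption: externally compiled AscendC objects are ELF relocatable files.
ELF_MAGIC = b"\x7fELF"


def load_ascendc_object(source: str) -> bytes:
    """Return the raw bytes of a pre-compiled AscendC kernel object.

    Rejects anything that is not a .o file, mirroring the rule that
    ascendc kernels must reference an externally compiled object.
    """
    path = Path(source)
    if path.suffix != ".o":
        raise ValueError(
            f"ascendc kernel must be a pre-compiled .o file, got {path.name!r}"
        )
    data = path.read_bytes()
    if not data.startswith(ELF_MAGIC):
        raise ValueError(f"{path.name!r} does not look like an ELF object")
    return data
```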

- Add pre-compiled add_custom.o (AscendC AddCustom, float32, dav-c220-vec) with two critical adaptations for PTO dispatch:
  1. No __global__ attribute (causes hang under PTO subroutine dispatch)
  2. No GetBlockNum()/GetBlockIdx() (PTO dispatches to single cores)
- Add build_add_custom.sh to reproduce the .o from source
- Fix __gm__ address space qualifier in wrapper forward declaration: workspace and tiling params use plain uint8_t* (not __gm__) since PTO's ccec treats __gm__ as a real address-space qualifier
- Update reference source and kernel_config docs
- Device test passes: z (add) and w (mul) match golden on hardware

Implements the python/pto/ package with a unified Runtime API that routes by level ("chip" for L2, "host" for L3). L3 manages per-chip worker processes, a handle-based task DAG with eager dispatch, and Python orchestration functions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist · 2026-03-24T12:18:09Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a unified Python Tensor Orchestration (PTO) runtime package, enabling both single-chip (L2) and multi-chip (L3) execution on Ascend NPU devices. It provides a high-level API for registering and running computations, manages task dependencies, and incorporates caching mechanisms for efficient compilation. The changes also include support for integrating pre-compiled AscendC kernels, expanding the range of supported operations.

Highlights

Unified PTO Runtime: Introduces a unified python/pto/ package providing a consistent Runtime API for both L2 (single-chip) and L3 (multi-chip) execution on Ascend NPU devices.
L2 and L3 Runtime Implementations: Includes L2Runtime (wraps existing bindings.py) and L3Runtime (manages per-chip worker processes and a handle-based task DAG).
AscendC Kernel Integration: Adds support for integrating pre-compiled AscendC kernels into the PTO runtime, including an AscendCToolchain in toolchain.py, an ascendc_compiler.py module, and dispatch of compiler='ascendc' kernels in code_runner.py.
Compilation and Caching: Implements pto.compile() with SHA256-based caching for compiled artifacts, improving build times.


gemini-code-assist · 2026-03-24T12:22:24Z

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

ChaoWao and others added 4 commits March 20, 2026 21:35

ChaoWao marked this pull request as draft March 25, 2026 01:51

hw-native-sys-bot wants to merge 4 commits into hw-native-sys:main from hw-native-sys-bot:feat/unified-runtime-l3
