[Intel NPU] Add Windows & Linux Intel NPU support #1171
Looong01 wants to merge 29 commits into lightvector:master from
Conversation
|
This is a screenshot of Sabaki testing. And the binary release is here: https://github.com/Looong01/KataGo-Multi-backends/releases/tag/v1.16.4-openvino
|
I partially referenced the code from #1164, and I am very grateful to @ChinChangYang.
|
This is wonderful work! I will test this backend in a few days.
|
Actually, this PR also includes an AMD GPU ROCm backend, and all the tests and benchmarks passed. I also tried using OpenVINO to implement the Intel NPU backend directly, but its performance was worse than the ONNX Runtime + OpenVINO solution, and it was not stable. So I only pushed the ONNX Runtime + OpenVINO code.
|
I will implement an AMD NPU backend in the coming days.
|
Thank you! I am concerned about the poor performance of multi-threading. As shown in the figure, when the number of threads increases, the computation speed actually decreases. Is this because the NPU itself is not suitable for multi-threading, or is it still possible to optimize multi-threading at this stage?
|
This is amazing on my Linux notebook. I am seeing a 3.5x speedup (87.30 vs 25.16 visits/s) compared to OpenCL, which seems unusually slow on my system. I really appreciate this. To build ONNX Runtime, I had to downgrade from gcc-15 to gcc-14. Also, the source directories seem different from the document, so I used the following commands in zsh.
|
Claude detected an issue in a Docker container.
Bug:
Error message:
Root cause:
Reproduction steps: # 1. Clone and checkout this PR branch
git clone https://github.com/lightvector/KataGo.git
cd KataGo
git fetch origin pull/1171/head:pr-1171
git checkout pr-1171
# 2. Download ORT prebuilt + build onnx_proto and protobuf-lite from source
# (ONNXRUNTIME_ROOT = prebuilt ORT package dir)
# (ONNX_INCLUDE_DIR = ort-build/_deps/onnx-build)
# (ONNX_PROTO_LIB = ort-build/_deps/onnx-build/libonnx_proto.a)
# (PROTOBUF_INCLUDE_DIR = ort-build/_deps/protobuf-src/src)
# (PROTOBUF_LIB = ort-build/_deps/protobuf-build/libprotobuf-lite.a)
# 3. Configure
mkdir build && cd build
cmake ../cpp \
-DUSE_BACKEND=ONNX \
-DKATAGO_AUTO_FETCH_DEPS=OFF \
-DONNXRUNTIME_ROOT=<ort-prebuilt-dir> \
-DONNX_INCLUDE_DIR=<ort-build>/_deps/onnx-build \
-DONNX_PROTO_LIB=<ort-build>/_deps/onnx-build/libonnx_proto.a \
-DPROTOBUF_INCLUDE_DIR=<ort-build>/_deps/protobuf-src/src \
-DPROTOBUF_LIB=<ort-build>/_deps/protobuf-build/libprotobuf-lite.a \
-DCMAKE_CXX_FLAGS="-DONNX_ML"
# 4. Build → fails at onnxmodelbuilder.cpp
cmake --build . -j$(nproc)
System environment:
Fix: In onnxmodelbuilder.cpp:
-#include <onnx/onnx-ml.pb.h>
+#include <onnx/onnx_pb.h>
|
Actually, my CMakeLists.txt already handles this; I use vcpkg to manage these dependencies. Do you still think I need to make this change?
Because the NPU is a different architecture (totally different from a GPU or CPU), single threading is enough for it.
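In config terms, this advice amounts to a minimal sketch like the following; `numSearchThreads` is a standard KataGo config key, and the value of 1 simply reflects the single-threading point above rather than anything mandated by this PR:

```ini
# NPUs schedule work very differently from GPUs; per the discussion above,
# adding search threads can reduce throughput, so keep this at 1.
numSearchThreads = 1
```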
I think you misunderstood my comment. The reproduction steps fetch #1171, exactly this PR, not mine.
But I don't hit any errors when I compile it. Maybe it only happens with GCC-15?
11433e6 resolves the issue. Thanks.





Summary
This PR adds and hardens the Windows & Linux Intel NPU path for KataGo using the ONNX backend with ONNX Runtime + OpenVINO Execution Provider, and updates docs/config guidance for an end-to-end workflow.
It also improves failure behavior for non-ONNX builds and simplifies Windows & Linux dependency handling.
What Changed
1) ONNX backend and OpenVINO provider support
- New config keys: `onnxProvider` (cpu, openvino, cuda, tensorrt, migraphx, coreml), `onnxOpenVINODeviceType`, `onnxOpenVINODeviceId`, `onnxOpenVINOCacheDir`, and `onnxOpenVINOEnableNPUFastCompile` (best-effort; depends on ORT build support).
- Loads `.onnx` models directly, and `.bin`/`.bin.gz` models via internal conversion to an ONNX graph.
2) `exportonnx` command behavior
- `exportonnx` is available in ONNX builds and exports fixed-size ONNX models (`-x`/`-y` can override the size).
- In non-ONNX builds, `exportonnx` now returns a clear error instead of failing ambiguously.
3) Config safety for non-ONNX binaries
- Running a non-ONNX binary with `onnx*` config keys now fails fast with a clear message.
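The fail-fast behavior for `onnx*` keys could be sketched as follows. This is a hypothetical illustration, not KataGo's actual implementation; the function name and config representation are invented for the example:

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch: when the binary was built without the ONNX backend,
// reject any onnx* config key up front, so the user gets a clear error
// instead of silently ignored settings.
void rejectOnnxKeysForNonOnnxBuild(const std::map<std::string, std::string>& cfg,
                                   bool builtWithOnnx) {
  if(builtWithOnnx)
    return;
  for(const auto& kv : cfg) {
    // rfind(prefix, 0) == 0 is a "starts with" test
    if(kv.first.rfind("onnx", 0) == 0)
      throw std::runtime_error(
        "Config key '" + kv.first +
        "' requires a KataGo binary built with the ONNX backend");
  }
}
```

The check runs once at config-load time, so misconfigured binaries fail immediately rather than mid-search.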
4) CMake dependency flow
- `ONNXRUNTIME_ROOT` (defaulting to `cpp/external/onnxruntime-win-x64-openvino` and `cpp/external/onnxruntime-linux-x64-openvino`).
- Dependencies (`zlib`, `onnx`, `protobuf`) are pulled in through vcpkg when enabled.
5) Documentation updates
- `Compiling.md`: Windows & Linux build instructions for the ONNX Runtime + OpenVINO path (`use_openvino=NPU`), with the ORT package expected under `cpp/external/onnxruntime-win-x64-openvino`.
- `README.md`: usage notes for `exportonnx` (default 19x19), `benchmark`, and `gtp`.
Behavior Notes
- `onnxDeviceToUse` and the threading keys (`Thread*`) are mainly intended for ONNX providers like CUDA/TensorRT/MIGraphX.
Validation
- `exportonnx` works from `.bin`/`.bin.gz` -> `.onnx`.
- `benchmark`/`gtp` run with `onnxProvider=openvino` and `onnxOpenVINODeviceType=NPU`.
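Putting the validated keys together, a config fragment for a `benchmark` or `gtp` run might look like the sketch below. The key names come from this PR's summary; the specific values and cache path are illustrative assumptions:

```ini
# Illustrative fragment for gtp/benchmark configs (values are examples)
onnxProvider = openvino                  # use the OpenVINO execution provider
onnxOpenVINODeviceType = NPU             # target the Intel NPU
onnxOpenVINODeviceId = 0
onnxOpenVINOCacheDir = ./ov_cache        # cache compiled models between runs
onnxOpenVINOEnableNPUFastCompile = true  # best-effort; depends on ORT build support
```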