Commits

34 commits
4623fe7
Fix CoreML backend batched evaluation bug
ChinChangYang Dec 31, 2025
6a67e80
Add Core ML backend support to CMake configuration
ChinChangYang Dec 31, 2025
1b1d246
Add CoreML backend entry to .gitignore
ChinChangYang Dec 31, 2025
7e4ef7b
Fix CoreML backend MLMultiArray stride handling bug
ChinChangYang Dec 31, 2025
2c77de1
Add KataGoCoreML-swift.h to .gitignore
ChinChangYang Dec 31, 2025
927af3b
Fix CoreML backend pass policy output name mismatch
ChinChangYang Dec 31, 2025
402e8d9
Add configurable FP16/FP32 precision to CoreML backend
ChinChangYang Jan 1, 2026
65c27b1
Replace Python CoreML converter with native katagocoreml library
ChinChangYang Jan 5, 2026
e6fd901
Add hybrid CoreML + MPSGraph backend for improved throughput
ChinChangYang Jan 5, 2026
8f24d1a
Optimize MPSGraph mask operations when requireExactNNLen is true
ChinChangYang Jan 5, 2026
270651b
Remove unused maskSize variable in HybridComputeHandle
ChinChangYang Jan 5, 2026
31a36a3
Add CoreML backend build instructions to Compiling.md
ChinChangYang Jan 5, 2026
f952519
Add CoreML backend CI job to GitHub Actions workflow
ChinChangYang Jan 6, 2026
bb69d0d
Simplify CoreML model loading with dynamic batch size support
ChinChangYang Jan 20, 2026
e914062
Add FP32 GPU-only mode using MPSGraph to bypass CoreML converter
ChinChangYang Jan 20, 2026
20ab597
Use MPSGraph-only mode when batch size is too small for hybrid split
ChinChangYang Jan 21, 2026
dad0dac
Improve ThroughputTracker for selfplay workloads
ChinChangYang Jan 21, 2026
46eb2c5
Unify CoreML and Metal into single Metal backend
ChinChangYang Jan 24, 2026
fc050b6
Replace hybrid batch splitting with per-thread GPU/ANE multiplexer
ChinChangYang Feb 23, 2026
ce6eb9b
Add gpuIdx validation, FP32+ANE warning, and fix startup message
ChinChangYang Feb 23, 2026
1f8daa8
Fix outdated comments and add metalUseFP16 config documentation
ChinChangYang Feb 23, 2026
8cfeae5
Include gpuIdx in ComputeHandle error message for easier diagnosis
ChinChangYang Feb 23, 2026
76283d9
Replace bool arithmetic with explicit XOR logic in ComputeHandle check
ChinChangYang Feb 23, 2026
18c4642
Refactor MPSGraph inference to use explicit command buffer
ChinChangYang Feb 26, 2026
60760f1
Remove dead USE_COREML_BACKEND ifdef and fix gitignore comment
ChinChangYang Feb 27, 2026
308ad19
Adjusted the parameter alignment in the getMetalOutput function
ChinChangYang Feb 27, 2026
ea9c81f
Remove COREML backend condition
ChinChangYang Feb 27, 2026
e59bdf8
Hard-fail on CoreML inference error in apply()
ChinChangYang Mar 15, 2026
8a352ed
Fail loudly on unhandled activation kind in Metal backend
ChinChangYang Mar 15, 2026
b8bd04b
Remove obsolete assertions in MatMulLayer
ChinChangYang Mar 21, 2026
6c3df7a
Add assertion for dilation parameters in ConvLayer
ChinChangYang Mar 21, 2026
2b8aa73
Vendor katagocoreml-cpp library to make Metal backend self-contained
ChinChangYang Mar 22, 2026
32e7209
Deduplicate nlohmann/json by reusing existing copy in cpp/external/
ChinChangYang Mar 22, 2026
b0adf3e
Deduplicate abseil module list and document GLOB caveat in katagocore…
ChinChangYang Mar 22, 2026
48 changes: 48 additions & 0 deletions .github/workflows/build.yml
@@ -97,6 +97,54 @@ jobs:
name: katago-macos-opencl
path: cpp/katago

build-macos-metal:
runs-on: macos-latest
permissions:
contents: read

steps:
- name: Checkout code
uses: actions/checkout@v4

- name: Install dependencies
run: |
brew install ninja zlib libzip protobuf abseil

- name: Cache CMake build
uses: actions/cache@v4
with:
path: |
cpp/CMakeCache.txt
cpp/CMakeFiles
cpp/build.ninja
cpp/.ninja_deps
cpp/.ninja_log
key: ${{ runner.os }}-cmake-metal-${{ hashFiles('**/CMakeLists.txt') }}
restore-keys: |
${{ runner.os }}-cmake-metal-

- name: Configure CMake
working-directory: cpp
run: |
cmake . -G Ninja -DUSE_BACKEND=METAL -DCMAKE_BUILD_TYPE=Release

- name: Build
working-directory: cpp
run: |
ninja

- name: Run tests
working-directory: cpp
run: |
./katago runtests

- name: Upload artifact
if: github.event_name == 'push' && github.ref == 'refs/heads/master'
uses: actions/upload-artifact@v4
with:
name: katago-macos-metal
path: cpp/katago

build-windows:
runs-on: windows-latest
permissions:
1 change: 1 addition & 0 deletions .gitignore
@@ -90,3 +90,4 @@ cpp/.ninja_log
cpp/build.ninja
cpp/KataGoSwift.*
cpp/include/KataGoSwift/KataGoSwift-swift.h
cpp/external/katagocoreml/proto/
3 changes: 2 additions & 1 deletion Compiling.md
@@ -118,7 +118,7 @@ As also mentioned in the instructions below but repeated here for visibility, if
* If using OpenCL, you will want to verify that KataGo is picking up the correct device (e.g. some systems may have both an Intel CPU OpenCL and GPU OpenCL, if KataGo appears to pick the wrong one, you can correct this by specifying `openclGpuToUse` in `configs/gtp_example.cfg`).

## MacOS
* TLDR (Metal backend - recommended for most users, hybrid CPU+GPU+Neural Engine for maximum throughput):
```
git clone https://github.com/lightvector/KataGo.git
cd KataGo/cpp
@@ -132,6 +132,7 @@ As also mentioned in the instructions below but repeated here for visibility, if
* CMake with a minimum version of 3.18.2: `brew install cmake`.
* AppleClang and Swift compilers: `xcode-select --install`.
* If using the Metal backend, [Ninja](https://ninja-build.org): `brew install ninja`
* If using the Metal backend, protobuf and abseil: `brew install protobuf abseil`
* libzip: `brew install libzip`.
* If you want to do self-play training and research, probably Google perftools `brew install gperftools` for TCMalloc or some other better malloc implementation. For unknown reasons, the allocation pattern in self-play with large numbers of threads and parallel games causes a lot of memory fragmentation under glibc malloc that will eventually run your machine out of memory, but better mallocs handle it fine.
* If compiling to contribute to public distributed training runs, OpenSSL is required (`brew install openssl`).
18 changes: 11 additions & 7 deletions cpp/CMakeLists.txt
@@ -32,7 +32,7 @@ endif()
set(BUILD_DISTRIBUTED 0 CACHE BOOL "Build with http support for contributing to distributed training")
set(USE_BACKEND CACHE STRING "Neural net backend")
string(TOUPPER "${USE_BACKEND}" USE_BACKEND)
set_property(CACHE USE_BACKEND PROPERTY STRINGS "" CUDA TENSORRT OPENCL EIGEN METAL)

set(USE_TCMALLOC 0 CACHE BOOL "Use TCMalloc")
set(NO_GIT_REVISION 0 CACHE BOOL "Disable embedding the git revision into the compiled exe")
@@ -97,7 +97,7 @@ elseif(USE_BACKEND STREQUAL "TENSORRT")
message(FATAL_ERROR "Combining USE_CACHE_TENSORRT_PLAN with BUILD_DISTRIBUTED is not supported - it would consume excessive disk space and might worsen performance every time models are updated. Use only one at a time in a given build of KataGo.")
endif()
elseif(USE_BACKEND STREQUAL "METAL")
message(STATUS "-DUSE_BACKEND=METAL, using Metal backend with hybrid MPSGraph + CoreML execution.")
if(NOT "${CMAKE_GENERATOR}" STREQUAL "Ninja")
message(FATAL_ERROR "Bidirectional C++ Interop requires Ninja generator. Have ${CMAKE_GENERATOR}")
endif()
@@ -107,6 +107,7 @@ elseif(USE_BACKEND STREQUAL "METAL")
if(NOT "${CMAKE_CXX_COMPILER_ID}" STREQUAL "AppleClang")
message(FATAL_ERROR "Project requires building with AppleClang. Have ${CMAKE_CXX_COMPILER_ID}")
endif()
add_subdirectory(external/katagocoreml)
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/external/macos/cmake/modules")
include(InitializeSwift)
include(AddSwift)
@@ -115,11 +116,11 @@ elseif(USE_BACKEND STREQUAL "METAL")
neuralnet/metalbackend.cpp
)
add_library(KataGoSwift STATIC
neuralnet/metalbackend.swift
neuralnet/metallayers.swift)
_swift_generate_cxx_header(
KataGoSwift
"${CMAKE_CURRENT_BINARY_DIR}/include/KataGoSwift/KataGoSwift-swift.h")
target_include_directories(KataGoSwift PUBLIC "${CMAKE_CURRENT_BINARY_DIR}/include")
set_target_properties(KataGoSwift PROPERTIES Swift_MODULE_NAME "KataGoSwift")
target_compile_options(KataGoSwift PUBLIC
@@ -399,9 +400,12 @@ elseif(USE_BACKEND STREQUAL "TENSORRT")
target_link_libraries(katago CUDA::cudart_static ${TENSORRT_LIBRARY})
elseif(USE_BACKEND STREQUAL "METAL")
target_compile_definitions(katago PRIVATE USE_METAL_BACKEND)
target_link_libraries(katago KataGoSwift katagocoreml
${KATAGOCOREML_DEP_LDFLAGS}
"-framework MetalPerformanceShaders"
"-framework MetalPerformanceShadersGraph")
if("${CMAKE_SYSTEM_PROCESSOR}" STREQUAL "x86_64")
message(WARNING "Metal backend may not work optimally on Intel. ARM64 architecture is recommended.")
endif()
elseif(USE_BACKEND STREQUAL "OPENCL")
target_compile_definitions(katago PRIVATE USE_OPENCL_BACKEND)
32 changes: 25 additions & 7 deletions cpp/configs/analysis_example.cfg
@@ -224,15 +224,33 @@ nnRandomize = true
# ------------------------------
# These only apply when using the METAL version of KataGo.

# Metal backend dispatch is configured via numNNServerThreadsPerModel and metalDeviceToUseThread<N>.
# Device index values:
# 0 = GPU only (MPSGraph) - default
# 100 = ANE only (CoreML, runs on CPU + Apple Neural Engine)
#
# Mux mode (recommended): 4 pipelined server threads (2x GPU + 2x ANE).
# Set nnMaxBatchSize to half of numSearchThreads for optimal pipelining.
#
# Example: mux mode (best throughput)
# numNNServerThreadsPerModel = 4
# metalDeviceToUseThread0 = 0
# metalDeviceToUseThread1 = 0
# metalDeviceToUseThread2 = 100
# metalDeviceToUseThread3 = 100
#
# Example: GPU-only mode (default)
# numNNServerThreadsPerModel = 1
# metalDeviceToUseThread0 = 0
#
# Example: ANE-only mode
# numNNServerThreadsPerModel = 1
# metalDeviceToUseThread0 = 100
#
# Default (no config): 1 server thread, GPU-only mode (gpuIdx = 0).

# FP16 precision (default true). Set to false for exact FP32 inference (slower).
# metalUseFP16 = true


# OpenCL-specific GPU settings--------------------------------------
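The config comments in this diff tie `nnMaxBatchSize` to `numSearchThreads` for mux-mode pipelining. A toy sketch of that arithmetic, assuming the guidance of "half of numSearchThreads" with a floor of 1 (the helper name is illustrative, not part of KataGo):

```python
# Illustrative helper (not KataGo code): with mux mode's four
# pipelined server threads (2x GPU + 2x ANE), setting nnMaxBatchSize
# to half of numSearchThreads keeps two batches in flight at once.

def suggested_nn_max_batch_size(num_search_threads: int) -> int:
    # Half the search threads per batch, but never below 1.
    return max(1, num_search_threads // 2)

for threads in (8, 16, 32):
    print(threads, "->", suggested_nn_max_batch_size(threads))
```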
32 changes: 25 additions & 7 deletions cpp/configs/gtp_example.cfg
@@ -460,15 +460,33 @@ searchFactorWhenWinningThreshold = 0.95
# ------------------------------
# These only apply when using the METAL version of KataGo.

# Metal backend dispatch is configured via numNNServerThreadsPerModel and metalDeviceToUseThread<N>.
# Device index values:
# 0 = GPU only (MPSGraph) - default
# 100 = ANE only (CoreML, runs on CPU + Apple Neural Engine)
#
# Mux mode (recommended): 4 pipelined server threads (2x GPU + 2x ANE).
# Set nnMaxBatchSize to half of numSearchThreads for optimal pipelining.
#
# Example: mux mode (best throughput)
# numNNServerThreadsPerModel = 4
# metalDeviceToUseThread0 = 0
# metalDeviceToUseThread1 = 0
# metalDeviceToUseThread2 = 100
# metalDeviceToUseThread3 = 100
#
# Example: GPU-only mode (default)
# numNNServerThreadsPerModel = 1
# metalDeviceToUseThread0 = 0
#
# Example: ANE-only mode
# numNNServerThreadsPerModel = 1
# metalDeviceToUseThread0 = 100
#
# Default (no config): 1 server thread, GPU-only mode (gpuIdx = 0).

# FP16 precision (default true). Set to false for exact FP32 inference (slower).
# metalUseFP16 = true

# ------------------------------
# OpenCL GPU settings
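The `metalDeviceToUseThread<N>` mapping described in these config comments can be modeled as a small lookup. This is an illustrative sketch of the dispatch rule (0 = GPU via MPSGraph, 100 = ANE via CoreML, unconfigured threads default to GPU), not KataGo's actual implementation:

```python
# Toy model of the per-thread GPU/ANE multiplexer from the config
# comments: each NN server thread is pinned to a device index.
# Illustrative sketch only, not KataGo's actual code.

GPU, ANE = 0, 100

def device_for_thread(thread_idx, device_map):
    # Threads without an explicit metalDeviceToUseThread<N> entry
    # fall back to GPU-only mode (gpuIdx = 0).
    return device_map.get(thread_idx, GPU)

# Mux mode from the example config: 2 GPU threads + 2 ANE threads.
mux = {0: GPU, 1: GPU, 2: ANE, 3: ANE}
print([device_for_thread(i, mux) for i in range(4)])  # [0, 0, 100, 100]
```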
132 changes: 132 additions & 0 deletions cpp/external/katagocoreml/CMakeLists.txt
@@ -0,0 +1,132 @@
# katagocoreml - KataGo to Core ML Converter (vendored)
# Simplified build for use as a subdirectory of KataGo.
#
# Note: We deliberately avoid linking against CMake's abseil/protobuf targets
# (find_package(absl)) because their INTERFACE_LINK_LIBRARIES propagate
# "-Wl,-framework,CoreFoundation" to the final executable, which swiftc
# (used as the linker for the katago Swift/C++ hybrid) does not understand.
# Instead, we use pkg-config for include/link flags, which produces
# swiftc-compatible flags like "-framework CoreFoundation".

# ============================================================================
# External Dependencies
# ============================================================================

find_package(ZLIB REQUIRED)
find_package(Protobuf REQUIRED) # Needed for protoc executable and include dirs
find_package(PkgConfig REQUIRED)
set(KATAGOCOREML_ABSEIL_MODULES
absl_base absl_log absl_log_internal_check_op absl_log_internal_message
absl_hash absl_strings absl_status absl_statusor
)
pkg_check_modules(KATAGOCOREML_ABSEIL REQUIRED ${KATAGOCOREML_ABSEIL_MODULES})

# Export link flags to parent scope for the final executable
pkg_check_modules(KATAGOCOREML_ALL_DEPS REQUIRED protobuf ${KATAGOCOREML_ABSEIL_MODULES})
set(KATAGOCOREML_DEP_LDFLAGS ${KATAGOCOREML_ALL_DEPS_LDFLAGS} PARENT_SCOPE)

# ============================================================================
# Proto Files (compile from source)
# ============================================================================

set(COREMLTOOLS_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/vendor")
set(PROTO_DIR "${COREMLTOOLS_ROOT}/mlmodel/format")
set(PROTO_GENERATED_DIR "${CMAKE_CURRENT_BINARY_DIR}/proto")
file(MAKE_DIRECTORY ${PROTO_GENERATED_DIR})

# Get all proto files (GLOB does not re-run on file additions; re-run cmake
# manually if proto files are added or removed).
file(GLOB PROTO_FILES "${PROTO_DIR}/*.proto")

# Generate C++ from all proto files
set(PROTO_SRCS)
set(PROTO_HDRS)

foreach(PROTO_FILE ${PROTO_FILES})
get_filename_component(PROTO_NAME ${PROTO_FILE} NAME_WE)
set(PROTO_SRC "${PROTO_GENERATED_DIR}/${PROTO_NAME}.pb.cc")
set(PROTO_HDR "${PROTO_GENERATED_DIR}/${PROTO_NAME}.pb.h")
list(APPEND PROTO_SRCS ${PROTO_SRC})
list(APPEND PROTO_HDRS ${PROTO_HDR})

add_custom_command(
OUTPUT ${PROTO_SRC} ${PROTO_HDR}
COMMAND ${Protobuf_PROTOC_EXECUTABLE}
ARGS --cpp_out=${PROTO_GENERATED_DIR}
-I${PROTO_DIR}
${PROTO_FILE}
DEPENDS ${PROTO_FILE}
COMMENT "Generating C++ from ${PROTO_NAME}.proto"
VERBATIM
)
endforeach()

# ============================================================================
# MILBlob Sources (vendored from coremltools)
# ============================================================================

set(MILBLOB_DIR "${COREMLTOOLS_ROOT}/mlmodel/src/MILBlob")
set(MILBLOB_SRCS
"${MILBLOB_DIR}/Blob/FileWriter.cpp"
"${MILBLOB_DIR}/Blob/StorageWriter.cpp"
"${MILBLOB_DIR}/Blob/StorageReader.cpp"
"${MILBLOB_DIR}/Blob/MMapFileReader.cpp"
"${MILBLOB_DIR}/Blob/MMapFileReaderFactory.cpp"
"${MILBLOB_DIR}/SubByteTypes.cpp"
"${MILBLOB_DIR}/Fp8.cpp"
"${MILBLOB_DIR}/Fp16.cpp"
)

# ============================================================================
# ModelPackage Sources (vendored from coremltools)
# ============================================================================

set(MODELPACKAGE_DIR "${COREMLTOOLS_ROOT}/modelpackage/src")
set(MODELPACKAGE_SRCS
"${MODELPACKAGE_DIR}/ModelPackage.cpp"
"${MODELPACKAGE_DIR}/utils/JsonMap.cpp"
)

# ============================================================================
# KataGoCoreML Library Sources
# ============================================================================

set(KATAGOCOREML_SRCS
src/parser/KataGoParser.cpp
src/builder/MILBuilder.cpp
src/builder/Operations.cpp
src/serializer/CoreMLSerializer.cpp
src/serializer/WeightSerializer.cpp
src/Converter.cpp
)

# ============================================================================
# Library Target
# ============================================================================

add_library(katagocoreml STATIC
${KATAGOCOREML_SRCS}
${PROTO_SRCS}
${MILBLOB_SRCS}
${MODELPACKAGE_SRCS}
)

target_include_directories(katagocoreml
PUBLIC
${CMAKE_CURRENT_SOURCE_DIR}/include
PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}/src
${PROTO_GENERATED_DIR}
${MILBLOB_DIR}/..
${MODELPACKAGE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/../nlohmann_json
${COREMLTOOLS_ROOT}/deps/FP16/include
${Protobuf_INCLUDE_DIRS}
${KATAGOCOREML_ABSEIL_INCLUDE_DIRS}
)

# Only link ZLIB as a CMake target (no swiftc-incompatible flags).
# Protobuf and abseil are linked via pkg-config LDFLAGS in the parent.
target_link_libraries(katagocoreml PRIVATE ZLIB::ZLIB)

target_compile_definitions(katagocoreml PRIVATE APPLE_BUILD=1)
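The `foreach` loop in the CMakeLists above turns each `.proto` under `mlmodel/format` into one protoc invocation producing a `.pb.cc`/`.pb.h` pair. A minimal sketch of that mapping (paths are illustrative):

```python
# Sketch of the per-proto generation done by the CMake foreach loop:
# each Model.proto maps to one protoc command and yields
# Model.pb.cc / Model.pb.h in the generated-proto directory.
from pathlib import Path

def protoc_command(proto_file, proto_dir, out_dir):
    # Mirrors: protoc --cpp_out=<out> -I<proto_dir> <file>
    return ["protoc", f"--cpp_out={out_dir}", f"-I{proto_dir}", proto_file]

def generated_outputs(proto_file, out_dir):
    stem = Path(proto_file).stem  # CMake's NAME_WE
    return (f"{out_dir}/{stem}.pb.cc", f"{out_dir}/{stem}.pb.h")

print(protoc_command("format/Model.proto", "format", "build/proto"))
print(generated_outputs("format/Model.proto", "build/proto"))
```

As the comment in the CMakeLists notes, `file(GLOB)` does not re-run on file additions, so the set of inputs to this mapping is only refreshed when cmake is re-run manually.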
29 changes: 29 additions & 0 deletions cpp/external/katagocoreml/LICENSE
@@ -0,0 +1,29 @@
BSD 3-Clause License

Copyright (c) 2025, Chin-Chang Yang
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.