Possible Metal backend regression on Intel macOS: v1.16.4 plays grossly incorrectly with b18 weights, while OpenCL behaves normally

  ## Description

  I may have found a Metal-backend-specific regression on Intel macOS.

  Initially I was not sure whether this was caused by the Homebrew build/packaging specifically, or by KataGo itself on this backend/platform combination. However, after rebuilding the same Homebrew formula from source and reproducing the same issue, I now suspect a KataGo backend/runtime issue much more than a Homebrew packaging issue.

  On the same Intel Mac, using the same model and broadly the same config:

  - a Homebrew-installed KataGo `v1.16.4` binary plays grossly incorrectly
  - rebuilding the Homebrew formula from source still shows the same issue
  - an OpenCL KataGo binary bundled with KaTrain behaves normally
  - the OpenCL binary is also about 3x faster than the failing binary on this machine

  This makes me suspect a Metal backend issue on Intel Mac, rather than a GUI integration problem.

  At the same time, this does not seem to affect every model equally on the same build. The problem appears most severe with the `b18` model, while smaller nets such as `b6` and other tested weights such as `b40c256` appear much more normal with the same Homebrew binary.

  ## What I observed

  With the failing binary, I saw behavior such as:

  - very low-level blunders
  - just-played stones getting captured for no good reason
  - entire games where the bot appeared far weaker than expected
  - this happened with a strong `b18` net in positions where I would not expect such mistakes even at modest visits

  I tested with multiple GUI programs, including Sabaki, and saw the same kind of bad behavior.

  ## Why I think this is not just randomness / low visits

  I spent time checking this possibility first.

  I confirmed separately that low visits can cause large instability in general, but this case looks different because:

  - the bad behavior was much more extreme on this macOS binary
  - rebuilding the Homebrew formula from source did not change the behavior
  - the same `b18` model behaved normally with a different KataGo binary on the same machine
  - smaller or other tested weights such as `b6` and `b40c256` appeared much more normal with the same Homebrew binary
  - replacing the failing binary with another KataGo binary on the same Mac restored normal play

  ## Comparison

  ### Failing setup

  - KataGo binary: Homebrew-installed `v1.16.4`
    `katago version` reports:

    ```
    katago version
    KataGo v1.16.4
    Git revision: <omitted>
    Compile Time: Oct 20 2025 13:40:11
    Using Metal backend
    ```

    

  - Machine: Intel MacBook Pro, 13-inch, 2020, Four Thunderbolt 3 ports

  - OS: macOS `15.7.1`

  - Backend: Metal, as reported by the `katago version` output above.
    When rebuilding from the Homebrew formula, I also saw the following build command:

    ```sh
    cmake -S cpp -B build -DNO_GIT_REVISION=1 -DUSE_BACKEND=METAL -GNinja
    ```

  - Model: `kata1-b18c384nbt-s9996604416-d4316597426.bin.gz`

  - Config: custom `gtp_example.cfg` derived from the KataGo example config

  - I also rebuilt the Homebrew formula from source locally and saw the same issue.
    The Homebrew formula appears to be a very thin build recipe using upstream `v1.16.4` source and `-DUSE_BACKEND=METAL`, so at this point a pure Homebrew packaging issue seems less likely than a backend/runtime issue.

  ### Working setup

  - KataGo binary: OpenCL binary bundled with KaTrain
  - Same machine
  - Same model
  - Same general usage / same kind of positions
  - Playing strength returned to normal
  - Speed was also about 3x faster than the failing binary
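
To make the comparison above easier to repeat, here is a minimal sketch of how both binaries could be asked for a move in the same position at a fixed visit count. It is not something I ran as-is. It assumes `katago gtp` accepts `-model`, `-config`, and `-override-config`, and that stdout carries only GTP responses. The helper names and any paths passed in are hypothetical. Search is not deterministic, so a single differing move proves little; the idea is to repeat this over several positions.

```python
import subprocess


def build_gtp_command(binary, model, config, max_visits):
    """Build a `katago gtp` command line with a fixed visit cap (assumed flags)."""
    return [
        binary, "gtp",
        "-model", model,
        "-config", config,
        "-override-config", f"maxVisits={max_visits}",
    ]


def split_responses(stdout):
    """GTP responses are separated by blank lines; drop empty chunks."""
    return [chunk.strip() for chunk in stdout.split("\n\n") if chunk.strip()]


def parse_gtp_response(response):
    """Return the payload of a success response; raise on a '?' failure response."""
    text = response.strip()
    if text.startswith("?"):
        raise RuntimeError(f"GTP error: {text[1:].strip()}")
    if not text.startswith("="):
        raise ValueError(f"not a GTP response: {text!r}")
    return text[1:].strip()


def genmove(binary, model, config, moves, to_play, max_visits=50):
    """Replay `moves` (a list of (color, vertex) pairs) and return the engine's move."""
    commands = ["boardsize 19", "clear_board"]
    commands += [f"play {color} {vertex}" for color, vertex in moves]
    commands += [f"genmove {to_play}", "quit"]
    proc = subprocess.run(
        build_gtp_command(binary, model, config, max_visits),
        input="\n".join(commands) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    responses = split_responses(proc.stdout)
    # The genmove reply is second to last; the last response answers `quit`.
    return parse_gtp_response(responses[-2])
```

Calling `genmove(...)` once with the Homebrew Metal binary and once with the KaTrain OpenCL binary, using the same model, config, and move list, would give two moves to compare for each test position.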

  ## Additional notes

  - This does not look like a GUI-specific issue, because I reproduced the problem across multiple GUI programs, including Sabaki.
  - On the same Homebrew binary, other tested weights such as `b6c96` and `b40c256` appeared much more normal than the failing `b18` weight.
  - This does not look like a generic “Intel Mac is too slow” issue, because the OpenCL binary on the same machine is both faster and plays correctly.

  ## Config in use

  The active settings in my config included roughly:

  - `rules = tromp-taylor`
  - `allowResignation = false`
  - `dynamicPlayoutDoublingAdvantageCapPerOppLead = 0.045`
  - `maxVisits = 1600` (for context: I only get 100-200 visits within 30 seconds; I also tested with a fixed 50 visits)
  - `maxTime = 30`
  - `ponderingEnabled = false`
  - `lagBuffer = 1.0`
  - `numSearchThreads = 4`
  - `searchFactorAfterOnePass = 0.50`
  - `searchFactorAfterTwoPass = 0.25`
  - `searchFactorWhenWinning = 0.40`
  - `searchFactorWhenWinningThreshold = 0.95`

