[NPU]: support kl div on NPU #1001

Merged
Tcc0403 merged 1 commit into linkedin:main from noemotiovon:op_kl_div
Jan 12, 2026

@noemotiovon
Contributor

Add NPU device support for KL divergence operation.
Set MAX_FUSED_SIZE to 8192 for NPU, same as XPU.

  • Hardware Type: Ascend910B4
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence
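
For readers unfamiliar with the setting, the device-dependent cap described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper names (`select_max_fused_size`, `pick_block_size`), the lookup table, and the fallback value for other devices are all assumptions made for the example. Only the value 8192 for NPU and XPU comes from the description.

```python
# Illustrative sketch only; every name here is hypothetical, not the PR's code.

# Assumed fallback for devices other than XPU/NPU; the PR text does not state it.
ASSUMED_DEFAULT_MAX_FUSED_SIZE = 65536 // 4

# From the PR description: NPU uses 8192, the same value as XPU.
DEVICE_MAX_FUSED_SIZE = {"xpu": 8192, "npu": 8192}


def select_max_fused_size(device: str) -> int:
    """Return the cap on elements handled per block for a device name."""
    return DEVICE_MAX_FUSED_SIZE.get(device, ASSUMED_DEFAULT_MAX_FUSED_SIZE)


def next_power_of_2(n: int) -> int:
    """Return the smallest power of two that is >= n, for n >= 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1 << (n - 1).bit_length()


def pick_block_size(last_dim: int, device: str) -> int:
    """Round the last dimension up to a power of two, then cap it per device."""
    return min(select_max_fused_size(device), next_power_of_2(last_dim))


if __name__ == "__main__":
    # 32000 and 1271 are the trailing numbers in the test IDs of this PR,
    # assumed here to be the size of the last dimension.
    for last_dim in (32000, 1271):
        print(last_dim, pick_block_size(last_dim, "npu"))
```

With a cap like this, a large last dimension is processed in several fixed-size chunks rather than one oversized block, while small inputs still get a block that fits them.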

@noemotiovon
Contributor Author

Test Result:

Coverage HTML written to dir htmlcov
====================================================================== slowest durations =======================================================================
9.39s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-batchmean-True-1-4096-32000]
0.44s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-batchmean-False-41-401-1271]
0.06s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-batchmean-True-1-4096-32000]
0.06s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-mean-True-1-4096-32000]
0.06s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-mean-False-1-4096-32000]
0.06s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-batchmean-False-1-4096-32000]
0.06s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-batchmean-True-1-4096-32000]
0.06s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-batchmean-False-1-4096-32000]
0.06s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-mean-False-1-4096-32000]
0.06s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-mean-True-1-4096-32000]
0.06s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-batchmean-False-1-4096-32000]
0.06s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-batchmean-True-1-4096-32000]
0.06s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-mean-True-1-4096-32000]
0.06s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-mean-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-sum-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-batchmean-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-mean-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-mean-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-sum-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-sum-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-sum-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-sum-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-sum-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-sum-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-sum-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-batchmean-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-batchmean-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-mean-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-mean-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-batchmean-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-batchmean-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-mean-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-mean-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-sum-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-sum-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-sum-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-sum-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-none-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-none-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-none-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-none-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-none-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-none-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-none-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-none-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-none-True-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-none-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-none-False-1-4096-32000]
0.05s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-none-False-1-4096-32000]
0.04s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-batchmean-False-1-4096-32000]
0.03s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-mean-True-1-4096-32000]
0.03s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-batchmean-True-1-4096-32000]
0.03s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-batchmean-False-1-4096-32000]
0.03s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-mean-False-1-4096-32000]
0.03s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-none-False-1-4096-32000]
0.03s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-none-True-1-4096-32000]
0.03s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-none-True-1-4096-32000]
0.03s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-mean-True-1-4096-32000]
0.03s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-none-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-batchmean-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-sum-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-mean-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-sum-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-mean-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-sum-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-sum-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-mean-True-1-4096-32000]
0.02s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-mean-True-41-401-1271]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-sum-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-mean-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-batchmean-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-batchmean-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-batchmean-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-mean-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-batchmean-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-mean-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-none-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-none-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-none-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-none-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-batchmean-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-batchmean-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-sum-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-mean-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-mean-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-sum-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-none-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-sum-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-none-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-batchmean-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-none-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-sum-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-none-False-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-mean-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-sum-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-batchmean-False-1-4096-32000]
0.02s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-batchmean-True-41-401-1271]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-sum-True-1-4096-32000]
0.02s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-sum-True-1-4096-32000]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-batchmean-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-batchmean-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-batchmean-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-batchmean-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-mean-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-mean-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-batchmean-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-batchmean-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-batchmean-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-mean-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-sum-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-mean-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-sum-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-sum-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-mean-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-sum-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-mean-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-sum-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-mean-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-mean-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-batchmean-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-batchmean-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-sum-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-sum-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-mean-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-sum-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-sum-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-batchmean-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-mean-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-sum-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-sum-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-sum-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-mean-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-none-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype0-1e-08-0.05-none-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-none-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-none-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-none-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-none-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-none-False-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-none-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-none-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness[dtype2-0.001-0.001-none-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-none-True-41-401-1271]
0.01s call     test/transformers/test_kl_div.py::test_correctness_not_last[dtype2-0.001-0.001-none-False-41-401-1271]
0.01s setup    test/transformers/test_kl_div.py::test_correctness[dtype0-1e-08-0.05-sum-False-1-4096-32000]
0.01s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-mean-True-41-401-1271]
0.01s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-batchmean-False-41-401-1271]
0.01s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-batchmean-True-41-401-1271]
0.01s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-mean-False-41-401-1271]
0.01s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-none-False-41-401-1271]
0.01s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-sum-True-41-401-1271]
0.01s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-mean-True-41-401-1271]
0.01s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-batchmean-False-41-401-1271]
0.01s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-sum-False-41-401-1271]
0.01s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-none-True-41-401-1271]
0.01s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-none-False-41-401-1271]
0.01s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-none-True-41-401-1271]
0.01s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-sum-False-41-401-1271]
0.01s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-mean-False-41-401-1271]
0.01s teardown test/transformers/test_kl_div.py::test_correctness[dtype1-1e-08-1e-06-batchmean-True-41-401-1271]
0.01s teardown test/transformers/test_kl_div.py::test_correctness_not_last[dtype1-1e-08-1e-06-sum-True-41-401-1271]

(127 durations < 0.005s hidden.  Use -vv to show these durations.)
=============================================================== 96 passed, 24 warnings in 25.54s ===============================================================

Collaborator

@Tcc0403 left a comment

Thank you

@Tcc0403 merged commit 149a5fd into linkedin:main on Jan 12, 2026
3 of 7 checks passed