Describe the bug
https://github.com/intel/llvm/actions/runs/19151375733/job/54743282591?pr=20542
SYCL :: Matrix/joint_matrix_bfloat16_accumulator.cpp
SYCL :: Matrix/joint_matrix_half_accumulator.cpp
********************
FAIL: SYCL :: Matrix/joint_matrix_bfloat16_accumulator.cpp (1688 of 1911)
******************** TEST 'SYCL :: Matrix/joint_matrix_bfloat16_accumulator.cpp' FAILED ********************
Exit Code: -6
Command Output (stdout):
--
# RUN: at line 21
env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
# executed command: env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
# .---command stdout------------
# | B row major:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | B packed:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# `-----------------------------
# RUN: at line 21
env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
# executed command: env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
# .---command stdout------------
# | B row major:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | B packed:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# `-----------------------------
# RUN: at line 22
env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
# executed command: env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
# .---command stdout------------
# | B row major:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | B packed:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# `-----------------------------
# RUN: at line 22
env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
# executed command: env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
# .---command stdout------------
# | B row major:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | B packed:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# `-----------------------------
# RUN: at line 23
env IGC_JointMatrixLoadStoreOpt=1 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
# executed command: env IGC_JointMatrixLoadStoreOpt=1 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_bfloat16_accumulator.cpp.tmp.out
# .---command stdout------------
# | B row major:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# `-----------------------------
# .---command stderr------------
# | terminate called after throwing an instance of 'sycl::_V1::exception'
# | what(): level_zero backend failed with error: 20 (UR_RESULT_ERROR_DEVICE_LOST)
# `-----------------------------
# error: command failed with exit status: -6
--
********************
FAIL: SYCL :: Matrix/joint_matrix_half_accumulator.cpp (1761 of 1911)
******************** TEST 'SYCL :: Matrix/joint_matrix_half_accumulator.cpp' FAILED ********************
Exit Code: -6
Command Output (stdout):
--
# RUN: at line 22
env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
# executed command: env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
# .---command stdout------------
# | B row major:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | B packed:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# `-----------------------------
# RUN: at line 22
env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
# executed command: env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
# .---command stdout------------
# | B row major:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | B packed:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# `-----------------------------
# RUN: at line 23
env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
# executed command: env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
# .---command stdout------------
# | B row major:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | B packed:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# `-----------------------------
# RUN: at line 23
env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
# executed command: env IGC_JointMatrixLoadStoreOpt=2 env env UR_LOADER_USE_LEVEL_ZERO_V2=1 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
# .---command stdout------------
# | B row major:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | B packed:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# `-----------------------------
# RUN: at line 24
env IGC_JointMatrixLoadStoreOpt=1 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
# executed command: env IGC_JointMatrixLoadStoreOpt=1 env env UR_LOADER_USE_LEVEL_ZERO_V2=0 ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Matrix/Output/joint_matrix_half_accumulator.cpp.tmp.out
# .---command stdout------------
# | B row major:
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 8 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 16 x 16 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 16 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 1 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 16 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# | Testing: 32 x 64 x 32 [TM x TN x TK]
# `-----------------------------
# .---command stderr------------
# | terminate called after throwing an instance of 'sycl::_V1::exception'
# | what(): level_zero backend failed with error: 20 (UR_RESULT_ERROR_DEVICE_LOST)
# `-----------------------------
# error: command failed with exit status: -6
--
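To pin down which RUN step crashed in a log like the one above, one option is to scan for the last `# executed command:` line that precedes each `# error: command failed` line. The following is a minimal sketch that assumes the lit output format shown in this report; the function name `failing_commands` and the `./a.out` path in the sample are hypothetical placeholders, not part of the test suite.

```python
import re


def failing_commands(log_text):
    """Return (command, exit_status) pairs for each failed RUN step in a lit log."""
    results = []
    last_command = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("# executed command:"):
            # Remember the most recent command; a later error line refers to it.
            last_command = line[len("# executed command:"):].strip()
            continue
        match = re.match(r"# error: command failed with exit status: (-?\d+)", line)
        if match and last_command is not None:
            results.append((last_command, int(match.group(1))))
    return results


# Shortened sample in the same shape as the log above (./a.out is a placeholder).
sample = """\
# executed command: env IGC_JointMatrixLoadStoreOpt=2 ./a.out
# | B packed:
# executed command: env IGC_JointMatrixLoadStoreOpt=1 ./a.out
# | B row major:
# | what(): level_zero backend failed with error: 20 (UR_RESULT_ERROR_DEVICE_LOST)
# error: command failed with exit status: -6
"""

for command, status in failing_commands(sample):
    print(status, command)
```

In the log above, both tests fail in the step run with `IGC_JointMatrixLoadStoreOpt=1` and `UR_LOADER_USE_LEVEL_ZERO_V2=0`: the "B row major" results are printed, no "B packed" output follows, and the process aborts with `UR_RESULT_ERROR_DEVICE_LOST`.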
To reproduce
- Include a code snippet that is as short as possible
- Specify the command which should be used to compile the program
- Specify the command which should be used to launch the program
- Indicate what is wrong and what was expected
Environment
- OS: [e.g. Windows/Linux]
- Target device and vendor: [e.g. Intel GPU]
- DPC++ version: [e.g. commit hash or output of clang++ --version]
- Dependencies version: [e.g. the output of sycl-ls --verbose]
Additional context
No response