unroll n-tile x16 for gemm x86 avx512 #6434

nihui · 2025-12-01T08:37:47Z

opt 8x16
opt 4x16
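
For readers unfamiliar with the tile naming, here is a minimal scalar sketch of what an N-tile of 16 with M-tiles of 8 and 4 can look like. It is not the code in this PR: `gemm_tile_x16` and `gemm_x16` are hypothetical names, the row-major layout is an assumption, and how the real kernel maps its accumulators to AVX-512 registers is not described in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical micro-kernel (not ncnn's code): C[TM x 16] += A[TM x K] * B[K x 16].
// All matrices are row-major with leading dimensions lda, ldb, ldc.
template<int TM>
void gemm_tile_x16(const float* A, const float* B, float* C, int K, int lda, int ldb, int ldc)
{
    assert(A && B && C);

    // TM x 16 accumulators stay live across the whole K loop.
    float acc[TM][16] = {};

    for (int k = 0; k < K; k++)
    {
        const float* b = B + (std::size_t)k * ldb;
        for (int i = 0; i < TM; i++)
        {
            const float a = A[(std::size_t)i * lda + k];
            for (int j = 0; j < 16; j++)
                acc[i][j] += a * b[j];
        }
    }

    // Write the finished tile back once.
    for (int i = 0; i < TM; i++)
        for (int j = 0; j < 16; j++)
            C[(std::size_t)i * ldc + j] += acc[i][j];
}

// Hypothetical driver: walk N in steps of 16, and M in steps of 8, then 4, then 1.
// Columns left over after the last full 16-wide tile use a plain scalar loop.
void gemm_x16(const std::vector<float>& A, const std::vector<float>& B, std::vector<float>& C, int M, int N, int K)
{
    const float* a = A.data();
    const float* b = B.data();
    float* c = C.data();

    int j = 0;
    for (; j + 15 < N; j += 16)
    {
        int i = 0;
        for (; i + 7 < M; i += 8)
            gemm_tile_x16<8>(a + (std::size_t)i * K, b + j, c + (std::size_t)i * N + j, K, K, N, N);
        for (; i + 3 < M; i += 4)
            gemm_tile_x16<4>(a + (std::size_t)i * K, b + j, c + (std::size_t)i * N + j, K, K, N, N);
        for (; i < M; i++)
            gemm_tile_x16<1>(a + (std::size_t)i * K, b + j, c + (std::size_t)i * N + j, K, K, N, N);
    }
    for (; j < N; j++)
    {
        for (int i = 0; i < M; i++)
        {
            float sum = 0.f;
            for (int k = 0; k < K; k++)
                sum += a[(std::size_t)i * K + k] * b[(std::size_t)k * N + j];
            c[(std::size_t)i * N + j] += sum;
        }
    }
}
```

A wider N-tile reuses each loaded element of A across more columns of B, at the cost of keeping more accumulators live. Whether that trade pays off depends on the target and the shapes involved.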

tencent-adm · 2025-12-01T08:38:05Z

Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

codecov-commenter · 2025-12-01T08:50:52Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.90%. Comparing base (82af4a1) to head (6b709bf).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6434      +/-   ##
==========================================
- Coverage   95.91%   95.90%   -0.02%     
==========================================
  Files         844      844              
  Lines      267021   265852    -1169     
==========================================
- Hits       256122   254955    -1167     
+ Misses      10899    10897       -2


github-actions · 2025-12-01T09:08:57Z

The binary size change of libncnn.so (bytes)

architecture	base size	pr size	difference
x86_64	15328640	15316352	-12288 😘
armhf	6229904	6229904	0 😘
aarch64	9527568	9527568	0 😘

nihui · 2025-12-01T09:37:56Z

r9-9950x single thread

$ ./benchllm-0 100 1 2 -1 0 233
loop_count = 100
num_threads = 1
powersave = 2
gpu_device = -1
cooling_down = 0
seqlen = 233
            minicpm4 (prefill)  min =  671.96  max =  678.33  avg =  673.73
            minicpm4  (decode)  min =   35.03  max =   36.12  avg =   35.23


$ ./benchllm 100 1 2 -1 0 233
loop_count = 100
num_threads = 1
powersave = 2
gpu_device = -1
cooling_down = 0
seqlen = 233
            minicpm4 (prefill)  min =  638.59  max =  646.15  avg =  640.96
            minicpm4  (decode)  min =   32.80  max =   34.75  avg =   33.17
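
To compare the two runs, the relative change of the averages can be computed as below. This is only arithmetic on the numbers printed above. `relative_change` is a hypothetical helper, and the comment does not state the unit of the figures or which binary carries the change, so the sketch just treats the first run (`benchllm-0`) as the reference.

```cpp
#include <cassert>

// Relative change of `other` with respect to `base`; negative means `other` is lower.
double relative_change(double base, double other)
{
    assert(base != 0.0);
    return (other - base) / base;
}

// Averages copied from the two runs above (first run as reference, second run as `other`).
const double prefill_change = relative_change(673.73, 640.96);
const double decode_change = relative_change(35.23, 33.17);
```

With these inputs the second run's averages come out roughly 4.9% (prefill) and 5.8% (decode) lower than the first run's.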

unroll n-tile x16 for gemm avx512

c608a41

github-actions bot added the x86 label Dec 1, 2025

apply code-format changes

e60ea74

nihui closed this Dec 1, 2025

nihui reopened this Dec 1, 2025

nihui added 3 commits December 1, 2025 19:46

cc

75d1441

Merge branch 'opt-x86-gemm-5' of github.com:nihui/ncnn into opt-x86-gemm-5

3d98511

test++

6b709bf

github-actions bot added the test label Dec 1, 2025
