Temporary fix nightly crash for large l_pad failing test #3981

@kasturedeeksha

Summary

The following test gives an accuracy error for u8:s8:f32:

./benchdnn --conv --dt=u8:s8:f32  --skip-impl=ref,x64:gemm mb1ic64ih1iw33oc1oh1ow33kh1kw24ph0pw23n"l_pad_exceeds_ow_block"

The failure occurs because the output is compared against sve_jit_convolution, which itself is producing incorrect results.
When the same test is checked against the reference implementation, it passes.
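
To make the shape in the failing descriptor easier to read, here is a minimal Python sketch that decodes the two-letter fields and derives the padding along the width. It assumes stride 1 and no dilation, since the descriptor specifies neither, and the helper names `parse_conv_desc` and `width_padding_info` are hypothetical, not part of benchdnn. The ow block size chosen by the kernel is not stated in this issue, so the sketch does not compare against it.

```python
import re

def parse_conv_desc(desc):
    """Extract two-letter numeric fields (mb, ic, iw, ...) from a descriptor string."""
    return {key: int(val) for key, val in re.findall(r"([a-z]{2})(\d+)", desc)}

def width_padding_info(p, stride=1, dilation=0):
    """Return (l_pad, r_pad, number of output columns whose window starts in the left padding).

    Assumes stride=1 and dilation=0 unless given; these are assumptions of this sketch.
    """
    l_pad = p["pw"]
    eff_kw = (p["kw"] - 1) * (dilation + 1) + 1
    # Right padding implied by the requested output width.
    r_pad = max((p["ow"] - 1) * stride + eff_kw - p["iw"] - l_pad, 0)
    # An output column o reads input columns starting at o*stride - l_pad.
    cols_in_l_pad = sum(1 for o in range(p["ow"]) if o * stride - l_pad < 0)
    return l_pad, r_pad, cols_in_l_pad

desc = "mb1ic64ih1iw33oc1oh1ow33kh1kw24ph0pw23nl_pad_exceeds_ow_block"
p = parse_conv_desc(desc)
print(width_padding_info(p))  # (23, 0, 23)
```

Under these assumptions, 23 of the 33 output columns have a window that begins inside the left padding, and no right padding is needed.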

./benchdnn --conv --skip-impl=ref,x64:gemm mb1ic64ih1iw33oc1oh1ow33kh1kw24ph0pw23n"l_pad_exceeds_ow_block"
[ 185][arg:17][0,1,0,23,0,0] fp:  0.f dt:       26 
[ 193][arg:17][0,1,0,24,0,0] fp:  0.f dt: 7.34673e-40 
[ 201][arg:17][0,1,0,25,0,0] fp:  0.f dt: 1.12341e-36 
[ 209][arg:17][0,1,0,26,0,0] fp:  0.f dt: 5.6567e-36 
[ 217][arg:17][0,1,0,27,0,0] fp:  0.f dt:-2.79598e-36 
[ 225][arg:17][0,1,0,28,0,0] fp:  0.f dt: 2.17188e-36 
[ 233][arg:17][0,1,0,29,0,0] fp:  0.f dt:-6.0177e-36 
[ 241][arg:17][0,1,0,30,0,0] fp:  0.f dt: 4.20049e-37 
[ 249][arg:17][0,1,0,31,0,0] fp:  0.f dt: 6.77732e-38 
@@@ [arg:17] check_zero_padding failed
[   0][DST][0:0:0:0] exp_f32:        6325 exp:        6325 got:        6549 diff:     224 rdiff:0.035415
[   1][DST][0:0:0:1] exp_f32:        6946 exp:        6946 got:       -1838 diff:    8784 rdiff: 1.26461
[   2][DST][0:0:0:2] exp_f32:       -6601 exp:       -6601 got:        2697 diff:    9298 rdiff: 1.40857
[   3][DST][0:0:0:3] exp_f32:        -527 exp:        -527 got:       17840 diff:   18367 rdiff:  34.852
[   4][DST][0:0:0:4] exp_f32:        -250 exp:        -250 got:      -37473 diff:   37223 rdiff: 148.892
[   5][DST][0:0:0:5] exp_f32:        -348 exp:        -348 got:      -20327 diff:   19979 rdiff: 57.4109
[   6][DST][0:0:0:6] exp_f32:       -3411 exp:       -3411 got:       -6210 diff:    2799 rdiff: 0.82058
[   7][DST][0:0:0:7] exp_f32:      -22606 exp:      -22606 got:      -40731 diff:   18125 rdiff:0.801778
[   8][DST][0:0:0:8] exp_f32:       -2119 exp:       -2119 got:        8621 diff:   10740 rdiff: 5.06843
[   9][DST][0:0:0:9] exp_f32:         453 exp:         453 got:         963 diff:     510 rdiff: 1.12583
[COMPARE_STATS][DST]: trh=0 err_max_diff:     nan err_max_rdiff:     nan all_max_diff:1.54623e+36 all_max_rdiff:3.45986e+31
0:FAILED (errors:24 total:33) (3 ms) __REPRO: --conv --skip-impl=ref,x64:gemm mb1ic64ih1iw33oc1oh1ow33kh1kw24ph0pw23nl_pad_exceeds_ow_block
============================================================
= Implementation statistics (--summary=no-impl to disable) =
============================================================
| jit:sve_256 : 1 (100%)                                   |
============================================================
===========================================================
= Failed cases summary (--summary=no-failures to disable) =
===========================================================
0:FAILED (errors:24 total:33) (3 ms) __REPRO: --conv --skip-impl=ref,x64:gemm mb1ic64ih1iw33oc1oh1ow33kh1kw24ph0pw23nl_pad_exceeds_ow_block
============================
tests:1 passed:0 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:1 listed:0
total: 0.00s; create_pd: 0.00s (15%); create_prim: 0.00s (10%); fill: 0.00s (51%); execute: 0.00s (2%); compute_ref: 0.00s (0%); compare: 0.00s (10%);
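
The `diff` and `rdiff` columns in the `[DST]` lines above are consistent with the absolute difference `|exp - got|` and that difference divided by `|exp|`. The Python sketch below recomputes both from one log line as a sanity check. The formula is inferred from the printed numbers, not taken from the benchdnn source, and the helper name `recompute_diffs` is hypothetical.

```python
import re

# Matches the "exp:", "got:", "diff:" and "rdiff:" fields of a [DST] comparison line.
LINE_RE = re.compile(
    r"exp:\s*(-?\d+)\s+got:\s*(-?\d+)\s+diff:\s*(\d+)\s+rdiff:\s*([\d.]+)"
)

def recompute_diffs(line):
    """Return (diff, rdiff) recomputed from the exp/got values, plus the logged pair."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a comparison line")
    exp, got = int(m.group(1)), int(m.group(2))
    logged = (int(m.group(3)), float(m.group(4)))
    diff = abs(exp - got)
    # Assumed formula: difference relative to the expected value.
    rdiff = diff / abs(exp) if exp != 0 else float("inf")
    return (diff, rdiff), logged

line = ("[   0][DST][0:0:0:0] exp_f32:        6325 exp:        6325 "
        "got:        6549 diff:     224 rdiff:0.035415")
(diff, rdiff), logged = recompute_diffs(line)
print(diff, round(rdiff, 6), logged)  # 224 0.035415 (224, 0.035415)
```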

A temporary fix has been applied in PR #3911.
This issue is raised to track the underlying bug in sve_jit_convolution.
