In this case there is only a 128-bit (16-byte) memset. The scalar code generation looks better than the RVV version.
#include <string.h>
void foo(long *ptr) {
    memset(ptr, 0, 16);
}
-O3 -march=rv64gc
foo:
    sd zero, 0(a0)
    sd zero, 8(a0)
    ret
-O3 -march=rv64gcv
foo:
    vsetivli zero, 2, e64, m1, ta, ma
    vmv.v.i v8, 0
    vse64.v v8, (a0)
    ret
https://godbolt.org/z/er7drbEcb
Should we define a size threshold in getOptimalMemOpType to decide when a memset should be lowered to RVV instructions?
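To make the question concrete, here is a rough sketch of the kind of decision I have in mind. It is plain C for illustration only, not LLVM code: `choose_memop_kind`, `enum memop_kind`, and the 16-byte cutoff are all hypothetical names and values, and the real threshold would need to be tuned per target.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lowering choices; these names are not part of LLVM. */
enum memop_kind { MEMOP_SCALAR, MEMOP_VECTOR };

/* Assumed cutoff: on RV64, 16 bytes is two 8-byte `sd` stores.
 * The right value is a tuning question, not something measured here. */
#define SCALAR_STORE_THRESHOLD_BYTES 16

/* Pick a lowering for a memset of a known constant size. */
enum memop_kind choose_memop_kind(size_t size_bytes, bool has_vector) {
    /* Without the V extension only scalar stores are available. */
    if (!has_vector)
        return MEMOP_SCALAR;
    /* Small sizes: a few scalar stores avoid the vsetivli/vmv setup. */
    if (size_bytes <= SCALAR_STORE_THRESHOLD_BYTES)
        return MEMOP_SCALAR;
    /* Larger sizes: let the vector unit do the work. */
    return MEMOP_VECTOR;
}
```

With this cutoff, the 16-byte memset above would stay on the scalar path even with -march=rv64gcv, while larger sizes would still use RVV.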