Comment:
Hi maintainers!
There was a cool feature added to llama.cpp where the CPU backend can be compiled for several x86_64 microarchitecture levels and the best variant is dispatched dynamically at runtime. This can enable great performance benefits! It's basically enabling GGML_BACKEND_DL and GGML_CPU_ALL_VARIANTS.
I created an issue for llama-cpp-python to add the missing bindings needed to really make use of this; see abetlen/llama-cpp-python#2069
Once that has been taken care of and llama-cpp-python supports it, I think it would be really great if this feedstock enabled these options for the x86_64 CPU builds.
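Just to illustrate what I mean, here is a rough sketch of the extra CMake flags, assuming the feedstock configures llama.cpp via a plain cmake call in its build script (the file name and existing arguments are assumptions on my part, not the feedstock's actual recipe):

```sh
# Hypothetical addition to the x86_64 CPU build configuration:
# GGML_NATIVE is turned off so the generic multi-variant backends get built,
# GGML_BACKEND_DL builds the backends as loadable modules,
# GGML_CPU_ALL_VARIANTS builds one CPU backend per microarchitecture level.
cmake ${CMAKE_ARGS} -B build \
    -DGGML_NATIVE=OFF \
    -DGGML_BACKEND_DL=ON \
    -DGGML_CPU_ALL_VARIANTS=ON
cmake --build build --config Release
```

At runtime ggml should then pick the most capable variant the host CPU supports, so a single package would serve everything from baseline x86_64 up to AVX-512 machines.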