Yueh-Feng Lee
Dr. Yueh-Feng Lee received his Ph.D. in computer science from National Chiao Tung University. He previously worked at MediaTek and the Industrial Technology Research Institute. His areas of focus include AI compilers and runtimes, hypervisor technology, and embedded systems.
Session
In this work, we optimize LLM inference on edge RISC-V CPUs using the RISC-V Vector extension (RVV). We leverage 4-bit vector load and efficient 8-bit dot-product instructions to accelerate quantized and repacked 4-bit kernels in llama.cpp. In addition, we implement RVV support for tiled flash attention, which further improves performance in the prefill stage. Experimental results show that the proposed optimizations achieve a 1.76x to 2.14x speedup over the upstream implementation while maintaining near-linear scaling for prefill workloads on an RVV-enabled multi-core platform.
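To make the kernel arithmetic concrete, here is a minimal scalar sketch of a dot product between 4-bit quantized weights and 8-bit quantized activations. It is not the session's RVV code and it does not reproduce llama.cpp's actual data layout: the `q4_block` and `q8_block` structs, the block size of 32, the +8 nibble offset, and the function name `dot_q4_q8` are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 32 /* values per quantization block (assumed for this sketch) */

/* Hypothetical layouts for illustration only, not llama.cpp's structs. */
typedef struct {
    float   scale;              /* per-block weight scale */
    uint8_t nibbles[BLOCK / 2]; /* two 4-bit weights per byte, stored as value + 8 */
} q4_block;

typedef struct {
    float  scale;    /* per-block activation scale */
    int8_t q[BLOCK]; /* 8-bit quantized activations */
} q8_block;

/* Scalar reference: unpack each nibble to a signed 8-bit value, accumulate
 * the integer products in 32 bits, then apply both scales once per block. */
float dot_q4_q8(const q4_block *w, const q8_block *x, size_t nblocks)
{
    float sum = 0.0f;
    for (size_t b = 0; b < nblocks; b++) {
        int32_t acc = 0;
        for (size_t i = 0; i < BLOCK / 2; i++) {
            int lo = (w[b].nibbles[i] & 0x0F) - 8; /* low nibble  -> [-8, 7] */
            int hi = (w[b].nibbles[i] >> 4) - 8;   /* high nibble -> [-8, 7] */
            acc += lo * x[b].q[2 * i];
            acc += hi * x[b].q[2 * i + 1];
        }
        sum += (float)acc * w[b].scale * x[b].scale;
    }
    return sum;
}
```

The inner loop is the part a vectorized kernel would replace: the nibble unpacking and the 8-bit multiply-accumulate can be carried out across many lanes at once, while the per-block scaling stays a single floating-point step.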