Hardware support in RISC-V for ternary LLMs RISC-V Summit Europe 2026

Hardware support in RISC-V for ternary LLMs
2026-06-11 11:20–11:30, Poster Island C

Language models are becoming increasingly common, and their parameter counts keep growing, which demands large memory capacity. One of the most common techniques for reducing their memory footprint is weight quantization, and ternary models are among its most extreme cases. So far, most hardware proposals have focused on FPGA-based accelerators to optimize inference in quantized models, while current general-purpose processors offer limited support (integers no narrower than 8 bits). In this work we present a preliminary analysis of the potential benefits of moving hardware support for quantization directly into the processor. To do so, we use a state-of-the-art inference framework for CPUs and Small Language Models, and evaluate the competitive advantages of having dedicated SIMD hardware for quantized operations. The results show a 2x speedup (in tokens/s) on a 350 MB Small Language Model, with the speedup tending to grow with model size, at the cost of a minimal increase in hardware resources (1.25% in LUTs).
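As a rough illustration of why ternary weights suit narrow SIMD hardware, the sketch below packs weights from {-1, 0, +1} into 2 bits each and computes a dot product using only additions and subtractions. It is not taken from the work described here: the 2-bit encoding, the function names `pack_ternary` and `ternary_dot`, and the sample values are all assumptions made for the example.

```python
# Illustrative sketch only. The 2-bit encoding below (00 -> 0, 01 -> +1,
# 10 -> -1) is an assumption for this example, not the format used in the work.

def pack_ternary(weights):
    """Pack ternary weights (-1, 0, +1) four per byte, 2 bits each."""
    codes = {0: 0b00, 1: 0b01, -1: 0b10}
    packed = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        packed[i // 4] |= codes[w] << (2 * (i % 4))
    return bytes(packed)


def ternary_dot(packed, activations):
    """Dot product of packed ternary weights with integer activations.

    Each weight selects add, subtract, or skip, so no multiplier is needed.
    """
    acc = 0
    for i, a in enumerate(activations):
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        if code == 0b01:
            acc += a
        elif code == 0b10:
            acc -= a
    return acc


# Hypothetical sample data: 8 weights fit in 2 bytes instead of 8 int8 bytes.
weights = [1, -1, 0, 1, -1, 0, 0, 1]
activations = [12, -7, 33, 5, 9, -20, 4, -3]
packed = pack_ternary(weights)
result = ternary_dot(packed, activations)
```

With this packing, each weight takes a quarter of the space of an 8-bit integer, and the inner loop reduces to a select between add, subtract, and skip. A dedicated SIMD unit could in principle apply that kind of operation to many lanes at once.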

David Aledo

David Aledo is an Assistant Professor at Universidad de Cantabria, Spain. He holds a PhD in Electrical and Electronic Engineering and was a postdoctoral researcher at TU Delft, Netherlands. His research focuses on hardware accelerators for machine learning, HLS (High-Level Synthesis), FPGAs, and system-level simulations.
