BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//cfp.riscv-europe.org//eu-summit-2026//talk//WKS77D
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-eu-summit-2026-WKS77D@cfp.riscv-europe.org
DTSTART;TZID=CET:20260611T112000
DTEND;TZID=CET:20260611T113000
DESCRIPTION:Language models are becoming increasingly common\, and the
 ir parameter counts keep growing\, which demands large memory capacity. O
 ne of the most common techniques to reduce their memory footprint is weig
 ht quantization. Ternary models are one of the most extreme cases o
 f quantization. So far\, most hardware proposals focus on FPGA-based accel
 erators to optimize inference in quantized models\, while current general-
 purpose processors have limited support (up to 8-bit integers). In this wo
 rk we attempt a preliminary analysis of the potential benefits of moving t
 he quantization hardware support directly to the processor. To do so\, we 
 make use of a state-of-the-art inference framework for CPUs and Small Lang
 uage Models\, evaluating the competitive advantages of dedicated SIMD har
 dware for quantized operations. The results show a 2x speedup (tok
 ens/s) on a 350 MB Small Language Model\, with the speedup tending to gro
 w with model size\, at the cost of a minimal increase in hardware resourc
 es (1.25% in LUTs).
DTSTAMP:20260522T163210Z
LOCATION:Poster Island C
SUMMARY:Hardware support in RISC-V for ternary LLMs - David Aledo
URL:https://cfp.riscv-europe.org/eu-summit-2026/talk/WKS77D/
END:VEVENT
END:VCALENDAR
