BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//cfp.riscv-europe.org//eu-summit-2026//speaker//CB8LYV
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-eu-summit-2026-VPNYEP@cfp.riscv-europe.org
DTSTART;TZID=CET:20260609T141000
DTEND;TZID=CET:20260609T142000
DESCRIPTION:Small Language Models (SLMs) are increasingly critical for edge
  AI\, yet their performance on RISC-V requires rigorous profiling to ident
 ify architectural bottlenecks. This work evaluates the performance of SLMs
  including Gemma3\, Llama-3.2\, Qwen-2.5\, DeepSeek\, and Phi-3.5 on the T
 enstorrent Ascalon RISC-V Core. We developed a profiling methodology to an
 alyze workload distribution\, which revealed that Matrix Multiplication (M
 atMul) contributes ~90% of total compute across all evaluated models. Give
 n the computational complexity of running full-model emulations\, we extra
 ct these critical kernels for targeted benchmarking. Our implementation on
  the HAPS platform achieves significant performance gains over standard
  baselines. FP32 execution\, used for maximum precision\, was optimized
  by transitioning from traditional SGEMM to a new high-performance
  implementation. Simultaneously\, INT8 execution\, targeted at efficient
  inference\, was accelerated by migrating from a standard RVV
  implementation to a specialized IGEMM implementation using VQDOT.
DTSTAMP:20260522T162444Z
LOCATION:Poster Island C
SUMMARY:From Profiling to Performance: Optimizing Small Language Models on
  RISC‑V Architectures - Dongjie Xie\, Rama Malladi\, Jose Arnau\, Chip K
 erchner
URL:https://cfp.riscv-europe.org/eu-summit-2026/talk/VPNYEP/
END:VEVENT
END:VCALENDAR