BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//cfp.riscv-europe.org//eu-summit-2026//speaker//FSJUJT
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-eu-summit-2026-QB3TNY@cfp.riscv-europe.org
DTSTART;TZID=CET:20260609T130000
DTEND;TZID=CET:20260609T131000
DESCRIPTION:Matrix multiplication (GEMM) sits at the heart of scientific co
 mputing\, data analytics\, and modern AI workloads. While much attention i
 s given to peak throughput and ideal matrix sizes\, real-world performance
  often hinges on the “edges”: non-ideal dimensions\, cache bound
 aries\, and vector tail cases that quietly dominate execution time. In thi
 s paper\, we present a practical case study of optimizing GEMM in OpenBLAS
  for RISC-V vector architectures. We show how careful handling of edge con
 ditions\, cache reuse\, and vectorization strategy can deliver measurable 
 performance gains. Techniques include maximizing cache and register reuse 
 with single-pass data traversal\, swapping operands and deferring transpos
 ition for easier storage\, combining full- and half-vector operations with
  scalar instructions to efficiently handle irregular dimensions\, and leve
 raging strided segmented load/store vector intrinsics to sustain throughpu
 t even in non-ideal layouts. These optimizations are not just academic\; s
 mall inefficiencies in GEMM propagate directly into AI inference latency a
 nd energy. By focusing on edge cases and architectural nuance\, we can unl
 ock meaningful improvements for real-world workloads. For example\, the ef
 ficiency of a 6 × 3072 × 3072 SGEMM MatMul improves from 23.5% to 68.7%
  of peak.
DTSTAMP:20260522T162352Z
LOCATION:Poster Island C
SUMMARY:Why Edges Matter: A Case Study on Performance Improvements for Open
 BLAS GEMM on RISC-V - Chip Kerchner\, Rama Malladi
URL:https://cfp.riscv-europe.org/eu-summit-2026/talk/QB3TNY/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-eu-summit-2026-VPNYEP@cfp.riscv-europe.org
DTSTART;TZID=CET:20260609T141000
DTEND;TZID=CET:20260609T142000
DESCRIPTION:Small Language Models (SLMs) are increasingly critical for edge
  AI\, yet their performance on RISC-V requires rigorous profiling to
  identify architectural bottlenecks. This work evaluates the performance
  of SLMs\, including Gemma3\, Llama-3.2\, Qwen-2.5\, DeepSeek\, and
  Phi-3.5\, on the Tenstorrent Ascalon RISC-V Core. We developed a
  profiling methodology to analyze workload distribution\, which revealed
  that matrix multiplication (MatMul) contributes ~90% of total compute
  across all evaluated models. Given the computational cost of running
  full-model emulations\, we extract these critical kernels for targeted
  benchmarking. Our implementation on the HAPS platform achieves
  significant performance gains over standard baselines. FP32 execution\,
  used for maximum precision\, was optimized by transitioning from a
  traditional SGEMM to a new high-performance implementation. Similarly\,
  INT8 performance\, targeted at efficient inference\, was accelerated by
  migrating from a standard RVV kernel to a specialized IGEMM
  implementation using VQDOT.
DTSTAMP:20260522T162352Z
LOCATION:Poster Island C
SUMMARY:From Profiling to Performance: Optimizing Small Language Models on
  RISC-V Architectures - Dongjie Xie\, Rama Malladi\, Jose Arnau\, Chip K
 erchner
URL:https://cfp.riscv-europe.org/eu-summit-2026/talk/VPNYEP/
END:VEVENT
END:VCALENDAR
