BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//cfp.riscv-europe.org//eu-summit-2026//talk//QB3TNY
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-eu-summit-2026-QB3TNY@cfp.riscv-europe.org
DTSTART;TZID=CET:20260609T130000
DTEND;TZID=CET:20260609T131000
DESCRIPTION:Matrix multiplication (GEMM) sits at the heart of scientific co
 mputing\, data analytics\, and modern AI workloads. While much attention i
 s given to peak throughput and ideal matrix sizes\, real-world performance
  often hinges on the “edges”\, i.e.\, non-ideal dimensions\, cache
  boundaries\, and vector tail cases that quietly dominate execution time.
  In thi
 s paper\, we present a practical case study of optimizing GEMM in OpenBLAS
  for RISC-V vector architectures. We show how careful handling of edge con
 ditions\, cache reuse\, and vectorization strategy can deliver measurable 
 performance gains. Techniques include maximizing cache and register reuse 
 with single-pass data traversal\, swapping operands and deferring transpos
 ition for easier storage\, combining full- and half-vector operations with
  scalar instructions to efficiently handle irregular dimensions\, and leve
 raging strided segmented load/store vector intrinsics to sustain throughpu
 t even in non-ideal layouts. These optimizations are not just academic\; s
 mall inefficiencies in GEMM propagate directly into AI inference latency a
 nd energy. By focusing on edge cases and architectural nuance\, we can unl
 ock meaningful improvements for real-world workloads. The gains are
  substantial\; for example\, the efficiency of a 6 × 3072 × 3072 SGEMM
  MatMul improves from 23.5% to 68.7% of peak.
DTSTAMP:20260522T163257Z
LOCATION:Poster Island C
SUMMARY:Why Edges Matter: A Case Study on Performance Improvements for Open
 BLAS GEMM on RISC-V - Chip Kerchner\, Rama Malladi
URL:https://cfp.riscv-europe.org/eu-summit-2026/talk/QB3TNY/
END:VEVENT
END:VCALENDAR
