BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//cfp.riscv-europe.org//eu-summit-2026//speaker//AMPMN3
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-eu-summit-2026-XTDV7A@cfp.riscv-europe.org
DTSTART;TZID=CET:20260609T132000
DTEND;TZID=CET:20260609T133000
DESCRIPTION:The fragmented RISC-V ecosystem demands portable\, high-perform
 ance code generation for the Vector Extension (RVV 1.0). Upstream MLIR (LL
 VM 22.0) lacks two critical lowering stages needed for this: it cannot fla
 tten dynamic memref matrix references into C pointers\, nor emit Vector-
 Length-Agnostic (VLA) RVV intrinsics. This paper closes that gap with a si
 x-stage hybrid MLIR–xDSL compilation workflow that automatically generat
 es parameterized\, hardware-aware C micro-kernels for GEMM entirely in Py
 thon\, without modifying the MLIR C++ codebase. On a COTS BananaPi F3 boar
 d (SpaceMiT K1\, 256-bit RVV 1.0)\, we show: (i) isolated micro-kernels ma
 tch or exceed hand-written reference code (0.98×–1.05×)\, peaking at 1
 6.2 GFLOPS at the optimal 16×15 tile\; (ii) on BERT-Large transformer la
 yers (B1–B5)\, generated micro-kernels consistently surpass OpenBLAS\, r
 eaching up to 12.2 GFLOPS against the baseline’s 5.1 GFLOPS (a 2.4× spe
 edup) and maintaining an average 15–27% performance advantage across all
  layer dimensions.
DTSTAMP:20260522T162348Z
LOCATION:Poster Island C
SUMMARY:RISC-V Vector 1.0 Code Generation in MLIR-xDSL - Jie Lei
URL:https://cfp.riscv-europe.org/eu-summit-2026/talk/XTDV7A/
END:VEVENT
END:VCALENDAR
