BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//cfp.riscv-europe.org//eu-summit-2026//talk//FP7AUZ
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-eu-summit-2026-FP7AUZ@cfp.riscv-europe.org
DTSTART;TZID=CET:20260611T111000
DTEND;TZID=CET:20260611T112000
DESCRIPTION:Matrix workloads\, essential in generative AI\, increasingly re
 ly on ISA-level extensions (e.g. AMX\, SME). The attached matrix
  extension (AME) is one of three ISA extensions (IME\, AME\, VME) under
  standardization in RISC-V. All of these matrix ISAs assume that the
  processor datapath is extended with dedicated matrix acceleration
  hardware. However\, execut
 ing matrix kernels requires moving large tiles between memory and processo
 r registers\, making performance limited by memory bandwidth.\nWe investig
 ate whether High Bandwidth Memory with Processing-in-Memory (HBM-PIM) can
  serve as an alternative implementation of AME instructions. We propose
  a PIM
  Execution Primitive (PEP) computational model mapping AME ISA onto Samsun
 g Aquabolt-XL HBM-PIM microkernels\, using an outer-product dataflow to en
 able in-memory accumulation\, as well as remapping AME tile registers into
  memory regions\, which makes it possible to chain AME instructions
  without leaving memory.\nOur experiments show AME tile multiplication
  reaching 14.9 GFLOP/s (59.4 FLOP/cycle) on an HBM-PIM pseudo-channel\,
  demonstrating that HBM-PIM can serve as an implementation of RISC-V
  matrix extensions.
DTSTAMP:20260522T163242Z
LOCATION:Poster Island B
SUMMARY:AME-PIM: Breaking the Memory Wall with RISC-V Matrix Extensions and
  HBM-PIM - Emanuele Venieri
URL:https://cfp.riscv-europe.org/eu-summit-2026/talk/FP7AUZ/
END:VEVENT
END:VCALENDAR
