AME-PIM: Breaking the Memory Wall with RISC-V Matrix Extensions and HBM-PIM
2026-06-11, Poster Island B

Matrix workloads, essential in generative AI, increasingly rely on ISA-level matrix extensions (e.g., AMX, SME). The Attached Matrix Extension (AME) is one of three matrix ISA extensions (IME, AME, VME) under standardization in RISC-V. All of these matrix ISAs assume that the processor datapath is extended with dedicated matrix acceleration hardware. However, executing matrix kernels requires moving large tiles between memory and processor registers, so performance is limited by memory bandwidth.
We investigate whether High Bandwidth Memory with Processing-in-Memory (HBM-PIM) can serve as an alternative implementation of AME instructions. We propose a PIM Execution Primitive (PEP) computational model that maps the AME ISA onto Samsung Aquabolt-XL HBM-PIM microkernels. The model uses an outer-product dataflow to enable in-memory accumulation and remaps AME tile registers to memory regions, making it possible to chain AME instructions without leaving memory.
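As an illustration of the outer-product dataflow and of tile registers living in memory regions, the following is a minimal functional sketch in plain Python. It is not the PEP model or an Aquabolt-XL microkernel: the `mem` dictionary, the tile names `t0` to `t3`, and the function names are hypothetical, chosen only to show how a product can be accumulated in place and then consumed by a chained multiplication without being copied out.

```python
# Hypothetical model: each tile "register" is a named region of one memory
# dictionary. None of these names come from the AME proposal or a vendor API.

def zero_tile(rows, cols):
    """Return a rows x cols tile filled with zeros."""
    return [[0.0] * cols for _ in range(rows)]


def outer_product_matmul(mem, a_name, b_name, c_name):
    """Accumulate mem[c] += mem[a] @ mem[b] as a sum of rank-1 outer products."""
    a, b, c = mem[a_name], mem[b_name], mem[c_name]
    rows, inner, cols = len(a), len(b), len(b[0])
    for k in range(inner):                      # one outer product per k
        col_k = [a[i][k] for i in range(rows)]  # column k of A
        row_k = b[k]                            # row k of B
        for i in range(rows):
            for j in range(cols):
                # The partial sum is updated where the tile is stored,
                # standing in for in-memory accumulation.
                c[i][j] += col_k[i] * row_k[j]


mem = {
    "t0": [[1.0, 2.0], [3.0, 4.0]],
    "t1": [[5.0, 6.0], [7.0, 8.0]],
    "t2": zero_tile(2, 2),
    "t3": zero_tile(2, 2),
}

outer_product_matmul(mem, "t0", "t1", "t2")  # t2 = t0 @ t1
outer_product_matmul(mem, "t2", "t0", "t3")  # chained: t3 = t2 @ t0
```

The second call reads `t2` directly from the same memory model in which it was accumulated, which is the property the chaining claim relies on; a real implementation would also have to respect bank layout and PIM command ordering, which this sketch ignores.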
Our experiments show AME tile multiplication reaching 14.9 GFLOP/s (59.4 FLOP/cycle) on an HBM-PIM pseudo-channel, demonstrating that HBM-PIM can serve as an implementation of RISC-V matrix extensions.