2026-06-11, Poster Island B
This work presents the design of a tightly coupled near-memory computing unit compatible with a preliminary RISC-V Attached Matrix Extension. The proposed unit is designed to be integrated with a processor core through the Core-V eXtension Interface (CV-X-IF), enabling matrix operations to be decoded and executed directly in a processing unit attached to a system's main memory. Instead of moving data into registers prior to computation, load instructions specify operand locations in main memory. Memory access and near-memory computation are deferred until the execution unit requires the operands. To evaluate the feasibility of the proposed architecture, a model of the unit is designed, implemented, and validated in the gem5 architectural simulator. This model serves as a first step to prove the concept and enables design-space exploration of the architecture. As a preliminary evaluation, a quantized convolutional neural network workload is executed on the simulator to assess the potential performance benefits of the approach, achieving a 47x speed-up with respect to a simulated processor baseline.
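The deferred-access idea described above can be illustrated with a small behavioural sketch. It is not the authors' gem5 model and does not reflect the actual instruction semantics of the Attached Matrix Extension. All names here (`TileDescriptor`, `NearMemoryUnit`, `load_tile`, `matmul`) and the flat row-major memory layout are assumptions made for illustration. A load only records where an operand lives in memory, and the data is read when the matrix operation executes.

```python
from dataclasses import dataclass


@dataclass
class TileDescriptor:
    """Location of an operand in main memory (hypothetical layout)."""
    base: int    # index of element [0][0] in the flat memory
    rows: int
    cols: int
    stride: int  # elements between the starts of consecutive rows


class NearMemoryUnit:
    """Toy model: loads are lazy, memory is read only at execution time."""

    def __init__(self, memory):
        self.memory = memory  # flat list standing in for main memory
        self.tiles = {}       # tile register name -> TileDescriptor
        self.reads = 0        # number of memory elements actually read

    def load_tile(self, reg, base, rows, cols, stride=None):
        # Record the operand location only; no memory access happens here.
        if stride is None:
            stride = cols
        self.tiles[reg] = TileDescriptor(base, rows, cols, stride)

    def _fetch(self, reg):
        # Deferred access: read the tile when the execution unit needs it.
        d = self.tiles[reg]
        self.reads += d.rows * d.cols
        return [
            [self.memory[d.base + r * d.stride + c] for c in range(d.cols)]
            for r in range(d.rows)
        ]

    def matmul(self, a_reg, b_reg):
        # Fetch both operands near memory, then compute A x B.
        a = self._fetch(a_reg)
        b = self._fetch(b_reg)
        if len(a[0]) != len(b):
            raise ValueError("inner dimensions do not match")
        return [
            [sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))
        ]
```

In this sketch, calling `load_tile` twice leaves `reads` at zero, and the counter increases only once `matmul` runs, which mirrors the deferral described in the abstract at a purely functional level. Timing, the CV-X-IF handshake, and quantized data types are deliberately left out.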