Emanuele Venieri RISC-V Summit Europe 2026

Emanuele Venieri

I am a PhD student at the ECS Lab of the University of Bologna, where I also earned my MSc in Electronics Engineering. My research focuses on digital architectures, with particular interest in RISC-V vector and matrix extensions and processing-in-memory (PIM) systems. I work on the Monte Cimone project, contributing to the enablement and characterization of the second-generation RISC-V cluster and to the evaluation of the third iteration. I also contributed to AME-PIM, a novel approach that exposes PIM capabilities through the semantics of a matrix extension. In parallel, I work within the DARE project, where I contribute to the delivery of ControlPULP as the power-management controller for the GPP subsystem.

Sessions

06-10

11:00

10 min

Monte Cimone v3: Where RISC-V Stands in High-Performance Computing

Emanuele Venieri

The Monte Cimone project provides a RISC-V testbed for high-performance computing (HPC) clusters. This paper presents Monte Cimone v3 (MCv3), the third iteration of the Monte Cimone RISC-V HPC cluster, which integrates the SOPHGO Sophon SG2044 processor, an evolution of the SG2042 used in MCv2. We characterize MCv3 with the HPL and STREAM benchmarks coupled with power measurements, and compare it against two reference platforms: the Intel Xeon Platinum 8480+ (Sapphire Rapids) and the NVIDIA Grace CPU Superchip. Our results show that the SG2044 more than doubles single-core performance and scales better than the SG2042. MCv3 achieves an energy efficiency of 3.08 GFLOPS/W, a 10x improvement over MCv1 that places it in the range of x86-64 and Arm servers. In raw performance normalized to SIMD/vector length, MCv3 at its peak-efficiency point (16 cores) achieves 46% of the performance of the Intel Sapphire Rapids server and 91% of that of the NVIDIA Grace CPU Superchip.
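The efficiency and cross-platform figures above reduce to simple ratios. The sketch below assumes that energy efficiency means sustained HPL throughput divided by average measured power, and that "normalized to SIMD/vector length" means dividing throughput by the number of FP64 lanes per vector register; that normalization, the function names and the example inputs are hypothetical illustrations, not the paper's method or measurements.

```python
"""Hypothetical helpers for the ratios quoted in the MCv3 abstract."""


def energy_efficiency(gflops: float, watts: float) -> float:
    """Sustained throughput (GFLOP/s) divided by average power (W), in GFLOPS/W."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return gflops / watts


def lane_normalized(gflops: float, vector_bits: int, element_bits: int = 64) -> float:
    """Throughput per SIMD/vector lane (ASSUMED normalization; FP64 lanes by default)."""
    lanes = vector_bits // element_bits
    if lanes < 1:
        raise ValueError("vector register must hold at least one element")
    return gflops / lanes


def relative_performance(target: float, reference: float) -> float:
    """Fraction of the reference platform's (normalized) performance."""
    return target / reference


if __name__ == "__main__":
    # Hypothetical inputs chosen only to reproduce the headline ratio,
    # NOT measured values from the paper.
    eff = energy_efficiency(gflops=308.0, watts=100.0)
    print(f"{eff:.2f} GFLOPS/W")
    # A 10x gain over MCv1 implies MCv1 reached roughly a tenth of this figure.
    print(f"implied MCv1: ~{eff / 10:.2f} GFLOPS/W")
```

Dividing by lane count lets platforms with very different vector widths be compared per lane rather than per core; a different normalization would change the percentages.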

Towards Open User-Space Power-Management Communication Interfaces

Antonio del Vecchio, Emanuele Venieri

Modern processors delegate power and thermal management to dedicated Power Control Systems (PCS), communicating through kernel-mediated interfaces such as SCMI or the emerging RPMI.
Prior work has shown that end-to-end control quality is dominated by the power-management policy rather than by interface latency, leaving room to choose communication paradigms based on flexibility rather than raw latency.
We integrate Micro XRCE-DDS on ControlPULP, a RISC-V–based PCS, connecting it to a user-space Agent on an ARM host via a custom shared-memory transport.
This design removes protocol logic from kernel drivers and naturally supports multi-controller coordination through a shared middleware layer. Experiments on a ZCU102 FPGA at 20 MHz show 490 μs of active processing per publication, 0.8 MB/s throughput, and a memory footprint under 11.2 KB for 32 topics. The resulting latency is comparable to SCMI [1] while enabling a more flexible communication model.
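The abstract does not detail the custom shared-memory transport. As a hypothetical illustration of the general idea only, the sketch below models a single-producer/single-consumer byte ring carrying length-prefixed frames, the kind of structure such a transport could use to pass serialized messages between the PCS and the host Agent. The class, its methods and the message content are invented for illustration and are not the Micro XRCE-DDS custom-transport API. The helper at the end converts the reported 490 μs of active processing into cycles at the 20 MHz FPGA clock (490 μs × 20 MHz = 9,800 cycles).

```python
"""Hypothetical SPSC ring buffer modelling a shared-memory message transport."""

import struct
from typing import Optional

HEADER = struct.Struct("<I")  # 4-byte little-endian length prefix per frame


class SharedRing:
    """Byte ring with one writer (e.g. the PCS) and one reader (e.g. the host Agent).

    In real shared memory, head/tail would live in the shared region and need
    memory barriers; here they are plain attributes for illustration.
    """

    def __init__(self, capacity: int) -> None:
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.head = 0  # total bytes written so far (monotonic counter)
        self.tail = 0  # total bytes consumed so far (monotonic counter)

    def free(self) -> int:
        return self.capacity - (self.head - self.tail)

    def _put(self, data: bytes) -> None:
        for b in data:  # wrap around the end of the buffer
            self.buf[self.head % self.capacity] = b
            self.head += 1

    def _get(self, n: int) -> bytes:
        out = bytes(self.buf[(self.tail + i) % self.capacity] for i in range(n))
        self.tail += n
        return out

    def write(self, payload: bytes) -> bool:
        """Enqueue one length-prefixed frame; return False if it does not fit."""
        frame = HEADER.pack(len(payload)) + payload
        if len(frame) > self.free():
            return False
        self._put(frame)
        return True

    def read(self) -> Optional[bytes]:
        """Dequeue one frame, or return None if the ring is empty."""
        if self.head == self.tail:
            return None
        (length,) = HEADER.unpack(self._get(HEADER.size))
        return self._get(length)


def active_cycles(seconds: float, clock_hz: float) -> int:
    """Convert a processing time into clock cycles."""
    return round(seconds * clock_hz)


if __name__ == "__main__":
    ring = SharedRing(64)
    ring.write(b"temp=71C")  # hypothetical telemetry payload
    print(ring.read())
    print(active_cycles(490e-6, 20e6))
```

A single writer and a single reader let each side own one index, which is what makes lock-free operation over shared memory feasible in principle.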

Blind Submission (Default)

AME-PIM: Breaking the Memory Wall with RISC-V Matrix Extensions and HBM-PIM

Emanuele Venieri

Matrix workloads, essential in generative AI, increasingly rely on ISA-level matrix extensions (e.g., Intel AMX, Arm SME). The attached matrix extension (AME) is one of the three matrix ISA extensions (IME, AME, VME) under standardization in RISC-V. All of these matrix ISAs assume that the processor datapath is extended with dedicated matrix-acceleration hardware. However, executing matrix kernels requires moving large tiles between memory and processor registers, so performance becomes bound by memory bandwidth.
We investigate whether High Bandwidth Memory with Processing-in-Memory (HBM-PIM) can serve as an alternative implementation of AME instructions. We propose a PIM Execution Primitive (PEP) computational model that maps the AME ISA onto Samsung Aquabolt-XL HBM-PIM microkernels, using an outer-product dataflow to enable in-memory accumulation and remapping AME tile registers into memory regions, which makes it possible to chain AME instructions without data leaving the memory.
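In an outer-product dataflow, the product C = A·B is built as a sum of rank-1 updates, one per index k of the shared dimension (column k of A times row k of B), so partial sums can accumulate in place wherever C resides. The plain-Python sketch below illustrates that dataflow, with tile registers modelled as hypothetical named memory regions that successive instructions update without copying them out. It is not the PEP model or Aquabolt-XL microkernel code; `tile_mma` and the `memory` dictionary are invented for illustration.

```python
"""Hypothetical illustration of an outer-product matrix-multiply-accumulate."""

from typing import Dict, List

Matrix = List[List[float]]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def outer_product_mma(acc: Matrix, a: Matrix, b: Matrix) -> None:
    """acc += a @ b, computed as a sum of rank-1 (outer-product) updates.

    For each k, column k of `a` times row k of `b` is added into `acc` in place,
    so the accumulator never has to be read out between steps.
    """
    m, k_dim, n = len(a), len(a[0]), len(b[0])
    if len(b) != k_dim or len(acc) != m or len(acc[0]) != n:
        raise ValueError("shape mismatch")
    for k in range(k_dim):
        row_b = b[k]
        for i in range(m):
            a_ik, row_acc = a[i][k], acc[i]
            for j in range(n):
                row_acc[j] += a_ik * row_b[j]


# Hypothetical "memory-resident tile registers": named regions that chained
# instructions address directly instead of loading them into a register file.
memory: Dict[str, Matrix] = {}


def tile_mma(dst: str, src_a: str, src_b: str) -> None:
    """Accumulate memory[src_a] @ memory[src_b] into memory[dst] in place."""
    outer_product_mma(memory[dst], memory[src_a], memory[src_b])


if __name__ == "__main__":
    memory["A"] = [[1.0, 2.0], [3.0, 4.0]]
    memory["B"] = [[5.0, 6.0], [7.0, 8.0]]
    memory["C"] = zeros(2, 2)
    tile_mma("C", "A", "B")  # C = A @ B
    tile_mma("C", "A", "B")  # chained accumulation; C stays in "memory"
    print(memory["C"])  # [[38.0, 44.0], [86.0, 100.0]]
```

Because every rank-1 update only adds into the destination, the accumulator can remain where it is stored, which is the property that in-memory accumulation relies on.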
Our experiments show AME tile multiplication reaching 14.9 GFLOP/s (59.4 FLOP/cycle) on an HBM-PIM pseudo-channel, demonstrating that HBM-PIM can serve as an implementation of RISC-V matrix extensions.
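The two reported figures are linked by the clock that drives the measurement: dividing 14.9 GFLOP/s by 59.4 FLOP/cycle gives an implied clock of roughly 251 MHz. The short check below makes that arithmetic explicit, assuming both figures refer to the same clock domain on one pseudo-channel; the function name is hypothetical.

```python
"""Back-of-envelope check relating the two AME-PIM throughput figures."""


def implied_clock_hz(flops_per_second: float, flops_per_cycle: float) -> float:
    """Clock frequency implied by a throughput and a per-cycle work figure."""
    if flops_per_cycle <= 0:
        raise ValueError("FLOP/cycle must be positive")
    return flops_per_second / flops_per_cycle


if __name__ == "__main__":
    f = implied_clock_hz(14.9e9, 59.4)
    print(f"implied clock: {f / 1e6:.1f} MHz")
```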

Blind Submission (Default)

Poster Island B
