2026-06-09, Poster Island C
Sparse matrix–dense matrix multiplication (SpMM) is a fundamental kernel in high-performance computing and emerging edge workloads, yet its performance is typically memory-bound because of irregular, indirect memory accesses. While the RISC-V Vector Extension (RVV) provides flexible data-parallel execution, exploiting it efficiently for sparse workloads remains challenging.
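To make the indirect access pattern concrete, the following is a minimal scalar sketch of SpMM, assuming the sparse operand is stored in CSR format and the dense operands are row-major. The storage format, the function name `spmm_csr`, and the parameter names are illustrative assumptions, not details taken from the evaluated kernel.

```c
#include <stddef.h>

/* Illustrative sketch: C (rows x n) += A (CSR, rows x cols) * B (cols x n).
 * B and C are dense, row-major. CSR layout and all names are assumptions. */
void spmm_csr(size_t rows, size_t n,
              const int *row_ptr, const int *col_idx, const double *val,
              const double *B, double *C)
{
    for (size_t i = 0; i < rows; ++i) {
        double *c_row = C + i * n;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            const double a = val[k];
            /* Indirect access: the row of B that gets read depends on
             * col_idx[k], so consecutive nonzeros may touch rows of B
             * that are far apart in memory. */
            const double *b_row = B + (size_t)col_idx[k] * n;
            /* Unit-stride inner loop over the dense dimension; this is the
             * loop a vectorized variant would typically target. */
            for (size_t j = 0; j < n; ++j)
                c_row[j] += a * b_row[j];
        }
    }
}
```

In this formulation the inner loop is contiguous, but each nonzero triggers a jump to a different row of B, which is where the irregularity comes from.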
This work evaluates an iterative SpMM kernel on an RVV-enabled RISC-V processor (SpacemiT X60, 8 cores) and investigates the combined impact of locality-aware data layout and explicit vectorization. We compare scalar, compiler-vectorized, library-based, and manual intrinsic implementations. Additionally, we apply Morton (Z-order) reordering to improve spatial locality in memory.
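As an illustration of Morton (Z-order) reordering, the sketch below interleaves the bits of a row and a column index into a single key and sorts coordinate-format entries by that key. Which objects the study actually reorders is not stated here; the `coo_entry` type and the functions `morton2d` and `sort_morton` are hypothetical names used only for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Spread the 32 bits of x so that a zero bit follows each original bit. */
static uint64_t spread_bits(uint32_t x)
{
    uint64_t v = x;
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFULL;
    v = (v | (v << 8))  & 0x00FF00FF00FF00FFULL;
    v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    v = (v | (v << 2))  & 0x3333333333333333ULL;
    v = (v | (v << 1))  & 0x5555555555555555ULL;
    return v;
}

/* Morton key: row bits occupy odd positions, column bits even positions. */
uint64_t morton2d(uint32_t row, uint32_t col)
{
    return (spread_bits(row) << 1) | spread_bits(col);
}

/* Hypothetical coordinate-format nonzero, used only for illustration. */
typedef struct {
    uint32_t row, col;
    double val;
} coo_entry;

static int cmp_morton(const void *a, const void *b)
{
    const coo_entry *x = a;
    const coo_entry *y = b;
    const uint64_t kx = morton2d(x->row, x->col);
    const uint64_t ky = morton2d(y->row, y->col);
    return (kx > ky) - (kx < ky);
}

/* Sort entries so that nonzeros close together in 2D are also close
 * together in the sorted array. */
void sort_morton(coo_entry *entries, size_t count)
{
    qsort(entries, count, sizeof *entries, cmp_morton);
}
```

Sorting by this key visits the matrix in small square blocks rather than whole rows, so entries that are close in both row and column tend to end up near each other in memory.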
Experimental results show that vectorization alone provides limited benefit in memory-bound regimes. When combined with Morton reordering, however, manual RVV vectorization achieves the best performance. Microarchitectural analysis confirms reduced cache misses and improved instructions per cycle (IPC), although the workload remains fundamentally bandwidth-limited.
The study highlights the importance of co-designing data layout and vectorization when targeting sparse workloads on emerging RISC-V platforms.