2026-06-11, Poster Island A
Cryptographic hash algorithms for zero-knowledge proof systems often rely on prime-field S-box kernels such as x⁷ mod p over 31-bit fields. We accelerate this class of S-box primitives on a 4×4 coarse-grained reconfigurable array (CGRA) integrated into a RISC-V SoC. As a case study, we use the BabyBear instantiation adopted by the state-of-the-art Poseidon2 hash function and employ Barrett reduction to avoid software division on the host core. Our mapping decomposes operands into 8-bit limbs distributed across the CGRA processing elements and exploits the toroidal mesh to propagate carries in 4 hops. Compared to a hand-optimized baseline, we achieve a 1.26× speedup and a 25.7% energy reduction; compared to an automatic compiler, we achieve a 6.6× speedup and an 82% energy reduction. Cycle-accurate RTL simulation of a full Poseidon2 integration shows ~3.3× fewer cycles than the RISC-V host for the full 141-invocation workload at 100 MHz, and still ~1.3× fewer at 250 MHz.
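For readers unfamiliar with the kernel, the following is a minimal software sketch of the arithmetic named above: the x⁷ S-box over the BabyBear prime (p = 2³¹ − 2²⁷ + 1) with Barrett reduction, plus a toy 8-bit limb split with ripple-carry addition. It is not the CGRA mapping presented in the poster. The shift `K = 62`, the multiplication chain in `sbox`, and all function names are assumptions chosen for illustration.

```python
P = 2**31 - 2**27 + 1   # BabyBear prime (2013265921)
K = 62                  # assumed Barrett shift: any product of two field elements is < 2**62
MU = (1 << K) // P      # precomputed once: floor(2**K / P)
LIMB_BITS = 8
NUM_LIMBS = 4           # 4 x 8 bits covers a 31-bit operand


def barrett_reduce(x):
    """Reduce 0 <= x < 2**K modulo P using multiply, shift, and subtract only."""
    q = (x * MU) >> K   # quotient estimate; never larger than floor(x / P)
    r = x - q * P       # therefore r >= 0, but it may still be >= P
    while r >= P:       # corrective subtraction
        r -= P
    return r


def mulmod(a, b):
    """Multiply two field elements and reduce the product modulo P."""
    return barrett_reduce(a * b)


def sbox(x):
    """x**7 mod P using four modular multiplications (assumed chain)."""
    x2 = mulmod(x, x)
    x3 = mulmod(x2, x)
    x6 = mulmod(x3, x3)
    return mulmod(x6, x)


def to_limbs(x):
    """Split x into NUM_LIMBS little-endian 8-bit limbs."""
    mask = (1 << LIMB_BITS) - 1
    return [(x >> (LIMB_BITS * i)) & mask for i in range(NUM_LIMBS)]


def from_limbs(limbs):
    """Recombine little-endian limbs into an integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def add_limbs(a, b):
    """Limb-wise addition; the carry moves one limb per step (ripple carry)."""
    mask = (1 << LIMB_BITS) - 1
    out, carry = [], 0
    for la, lb in zip(a, b):
        s = la + lb + carry
        out.append(s & mask)
        carry = s >> LIMB_BITS
    return out, carry
```

As a quick check, `sbox(5)` returns 78125 (5⁷ is already smaller than p), and any input can be cross-checked against Python's built-in `pow(x, 7, P)`. The point of the Barrett form is that the reduction needs no division at run time, only the precomputed constant `MU`.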