Ivan Sarno
Sessions
This paper provides a quantitative analysis of the costs and benefits of integrating a dedicated hardware accelerator for the Post Quantum Cryptography (PQC) algorithm ML-KEM into a 32-bit RISC-V SoC. We compare a software-only implementation on the CV32E40P core against a full-hardware datapath offloading the entire algorithm. We implemented the system on a 22 nm ASIC chip, and we measured the results: the dedicated hardware achieves a 139x speed-up over the software baseline. This performance gain requires an area overhead of 301 kGE, representing only a 6% increase in the total SoC silicon footprint. This study provides a data-driven assessment of the silicon-to-latency trade-off for Post-Quantum Cryptography (PQC) in resource-constrained RISC-V systems.
Post-Quantum Cryptography (PQC) is rapidly becoming a security requirement, and ML-KEM (FIPS 203) is emerging as a foundational primitive for future secure systems. On RISC-V platforms, performance evaluations frequently emphasize custom extensions or dedicated accelerators, while the optimization potential of the standard ISA remains comparatively underexplored. This paper establishes a rigorous performance baseline for the main computational kernels of ML-KEM using only the standard RISC-V Vector Extension (RVV). Rather than relying on handwritten assembly, we apply targeted C-level program transformations that systematically enable effective compiler autovectorization, achieving up to a 10× reduction in instruction count for NTT while preserving portability across all RVV-compliant implementations.