Stefano Di Matteo
Stefano Di Matteo received his M.Sc. (2019) and Ph.D. (2023) respectively in Electronic Engineering and Information Engineering from the University of Pisa. He is currently a tenure-track researcher in hardware implementation of Post-Quantum Cryptography at CEA in Grenoble. His research interests include hardware implementation of PQC with countermeasures against physical attacks, RISC-V architectures, and Instruction Set Extensions for PQC
Sessions
Post-Quantum Cryptography is becoming a key building block for future secure systems, as quantum computers threaten widely deployed public-key cryptographic algorithms. In response, the NIST standardization process has selected new quantum-resistant schemes, among which ML-KEM plays a central role for key establishment. Deploying these algorithms efficiently on embedded processors is therefore a critical step toward practical adoption, particularly because embedded systems face strict constraints in terms of computational resources, memory footprint, and energy consumption. At the same time, they are more exposed to physical threats, making resistance to side-channel attacks a key requirement. These constraints make RISC-V especially attractive: its open instruction set and extensibility allow experimentation with software optimizations as well as hardware acceleration for PQC. To explore these aspects, CEA has developed VASCO3, a 22 nm ASIC chip designed to experimentally evaluate PQC implementations and side-channel countermeasures directly on silicon. The chip integrates a RISC-V–based System-on-Chip (SoC) together with several ML-KEM hardware accelerators, enabling the study of different hardware/software partitioning strategies around an embedded RISC-V CPU. In this demonstration, we present a comprehensive exploration of ML-KEM. We first showcase a pure software implementation running on the RISC-V, then progressively introduce hardware acceleration and a fully dedicated ML-KEM accelerator. We also demonstrate protected implementations based on first-order masking, including a masked software version and a masked hardware-assisted design.
This paper provides a quantitative analysis of the costs and benefits of integrating a dedicated hardware accelerator for the Post Quantum Cryptography (PQC) algorithm ML-KEM into a 32-bit RISC-V SoC. We compare a software-only implementation on the CV32E40P core against a full-hardware datapath offloading the entire algorithm. We implemented the system on a 22 nm ASIC chip, and we measured the results: the dedicated hardware achieves a 139x speed-up over the software baseline. This performance gain requires an area overhead of 301 kGE, representing only a 6% increase in the total SoC silicon footprint. This study provides a data-driven assessment of the silicon-to-latency trade-off for Post-Quantum Cryptography (PQC) in resource-constrained RISC-V systems.
Post-Quantum Cryptography (PQC) is rapidly becoming a security requirement, and ML-KEM (FIPS 203) is emerging as a foundational primitive for future secure systems. On RISC-V platforms, performance evaluations frequently emphasize custom extensions or dedicated accelerators, while the optimization potential of the standard ISA remains comparatively underexplored. This paper establishes a rigorous performance baseline for the main computational kernels of ML-KEM using only the standard RISC-V Vector Extension (RVV). Rather than relying on handwritten assembly, we apply targeted C-level program transformations that systematically enable effective compiler autovectorization, achieving up to a 10× reduction in instruction count for NTT while preserving portability across all RVV-compliant implementations.