Stefano Di Matteo

Stefano Di Matteo received his M.Sc. in Electronic Engineering (2019) and his Ph.D. in Information Engineering (2023) from the University of Pisa. He is currently a tenure-track researcher in hardware implementation of Post-Quantum Cryptography (PQC) at CEA in Grenoble. His research interests include hardware implementations of PQC with countermeasures against physical attacks, RISC-V architectures, and Instruction Set Extensions for PQC.


Sessions

06-09
13:00
30min
ML-KEM on a 22 nm ASIC: Protected, Unprotected, and Hardware-Accelerated Implementations
Stefano Di Matteo, Emanuele Valea

Post-Quantum Cryptography (PQC) is becoming a key building block for future secure systems, as quantum computers threaten widely deployed public-key cryptographic algorithms. In response, the NIST standardization process has selected new quantum-resistant schemes, among which ML-KEM plays a central role for key establishment. Deploying these algorithms efficiently on embedded processors is therefore a critical step toward practical adoption, particularly because embedded systems face strict constraints on computational resources, memory footprint, and energy consumption. At the same time, they are more exposed to physical threats, making resistance to side-channel attacks a key requirement. These constraints make RISC-V especially attractive: its open instruction set and extensibility allow experimentation with software optimizations as well as hardware acceleration for PQC. To explore these aspects, CEA has developed VASCO3, a 22 nm ASIC designed to experimentally evaluate PQC implementations and side-channel countermeasures directly on silicon. The chip integrates a RISC-V–based System-on-Chip (SoC) together with several ML-KEM hardware accelerators, enabling the study of different hardware/software partitioning strategies around an embedded RISC-V CPU. In this demonstration, we present a comprehensive exploration of ML-KEM on this chip. We first showcase a pure software implementation running on the RISC-V core, then progressively introduce hardware acceleration, up to a fully dedicated ML-KEM accelerator. We also demonstrate protected implementations based on first-order masking, including a masked software version and a masked hardware-assisted design.
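
As a rough illustration of the first-order masking idea mentioned in the abstract, the sketch below splits a coefficient modulo q = 3329 (the ML-KEM modulus) into two additive shares and applies linear operations share by share. It is not the VASCO3 implementation: the type `masked_t` and all function names are hypothetical, the mask `r` is passed in by the caller so the example stays deterministic, and the `%` operator is used for readability even though a real protected design would need fresh uniform randomness and constant-time reductions.

```c
#include <stdint.h>

#define Q 3329 /* ML-KEM modulus */

/* Hypothetical two-share representation: x = (s0 + s1) mod Q. */
typedef struct {
    uint16_t s0;
    uint16_t s1;
} masked_t;

/* Split x (0 <= x < Q) into two shares using a mask r (0 <= r < Q).
 * Assumption: r is uniform and fresh; here the caller supplies it. */
static masked_t mask(uint16_t x, uint16_t r) {
    masked_t m;
    m.s0 = (uint16_t)((x + Q - r) % Q);
    m.s1 = r;
    return m;
}

/* Recombine the shares. A protected design does this as late as possible. */
static uint16_t unmask(masked_t m) {
    return (uint16_t)((m.s0 + m.s1) % Q);
}

/* Addition is linear, so it can be computed on each share independently. */
static masked_t masked_add(masked_t a, masked_t b) {
    masked_t c;
    c.s0 = (uint16_t)((a.s0 + b.s0) % Q);
    c.s1 = (uint16_t)((a.s1 + b.s1) % Q);
    return c;
}

/* Multiplication by a public constant k is also linear in the shares. */
static masked_t masked_mul_const(masked_t a, uint16_t k) {
    masked_t c;
    c.s0 = (uint16_t)(((uint32_t)a.s0 * k) % Q);
    c.s1 = (uint16_t)(((uint32_t)a.s1 * k) % Q);
    return c;
}
```

Only linear steps are shown. Non-linear steps of ML-KEM, such as comparison or compression, need dedicated masked gadgets and are outside the scope of this sketch.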

Demos
Devzone
06-09
13:50
10min
Cost-Benefit Analysis of a 22nm ASIC ML-KEM Accelerator for RISC-V Secure Elements
Ivan Sarno, Stefano Di Matteo, Emanuele Valea, Hack

This paper provides a quantitative analysis of the costs and benefits of integrating a dedicated hardware accelerator for the Post-Quantum Cryptography (PQC) algorithm ML-KEM into a 32-bit RISC-V SoC. We compare a software-only implementation on the CV32E40P core against a full-hardware datapath that offloads the entire algorithm. We implemented the system on a 22 nm ASIC and measured the results: the dedicated hardware achieves a 139x speed-up over the software baseline. This performance gain requires an area overhead of 301 kGE, representing only a 6% increase in the total SoC silicon footprint. The study provides a data-driven assessment of the silicon-to-latency trade-off for PQC in resource-constrained RISC-V systems.

Blind Submission (Default)
Poster Island B
06-09
15:30
10min
Compiler-Aided Autovectorization of PQC on RISC-V Vector Extensions
Ivan Sarno, Stefano Di Matteo

Post-Quantum Cryptography (PQC) is rapidly becoming a security requirement, and ML-KEM (FIPS 203) is emerging as a foundational primitive for future secure systems. On RISC-V platforms, performance evaluations frequently emphasize custom extensions or dedicated accelerators, while the optimization potential of the standard ISA remains comparatively underexplored. This paper establishes a rigorous performance baseline for the main computational kernels of ML-KEM using only the standard RISC-V Vector Extension (RVV). Rather than relying on handwritten assembly, we apply targeted C-level program transformations that systematically enable effective compiler autovectorization, achieving up to a 10× reduction in instruction count for NTT while preserving portability across all RVV-compliant implementations.
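
To give a sense of what an autovectorization-friendly NTT kernel can look like in plain C, the sketch below writes one block of Cooley-Tukey butterflies as a simple counted loop over two non-aliasing arrays, with a branch-free Montgomery reduction. This is an illustrative assumption, not the transformations used in the paper: the function name `ntt_butterfly_block` and its signature are hypothetical, while the constants q = 3329 and q^-1 mod 2^16 = -3327 are the usual ML-KEM values. It also assumes that right-shifting a negative integer is arithmetic, as it is on GCC and Clang.

```c
#include <stddef.h>
#include <stdint.h>

#define Q 3329
#define QINV (-3327) /* q^-1 mod 2^16 */

/* Branch-free Montgomery reduction: returns r with r * 2^16 = a (mod Q)
 * and -Q < r < Q, for |a| < Q * 2^15. */
static inline int16_t montgomery_reduce(int32_t a) {
    int16_t t = (int16_t)((int16_t)a * QINV);
    return (int16_t)((a - (int32_t)t * Q) >> 16);
}

/* One block of butterflies with a single twiddle factor zeta.
 * lo and hi are the two halves of the block; restrict tells the compiler
 * they do not alias, and the loop body has no branches or cross-iteration
 * dependencies, which is what autovectorizers typically look for. */
void ntt_butterfly_block(int16_t *restrict lo, int16_t *restrict hi,
                         int16_t zeta, size_t len) {
    for (size_t j = 0; j < len; j++) {
        int16_t t = montgomery_reduce((int32_t)zeta * hi[j]);
        int16_t a = lo[j];
        lo[j] = (int16_t)(a + t);
        hi[j] = (int16_t)(a - t);
    }
}
```

With a recent GCC or Clang, building with something like `-O3 -march=rv64gcv` lets the compiler target RVV. Whether this particular loop is actually vectorized depends on the compiler version and should be checked in the generated assembly.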

Blind Submission (Default)
Poster Island B