APEX: Accelerating FFT on CVA6 with a Tightly Coupled CV-X-IF Co-processor
2026-06-11, Poster Island A

The Fast Fourier Transform (FFT) is a fundamental algorithm in embedded and edge signal processing applications, including audio and speech processing, radar systems, and biomedical sensing, where real-time performance must be achieved under strict area and power constraints. Conventional approaches typically rely on dedicated standalone accelerators, but these often impose area and power overheads that are impractical for resource-constrained embedded and edge platforms. Tightly coupled acceleration within the CPU pipeline offers a more efficient alternative, delivering substantial performance gains without requiring an independent hardware block. This paper presents APEX, a tightly coupled co-processor integrated with the CV32A6 32-bit RISC-V processor, designed to provide high-performance FFT acceleration for embedded RISC-V systems. For a fixed-point FFT of size N=512, APEX achieves an 83.5% reduction in execution cycles and an 87.9% reduction in instruction count compared to the software FFT implementation on the baseline CV32A6, while preserving the baseline operating frequency and full RV32IM_Zicsr software compatibility with minimal area overhead. These results demonstrate that APEX is an efficient and practical solution for accelerating FFT-intensive workloads in embedded and edge deployments built on open RISC-V architectures.


APEX is a hardware/software co-design project targeting FFT acceleration on the CVA6 RISC-V application-class processor. On the hardware side, APEX is a tightly coupled co-processor connected to CVA6 via an enhanced CV-X-IF interface, implementing pipelined radix-2 and radix-4 butterfly units with a dedicated APEX Register File (APR) for wide operand handling in Q1.15 fixed-point arithmetic. On the software side, the KissFFT library serves as the application workload, modified to exploit the APEX hardware through custom RISC-V instructions encoded in the reserved custom opcode space. These instructions, which cover butterfly computation (bfly2, bfly4) and APR load/store configuration (APEX_CFG, APEX_RESTORE), are integrated into the LLVM compiler toolchain, enabling the generation of APEX-aware machine code directly from a high-level C FFT application. The full stack spans RTL design of the co-processor, CV-X-IF integration with CVA6, ISA extension and instruction encoding, LLVM backend modifications for custom instruction emission, and application-level profiling on FPGA.
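
For readers unfamiliar with the arithmetic involved, the following is a minimal software sketch of a radix-2 butterfly in Q1.15, the kind of operation a bfly2 instruction would offload. It is not the APEX implementation: the names cq15_t, sat16, q30_to_q15, cq15_mul, and bfly2_ref are hypothetical, and the rounding and saturation behaviour are assumptions of this sketch (the hardware may, for example, scale by 1/2 per stage instead of saturating).

```c
#include <assert.h>
#include <stdint.h>

/* Complex sample in Q1.15: each component is a signed 16-bit value
   representing a real number in [-1, 1) with 15 fractional bits. */
typedef struct { int16_t re, im; } cq15_t;

/* Clamp a wide intermediate to the int16_t range. Saturation is an
   assumption of this sketch, not a documented APEX behaviour. */
static int16_t sat16(int64_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Round a Q2.30 accumulator to nearest and convert it to Q1.15.
   Relies on arithmetic right shift of negative values, which
   mainstream compilers provide. */
static int16_t q30_to_q15(int64_t acc) {
    return sat16((acc + (1 << 14)) >> 15);
}

/* Complex multiply b * w with Q1.15 operands and a Q1.15 result. */
static cq15_t cq15_mul(cq15_t b, cq15_t w) {
    int64_t re = (int64_t)b.re * w.re - (int64_t)b.im * w.im;
    int64_t im = (int64_t)b.re * w.im + (int64_t)b.im * w.re;
    cq15_t t = { q30_to_q15(re), q30_to_q15(im) };
    return t;
}

/* Radix-2 decimation-in-time butterfly, in place:
   a' = a + w*b,  b' = a - w*b. */
static void bfly2_ref(cq15_t *a, cq15_t *b, cq15_t w) {
    cq15_t t = cq15_mul(*b, w);
    cq15_t a0 = *a;
    a->re = sat16((int64_t)a0.re + t.re);
    a->im = sat16((int64_t)a0.im + t.im);
    b->re = sat16((int64_t)a0.re - t.re);
    b->im = sat16((int64_t)a0.im - t.im);
}

/* Usage check: a = 0.5, b = 0.25, twiddle w = -j.
   Then w*b = -0.25j, so a' = 0.5 - 0.25j and b' = 0.5 + 0.25j. */
static void bfly2_ref_selfcheck(void) {
    cq15_t a = { 16384, 0 };
    cq15_t b = { 8192, 0 };
    cq15_t w = { 0, INT16_MIN };
    bfly2_ref(&a, &b, w);
    assert(a.re == 16384 && a.im == -8192);
    assert(b.re == 16384 && b.im == 8192);
}
```

In software, each butterfly of this form costs four multiplies, several additions, and the associated loads, stores, and rounding steps; collapsing that sequence into a single custom instruction is the sort of saving a tightly coupled butterfly unit targets.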

Digital Design Engineer with 3+ years of industry experience in RTL design and verification, specialising in RISC-V CPU architecture, ISA extensions, and hardware accelerators for FPGA and ASIC targets. Passionate about open-source CPU design, computer architecture, and hardware/software co-design. Active contributor to the RISC-V ecosystem, including CVA6, SERV, and RISC-V Architecture Compatibility Test Suites (ACTs).