Maximizing Performance at Low Area Cost in RISC-V Processors Leveraging Fine-Grained Multithreading
2026-06-10, Poster Island D

Embedded applications increasingly rely on highly efficient, low-area RISC-V processors. However, short 3-stage pipelines often suffer from data and control hazards that introduce frequent stalls and degrade performance. This paper presents the implementation of Fine-Grained Multithreading (FGMT) on [OMITTED FOR BLIND REVIEW], an industrial 32-bit RISC-V core supporting the RV32ECM instruction set. By interleaving two hardware threads, the design hides pipeline stalls and simplifies branch target calculation without requiring complex branch prediction. To further mitigate structural hazards and the underutilization caused by inactive thread contexts under fixed round-robin scheduling, we introduce a novel "Thread Forwarding" (TF) technique that enables a form of Thread-Level Parallelism (TLP). Implemented in 40 nm (C40) technology with a target frequency of 300 MHz, the standard FGMT design achieves a 14% IPC improvement over the baseline at the cost of a 9% area overhead within an MCU SoC featuring 16 KB of instruction and data memory. The TF architecture reaches a Pareto-optimal configuration, further raising IPC to 0.958 (+18.8% over the baseline) while keeping the same area footprint as the standard FGMT implementation.
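The reported figures can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes both percentage gains are measured against the same baseline IPC on the same workload, which the abstract does not state explicitly. The baseline IPC, the FGMT IPC, and the TF-over-FGMT gain are values implied by the quoted numbers, not results reported by the authors, and all variable names are hypothetical.

```python
# Figures quoted in the abstract.
tf_ipc = 0.958      # IPC with Thread Forwarding (TF)
tf_gain = 0.188     # TF: +18.8% IPC over baseline
fgmt_gain = 0.14    # standard FGMT: +14% IPC over baseline

# Assumption: both gains are relative to the same baseline IPC.
baseline_ipc = tf_ipc / (1 + tf_gain)       # implied, not reported
fgmt_ipc = baseline_ipc * (1 + fgmt_gain)   # implied, not reported

# Relative IPC gain of TF over standard FGMT at equal area.
tf_over_fgmt = tf_ipc / fgmt_ipc - 1

print(f"implied baseline IPC: {baseline_ipc:.3f}")
print(f"implied FGMT IPC:     {fgmt_ipc:.3f}")
print(f"TF gain over FGMT:    {tf_over_fgmt:.1%}")
```

Under these assumptions, TF adds roughly four percentage points of IPC on top of standard FGMT. Because the two designs have the same area footprint, that gain comes at no additional area cost.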