Qiu Jing
Qiu Jing is a CPU design engineer at Alibaba DAMO Academy, where he has been involved in the design of multiple XuanTie RISC-V processors. He served as the Acting Chair during the inception phase of the RISC-V AME (Attached Matrix Extension) Task Group, and has been actively contributing to the TG's discussions since its establishment. His work focuses on bridging the gap between AI workload requirements and RISC-V ISA design, with particular emphasis on vector/matrix extension architecture and its hardware implementation.
Session
The increasing computational demands of modern AI workloads necessitate a holistic architectural approach to AI acceleration on RISC-V processors. This talk presents the XuanTie Tensor Processing Engine (TPE), a RISC-V-based Attached Matrix Extension (AME) engine designed to address AI acceleration across three dimensions: ISA, microarchitecture, and software ecosystem.
At the ISA level, the TPE adopts the in-progress RISC-V AME specification, featuring dedicated tensor registers and a comprehensive instruction set that covers matrix multiply-accumulate, element-wise, special-function, reduction, and load/store operations. It supports a broad range of data types, including INT4, FP8, FP16, and micro-scaling formats.
At the microarchitecture level, the design incorporates a matrix engine achieving 2 TOPS/GHz at INT8/FP8, a concurrent vector engine with hardware-accelerated non-linear functions, and a layered memory subsystem featuring a coherent tensor cache and data prefetch engine.
A full-stack software ecosystem, spanning from the LLVM toolchain to the graph execution runtime, completes the solution.
Experimental results on the XuanTie C930 cluster demonstrate 99% FP16 GEMM utilization. We discuss key design trade-offs and implications for the evolving RISC-V AME standard.
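For readers unfamiliar with the units above, the following is a minimal sketch of how a per-GHz throughput figure and a GEMM utilization figure are commonly interpreted. It is not taken from the talk: the function names are hypothetical, the convention that one multiply-accumulate counts as two operations is an assumption, and the abstract does not state the FP16 peak rate, so that value is left as a parameter.

```python
def ops_per_cycle(tops_per_ghz: float) -> float:
    """Convert a TOPS/GHz rating into operations per clock cycle.

    1 TOPS is 1e12 operations per second and 1 GHz is 1e9 cycles per second,
    so the ratio is a per-cycle figure that does not depend on clock speed.
    """
    return tops_per_ghz * 1e12 / 1e9


def gemm_utilization(m: int, n: int, k: int,
                     cycles: float, peak_ops_per_cycle: float) -> float:
    """Fraction of peak throughput achieved by an (m x k) @ (k x n) GEMM.

    Assumes the usual convention of 2*m*n*k operations per GEMM
    (one multiply and one add per inner-product term).
    """
    useful_ops = 2 * m * n * k
    return useful_ops / (cycles * peak_ops_per_cycle)


# The stated INT8/FP8 rating of 2 TOPS/GHz corresponds to 2000 ops per cycle,
# or 1000 multiply-accumulates per cycle under the 2-ops-per-MAC assumption.
int8_peak = ops_per_cycle(2.0)
int8_macs_per_cycle = int8_peak / 2
```

Under this reading, a utilization close to 1.0 means the measured cycle count is close to the minimum the matrix engine could achieve for that problem size at the given data type.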