Dongjie Xie


Sessions

06-09
14:10
10min
From Profiling to Performance: Optimizing Small Language Models on RISC‑V Architectures
Dongjie Xie, Rama Malladi, Jose Arnau, Chip Kerchner

Small Language Models (SLMs) are increasingly critical for edge AI, yet achieving good performance on RISC-V requires rigorous profiling to identify architectural bottlenecks. This work evaluates the performance of several SLMs, including Gemma3, Llama-3.2, Qwen-2.5, DeepSeek, and Phi-3.5, on the Tenstorrent Ascalon RISC-V core. We developed a profiling methodology to analyze workload distribution, which revealed that Matrix Multiplication (MatMul) accounts for ~90% of total compute across all evaluated models. Because emulating full models is computationally expensive, we extract these critical kernels for targeted benchmarking. Our implementation on the HAPS platform achieves significant performance gains over standard baselines. FP32 execution, used for maximum precision, was optimized by moving from a traditional SGEMM to a new high-performance implementation. INT8 execution, targeted at efficient inference, was accelerated by migrating from a standard RVV implementation to a specialized IGEMM implementation that uses VQDOT.
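As a rough illustration of the INT8 path mentioned above, the following Python sketch models an integer GEMM built from a 4-way dot product that accumulates into a 32-bit lane. It assumes that VQDOT refers to this kind of quad int8 dot-product-accumulate operation, which the abstract does not spell out. The function names `quad_dot_accumulate` and `igemm` are hypothetical and are not taken from the authors' implementation; the sketch models the arithmetic only, not vectorization or performance.

```python
def quad_dot_accumulate(acc, a4, b4):
    """Model one 32-bit lane (assumed semantics): acc += four int8*int8 products."""
    assert len(a4) == len(b4) == 4
    for x, y in zip(a4, b4):
        # Inputs are expected to be signed 8-bit values.
        assert -128 <= x <= 127 and -128 <= y <= 127
        acc += x * y
    # Wrap the result to a signed 32-bit integer, as a hardware lane would.
    return (acc + 2**31) % 2**32 - 2**31


def igemm(a, b):
    """Compute C = A x B with int8 inputs and int32 accumulators.

    a is an M x K list of lists, b is a K x N list of lists.
    K must be a multiple of 4 so it splits evenly into quad dot products.
    """
    m_dim, k_dim, n_dim = len(a), len(a[0]), len(b[0])
    assert len(b) == k_dim and k_dim % 4 == 0
    c = [[0] * n_dim for _ in range(m_dim)]
    for m in range(m_dim):
        for n in range(n_dim):
            acc = 0
            # Walk the reduction dimension four elements at a time.
            for k in range(0, k_dim, 4):
                a4 = a[m][k:k + 4]
                b4 = [b[k + i][n] for i in range(4)]
                acc = quad_dot_accumulate(acc, a4, b4)
            c[m][n] = acc
    return c
```

Processing the reduction dimension in groups of four is what lets each step produce one 32-bit partial sum from four 8-bit products, which is the property a kernel of this kind would rely on instead of widening each product separately.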

Non-Blind submission
Poster Island C
06-10
10:30
10min
A Doom Demo Journey: Tenstorrent's Ascalon CPU on Synopsys emulation and prototyping systems
Dongjie Xie, Brandon Zupan, Rae Parnmukh

This paper traces the incremental journey of taking Tenstorrent’s Ascalon RISC‑V CPU IP from RTL and emulation to a playable DOOM demo on a Synopsys prototyping platform. Along the way, we describe the problems we overcame and how we optimized our flows and the design. We close with a set of lessons and recommendations for teams who want to use emulation, prototyping, and realistic workloads like DOOM to de‑risk RISC‑V IP adoption and accelerate hardware/software co‑design.

Non-Blind submission
Poster Island C