RunW

Session

06-10
15:30
30min
End-to-End On-Device Transformer Training on Ultra-Low Power RISC-V MCU
Victor Jung, RunW

This demo showcases complete end-to-end Transformer training locally on the GAP9 RISC-V MCU. On-device training is crucial for applications that operate in dynamically changing environments. One example is biosignal DNNs in wearable devices, where cross-subject transfer and long-term temporal drift degrade performance. RISC-V MCUs are already widely used for edge DNN deployment. However, most existing work focuses either on inference alone or on fine-tuning only a small portion of the network.

We extended the Deeploy compiler to generate training code. Deeploy generates bare-metal C code from an ONNX graph and is tailored for efficient inference. To support training, we added the kernels that training requires, such as optimizers and in-place gradient accumulators. We also extended the ONNX Runtime training API to generate graphs optimized for edge deployment; this extension is released at https://github.com/pulp-platform/ONNX4Deeploy. Stable training requires batching, so we implement gradient accumulation to reduce its memory footprint. The demo video showcases the full workflow, from training graph optimization to code generation and on-board execution. The video is available at https://drive.google.com/file/d/16BMiHn0jyMvScFJD7AGTwHpA4Rc0aMnC/view?usp=drive_link and will be uploaded to the PULP Platform YouTube channel.
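
To illustrate the gradient accumulation idea mentioned above, here is a minimal sketch in plain Python. It is not Deeploy's generated code or its kernels: the toy linear model, the function names `grad_mse_linear` and `sgd_step_accumulated`, and the plain SGD update are all assumptions chosen for illustration. The point it shows is that processing a batch in small micro-batches and summing the gradients in place yields the same optimizer step as processing the whole batch at once, while only one micro-batch has to be held in memory at a time.

```python
def grad_mse_linear(w, b, xs, ys):
    """Summed (not averaged) squared-error gradients for the toy model y = w*x + b."""
    gw = 0.0
    gb = 0.0
    for x, y in zip(xs, ys):
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb


def sgd_step_accumulated(w, b, xs, ys, micro_batch, lr):
    """One SGD step over the full batch, visiting micro_batch samples at a time."""
    acc_w = 0.0  # gradient accumulators, updated in place per micro-batch
    acc_b = 0.0
    for start in range(0, len(xs), micro_batch):
        stop = start + micro_batch
        gw, gb = grad_mse_linear(w, b, xs[start:stop], ys[start:stop])
        acc_w += gw
        acc_b += gb
    n = len(xs)
    # The weights are updated only once, after the whole batch has been seen.
    return w - lr * acc_w / n, b - lr * acc_b / n


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]
    full = sgd_step_accumulated(0.0, 0.0, xs, ys, micro_batch=4, lr=0.01)
    accumulated = sgd_step_accumulated(0.0, 0.0, xs, ys, micro_batch=1, lr=0.01)
    print(full, accumulated)
```

In this sketch the weights stay fixed while the accumulators grow, which is why the micro-batch size changes the peak memory needed for activations but not the resulting update.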

Demos
Devzone