Priority-Aware Scheduling of Multi-Model, Multi-Precision DNN Inference on Multi-Core RISC-V
2026-06-09, Poster Island C

Efficient deployment of Deep Learning (DL) models on RISC-V-based multi-core platforms remains a significant challenge, especially when multiple models with heterogeneous structures and precision requirements must run concurrently. Existing frameworks offer optimized execution for single-model inference but support neither multi-model scheduling nor priority-based resource allocation.
In this work, we extend the capabilities of such frameworks by formalizing the problem of multi-model, multi-precision inference scheduling on constrained many-core architectures such as Parallel Ultra-Low Power (PULP). We define a scheduling space in which multiple Deep Neural Networks (DNNs), varying in size, type, and precision, compete for limited compute and memory resources. We introduce a simple, priority-aware scheduling layer that allocates cores and memory tiles across models, aiming either to minimize overall inference latency or to find a tradeoff that satisfies each model’s deadline.
To demonstrate the effectiveness of our approach, we build on the existing Deployment Oriented to memoRY (DORY) framework and apply a greedy scheduling strategy. Experiments with several models across several tasks show that even basic scheduling policies can significantly improve latency, core utilization, and memory efficiency over static and sequential baselines.
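
As an illustration of what a greedy, priority-aware allocation of cores and memory tiles could look like, the following minimal Python sketch packs models into successive execution rounds, highest priority first. It is not the scheduler evaluated in this work and does not use the DORY API: the Model fields, the round-based execution model, the 8-core/4-tile cluster, and all cycle counts are hypothetical values chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    # All fields are illustrative assumptions, not DORY data structures.
    name: str
    priority: int  # larger value = more urgent
    cores: int     # cores requested
    tiles: int     # memory tiles requested
    cycles: int    # assumed latency when run with the requested resources


def greedy_schedule(models, total_cores, total_tiles):
    """Pack models into successive rounds, highest priority first."""
    for m in models:
        if m.cores > total_cores or m.tiles > total_tiles:
            raise ValueError(f"{m.name} can never fit on this cluster")
    # Ties in priority are broken by name to keep the result deterministic.
    pending = sorted(models, key=lambda m: (-m.priority, m.name))
    rounds = []
    while pending:
        free_cores, free_tiles = total_cores, total_tiles
        current, deferred = [], []
        for m in pending:
            if m.cores <= free_cores and m.tiles <= free_tiles:
                current.append(m)          # runs concurrently in this round
                free_cores -= m.cores
                free_tiles -= m.tiles
            else:
                deferred.append(m)         # waits for a later round
        rounds.append(current)
        pending = deferred
    return rounds


def makespan(rounds):
    """Total latency if each round lasts as long as its slowest model."""
    return sum(max(m.cycles for m in r) for r in rounds)


# Hypothetical workload on a hypothetical 8-core, 4-tile cluster.
workload = [
    Model("keyword_spotting", priority=3, cores=4, tiles=2, cycles=100),
    Model("detection", priority=2, cores=4, tiles=1, cycles=300),
    Model("classification", priority=1, cores=2, tiles=2, cycles=200),
]
plan = greedy_schedule(workload, total_cores=8, total_tiles=4)
for i, r in enumerate(plan):
    print(i, [m.name for m in r])
print("makespan:", makespan(plan), "vs sequential:", sum(m.cycles for m in workload))
```

In this toy workload the two highest-priority models share the first round and the third model is deferred because the remaining cores and tiles cannot hold it. A deadline-aware variant would additionally compare each model's completion time against its deadline before accepting a placement.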