Joaquin Cornejo
I am currently pursuing a PhD in Microelectronics, focusing on digital circuits for edge AI. I work on digital hardware for edge implementations of Transformer models such as LLMs, with a particular emphasis on attention mechanisms, especially dot-product attention, and on the design of ASICs that implement this computation with optimized power and performance. I am a goal-oriented and curious engineer, committed to continuous learning and collaboration.
Session
This work presents a co-optimized architecture for edge-based Transformers, centered on a specialized RISC-V CPU designed to manage a parallel AI co-processor. While the BumBleBee (BBB) unit handles the core Flash Attention Method (FAM) computations, the system relies on an adaptable RISC-V core for critical data orchestration and pre-processing. To overcome the bottlenecks of a memory-bound system, the CPU's ISA is extended with three custom fused instructions (convcat, lwincr, and swincr), each of which consolidates a complex macro-operation into a single-cycle operation. Notably, the convcat instruction replaces a sequence of 13 F-extension instructions with a single instruction, cutting latency by over 50%. In addition, the CPU implements the M and F extensions with data gating in the ALU to minimize power consumption during scaling and normalization tasks. By prioritizing CPU-level adaptability and instruction fusion, the architecture significantly reduces the energy consumption and latency of high-performance LLM inference in power-constrained environments.
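To give a feel for why fusing a memory access with its pointer update helps in a memory-bound loop, here is a minimal sketch in Python. It assumes, purely for illustration, that lwincr behaves like a load word with post-increment of the address register; the abstract does not define the instruction's semantics, so that behavior, the function names, and the instruction counts are all hypothetical. Loop-control instructions are left out of the count in both variants.

```python
# Toy instruction-count model. ASSUMPTION: lwincr is modeled as a
# "load word, then post-increment the pointer" instruction. This is an
# illustrative guess, not the semantics defined by the presented work.

WORD_BYTES = 4


def sum_baseline(mem, base, n):
    """Sum n words using separate load and pointer-increment steps."""
    ptr, acc, issued = base, 0, 0
    for _ in range(n):
        val = mem[ptr]          # lw   val, 0(ptr)
        issued += 1
        ptr += WORD_BYTES       # addi ptr, ptr, 4
        issued += 1
        acc += val              # add  acc, acc, val
        issued += 1
    return acc, issued


def sum_fused(mem, base, n):
    """Sum n words using one hypothetical fused load-and-increment step."""
    ptr, acc, issued = base, 0, 0
    for _ in range(n):
        val, ptr = mem[ptr], ptr + WORD_BYTES   # lwincr val, (ptr)  [hypothetical]
        issued += 1
        acc += val                               # add  acc, acc, val
        issued += 1
    return acc, issued


if __name__ == "__main__":
    # Eight words holding 1..8, stored at byte addresses starting at 0x100.
    memory = {0x100 + WORD_BYTES * i: i + 1 for i in range(8)}
    print(sum_baseline(memory, 0x100, 8))
    print(sum_fused(memory, 0x100, 8))
```

In this toy model the fused variant issues one instruction fewer per element while computing the same result; the real savings depend on the actual instruction definitions and pipeline, which this sketch does not attempt to reproduce.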