BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//cfp.riscv-europe.org//eu-summit-2026//speaker//JQQEX3
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-eu-summit-2026-EZY7K8@cfp.riscv-europe.org
DTSTART;TZID=CET:20260611T135000
DTEND;TZID=CET:20260611T140000
DESCRIPTION:This work presents a co-optimized architecture for edge-based T
 ransformers\, focusing on a specialized RISC-V CPU designed to manage a pa
 rallel AI co-processor. While the BumBleBee (BBB) unit handles core Flash 
 Attention Method (FAM) computations\, the system relies on an adaptable RI
 SC-V core for critical data orchestration and pre-processing. To overcome 
 the bottlenecks of a memory-bound system\, the CPU's ISA is enhanced with 
 custom fused instructions—convcat\, lwincr\, and swincr—which consolid
 ate complex macro-operations into single-cycle actions. Notably\, the conv
 cat instruction reduces 13 F-extension instructions to one\, cutting laten
 cy by over 50%. Furthermore\, the CPU incorporates M and F extensions with
  data-gating in the ALU to minimize power consumption during scaling and n
 ormalization tasks. By prioritizing CPU-level adaptability and instruction
  fusion\, the architecture significantly reduces the energy consumption
  and latency of high-performance LLM inference in power-constrained
  environments.
DTSTAMP:20260522T162430Z
LOCATION:Poster Island B
SUMMARY:Co-optimizing Custom Instructions RISC-V and LLM Specialized Accele
 rator for Attention-Based Edge AI - Joaquin Cornejo
URL:https://cfp.riscv-europe.org/eu-summit-2026/talk/EZY7K8/
END:VEVENT
END:VCALENDAR