All The Scaling, No New State: One Matrix ISA with Microarchitectural Freedom
2026-06-11, Plenary

RISC-V's Zvvm matrix extension stores all tile state in the standard V register file and derives tile geometry algebraically from VLEN, SEW, and a new aspect-ratio field λ. This yields arithmetic intensity that scales with VLEN: a binary compiled at VLEN=256 delivers higher throughput at VLEN=65536 with no recompilation. The same partial-VL mechanism that enables one-column-at-a-time embedded streaming also drives full HPC bulk tiling, while microscaling is integrated via vm-bit opcode aliasing with no new architectural state.

Tile dimensions are not programmer-specified constants — they are consequences of existing parameters. The tile is always square: M = N = VLEN/(SEW×λ), with inner dimension K_eff = λ×W×LMUL. Arithmetic intensity (M/2) grows proportionally with VLEN, and the ratio of intensity to cache-to-VRF bandwidth remains constant — a provable algebraic identity with no equivalent in Arm SME or Intel AMX.
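The geometry relations above can be checked numerically. The following is a minimal sketch, not part of the Zvvm specification: the function name `tile_geometry`, the `TileGeometry` record, and the default values are hypothetical. It applies only the formulas stated in this abstract (M = N = VLEN/(SEW×λ), K_eff = λ×W×LMUL, intensity = M/2). The abstract does not define W, so it is treated here as an opaque integer parameter, and LMUL is assumed to be an integer for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileGeometry:
    m: int            # tile rows
    n: int            # tile columns (equal to m, since the tile is square)
    k_eff: int        # effective inner dimension
    intensity: float  # arithmetic intensity, taken as M/2 per the abstract


def tile_geometry(vlen: int, sew: int, lam: int, w: int = 1, lmul: int = 1) -> TileGeometry:
    """Derive tile geometry from the formulas quoted in the abstract.

    Assumptions (not from the source): `w` is an opaque integer knob,
    `lmul` is an integer, and VLEN must divide evenly by SEW * lambda.
    """
    denom = sew * lam
    if denom <= 0 or vlen % denom != 0:
        raise ValueError("VLEN must be a positive multiple of SEW * lambda")
    m = vlen // denom                 # M = N = VLEN / (SEW * lambda)
    k_eff = lam * w * lmul            # K_eff = lambda * W * LMUL
    return TileGeometry(m=m, n=m, k_eff=k_eff, intensity=m / 2)


# Same parameters, two VLENs: only the hardware-determined VLEN changes.
for vlen in (256, 65536):
    g = tile_geometry(vlen=vlen, sew=32, lam=1)
    print(f"VLEN={vlen}: tile {g.m}x{g.n}, intensity {g.intensity}")
```

With SEW=32 and λ=1, the tile grows from 8×8 at VLEN=256 to 2048×2048 at VLEN=65536, and the M/2 intensity grows from 4 to 1024. This is the linear scaling with VLEN that the abstract describes.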

Zvvm's geometry knobs form an intent vocabulary expressed from both sides: software selects LMUL and VL to control K_eff depth and streaming granularity; hardware determines λ and VLEN to shape the tile for its datapath. Setting VL = K_eff with LMUL = 1 gives portable streaming; increasing LMUL or computing multiple C panels trades register pressure for compute intensity — all via the same opcode.

Microscaling (MX) support is integrated by aliasing the vm bit in FP multiply-accumulate opcodes, introducing no new encoding space, registers, or modes.


This extended abstract presents Zvvm, a RISC-V matrix ISA extension that takes a fundamentally different approach from Arm SME and Intel AMX. Where SME requires a dedicated ZA register file and streaming mode, and AMX uses fixed 16×16 tiles that cannot exploit wider datapaths without ISA revision, Zvvm stores all matrix state in the standard V register file, derives tile geometry algebraically from existing CSR fields, and introduces no new architectural state or modes.

This presentation offers a rare window into the architects' design rationale behind a matrix ISA approach that has no counterpart in the industry. Rather than presenting a finished specification, we expose the algebraic foundations, the interplay of five independent geometry knobs (VLEN, λ, LMUL, VL, W), and the deliberate trade-offs that allow one ISA — and one binary — to span from a VLEN=128 microcontroller streaming one column at a time to a VLEN=65536 supercomputer computing full tiles.

We show how partial-VL streaming and full bulk tiling are two ends of the same VL continuum, how the bidirectional intent vocabulary lets software and hardware independently express their capabilities through the same opcode, and how microscaling (MXFP8, MXFP4, MXINT8) is integrated via vm-bit opcode aliasing with zero overhead. The design is validated by a public QEMU implementation, BLAS kernels, and a parameterized test suite covering all (SEW, λ, LMUL) combinations — enabling concurrent hardware and software development against a stable, machine-testable specification.

Dr. Philipp Tomsich is Chief Technologist and Founder of VRULL GmbH, providing strategic R&D for semiconductor companies. He chairs the RISC-V Applications & Tools Committee, serves on the RISC-V Board of Directors, and is Vice-Chair of the Technical Steering Committee, where he champions software ecosystem growth and standards alignment, including efforts to publish RISC-V under ISO.

He instigated the standards development for matrix operations and AI/ML, serving as principal editor of the Integrated Matrix Extension and as Vice-Chair of the Attached Matrix TG.