PRANOSE J EDAVOOR
Sessions
The growing demand for artificial intelligence, scientific computing, and large-scale data analytics has significantly increased the need for massively parallel computing architectures. Modern GPUs provide high computational throughput by executing thousands of concurrent threads, but most existing GPU architectures remain proprietary, limiting open architectural innovation and research. This paper presents Vishwa, a scalable RISC-V-based General Purpose GPU (GPGPU) architecture designed to enable open and extensible parallel computing platforms. The architecture adopts a hierarchical compute model composed of Vishwa Compute Clusters (VCLs), each containing multiple Vishwa Compute Cores (VCCs) that execute threads using a Single Instruction Multiple Thread (SIMT) execution model. Each compute core integrates specialised Vishwa Matrix Cores (VMCs) designed to accelerate the matrix-intensive operations common in machine learning workloads. A global Vishwa Work Distributor (VWD) schedules workloads across the available compute clusters. The architecture is supported by a complete software ecosystem through the CHAKRA compiler stack, which integrates with LLVM to provide kernel compilation and runtime execution support. The compute core architecture has been implemented and validated on an FPGA platform, demonstrating functional correctness of the execution pipeline and the SIMT execution model.
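The hierarchy described above (a global work distributor feeding clusters, which in turn contain compute cores) can be sketched as a small Python model. This is only an illustration: the class and function names are hypothetical, and the round-robin placement policy is an assumption, since the abstract does not state how the VWD actually schedules work.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeCore:
    """Stand-in for a Vishwa Compute Core (VCC); hypothetical model."""
    core_id: int
    blocks: list = field(default_factory=list)  # thread-block ids placed on this core


@dataclass
class ComputeCluster:
    """Stand-in for a Vishwa Compute Cluster (VCL) holding several VCCs."""
    cluster_id: int
    cores: list


def build_gpu(num_clusters, cores_per_cluster):
    """Create the two-level cluster/core hierarchy."""
    return [
        ComputeCluster(c, [ComputeCore(k) for k in range(cores_per_cluster)])
        for c in range(num_clusters)
    ]


def distribute(clusters, num_blocks):
    """Assumed policy: round-robin over clusters first, then over cores in a cluster."""
    for b in range(num_blocks):
        cluster = clusters[b % len(clusters)]
        core = cluster.cores[(b // len(clusters)) % len(cluster.cores)]
        core.blocks.append(b)
    return clusters


gpu = distribute(build_gpu(num_clusters=2, cores_per_cluster=2), num_blocks=8)
placement = {(cl.cluster_id, co.core_id): co.blocks for cl in gpu for co in cl.cores}
```

With two clusters of two cores each, eight thread blocks end up spread evenly, two per core. A real distributor would more likely react to core occupancy than follow a fixed rotation.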
The emergence of RISC-V as an open and extensible instruction set architecture has enabled the development of domain-specific accelerators and General-Purpose Graphics Processing Units (GPGPUs). The base RISC-V ISA supports scalar instructions, and the RISC-V Vector Extension (RVV) adds data-parallel vector execution, but neither directly supports the Single-Instruction Multiple-Thread (SIMT) execution paradigm required by modern GPU architectures. Consequently, software enablement for RISC-V-based GPUs requires a compiler capable of generating SIMT-oriented instruction sequences and managing massively parallel execution. This proposal presents CHAKRA-GP, a hardware-optimized compiler framework for RISC-V-based GPGPU architectures. Built on the LLVM and MLIR infrastructures, CHAKRA-GP provides a scalable compilation pipeline covering kernel generation, memory optimization, and parallel execution mapping for massively parallel workloads. The compiler targets custom RISC-V GPGPU platforms and supports efficient execution of HPC, scientific computing, and AI workloads. The work demonstrates how an extensible compiler infrastructure can bridge the gap between the RISC-V ISA and SIMT-based GPU execution models for customizable RISC-V GPGPU architectures.
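To make the SIMT requirement concrete, the sketch below models how a group of threads sharing one instruction stream handles a data-dependent branch: both paths are issued in turn, and a per-lane active mask decides which threads commit results. It is a conceptual Python illustration only; the function name is hypothetical and it does not represent the code CHAKRA-GP generates or the hardware's actual divergence mechanism.

```python
def simt_branch(values):
    """Run `y = v * 2 if v % 2 == 0 else v + 1` for one thread group, SIMT style."""
    lanes = range(len(values))
    out = [None] * len(values)

    # Every lane evaluates the branch condition with the same instruction;
    # the result is the active mask for the "then" path.
    taken = [values[i] % 2 == 0 for i in lanes]

    # Pass 1: only lanes whose mask bit is set execute the "then" path.
    for i in lanes:
        if taken[i]:
            out[i] = values[i] * 2

    # Pass 2: the complementary mask executes the "else" path.
    for i in lanes:
        if not taken[i]:
            out[i] = values[i] + 1

    # Reconvergence point: all lanes are active again from here on.
    return out, taken


result, mask = simt_branch([0, 1, 2, 3])
```

Scalar and vector instructions alone do not express this per-thread masking and reconvergence, which is why the compiler has to insert and manage it when targeting a SIMT machine.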