Catalin Ciobanu
Catalin Ciobanu holds an MSc and a PhD from Delft University of Technology, The Netherlands. He is currently Associate Professor at Transilvania University of Brasov, Romania, and Senior Researcher at the National Institute for Research and Development in Microtechnologies - IMT Bucharest, Romania. His research interests include RISC-V processors, embedded systems, high-performance computing, SIMD architectures, digital signal processing, and reconfigurable hardware.
Sessions
The development of our tightly coupled SIMD/Vector accelerator for matrix operations requires extending the RISC-V instruction set. Special compiler support is required for this extension. Our methodology starts from a Sail description of the ISA extension and generates the compiler target description data.
The accelerator's main features are 32 software-defined 2D registers, dedicated hardware for matrix operations, and a dedicated memory interface. The accelerator employs the CoreV-eXtension-Interface (CV-X-IF) and can be connected to multiple RISC-V cores that feature this interface.
The custom instructions extend the RISC-V ISA and follow its instruction encoding. They are of three types: instructions that define matrix registers, matrix operations, and memory operations.
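As an illustration of what following the RISC-V encoding can look like, here is a minimal sketch that packs a custom instruction into the standard R-type layout on the custom-0 major opcode, which the base ISA reserves for extensions. The field layout and the custom-0 opcode value come from the RISC-V specification. The function code, the register numbers, and the idea that the 5-bit register fields select matrix registers are assumptions made for this example, not the accelerator's actual encodings.

```python
# Standard RISC-V R-type layout:
# funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
CUSTOM_0 = 0b0001011  # major opcode reserved by the base ISA for custom use


def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode=CUSTOM_0):
    """Pack R-type fields into a 32-bit instruction word."""
    fields = (
        ("funct7", funct7, 7),
        ("rs2", rs2, 5),
        ("rs1", rs1, 5),
        ("funct3", funct3, 3),
        ("rd", rd, 5),
        ("opcode", opcode, 7),
    )
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (
        (funct7 << 25) | (rs2 << 20) | (rs1 << 15)
        | (funct3 << 12) | (rd << 7) | opcode
    )


# Hypothetical matrix multiply: register 3 = register 1 * register 2.
# funct7=0b0000001 is an invented function code, not the real one.
word = encode_r_type(funct7=0b0000001, rs2=2, rs1=1, funct3=0b000, rd=3)
print(f"{word:#010x}")
```

A 5-bit register field can address 32 registers, which would match the 32 registers mentioned above, but the abstract does not say how the real instructions name their operands.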
The instructions are described in Sail and are tested in the generated simulator. The adl_tool utility transforms the Sail architecture description into the compiler model artifacts needed to build a functional prototype compiler for the given specification. Additionally, it provides automatically generated tests to validate the correctness of the instruction encodings.
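The idea of generated encoding tests can be sketched as follows. This is not adl_tool's output format or API, which the abstract does not describe. It is a hypothetical, self-contained example in which a small instruction table (with invented mnemonics and function codes) drives a generator that enumerates operand combinations and checks that the fixed bits and the operand fields of each encoded word are as expected.

```python
import itertools

OPCODE_CUSTOM_0 = 0b0001011  # RISC-V major opcode reserved for custom use
FIXED_MASK = 0xFE00707F      # funct7, funct3 and opcode bits of an R-type word

# Hypothetical mnemonics and function codes, invented for this sketch.
SPECS = {
    "hyp.mmul": {"funct7": 0b0000001, "funct3": 0b000},
    "hyp.madd": {"funct7": 0b0000010, "funct3": 0b000},
}


def encode(spec, rd, rs1, rs2):
    """Encode one instruction in the standard R-type layout."""
    return ((spec["funct7"] << 25) | (rs2 << 20) | (rs1 << 15)
            | (spec["funct3"] << 12) | (rd << 7) | OPCODE_CUSTOM_0)


def match_bits(spec):
    """Fixed bits that identify the instruction regardless of operands."""
    return (spec["funct7"] << 25) | (spec["funct3"] << 12) | OPCODE_CUSTOM_0


def decode_operands(word):
    """Extract (rd, rs1, rs2) from an R-type word."""
    return (word >> 7) & 0x1F, (word >> 15) & 0x1F, (word >> 20) & 0x1F


def generate_tests(specs, regs=(0, 1, 31)):
    """Yield (mnemonic, operands, word) for boundary register numbers."""
    for mnemonic, spec in specs.items():
        for rd, rs1, rs2 in itertools.product(regs, repeat=3):
            yield mnemonic, (rd, rs1, rs2), encode(spec, rd, rs1, rs2)


def run_tests(specs):
    """Return a list of failure messages; an empty list means all passed."""
    failures = []
    # No two instructions may share the same fixed bits.
    if len({match_bits(s) for s in specs.values()}) != len(specs):
        failures.append("ambiguous encodings: duplicate fixed bits")
    for mnemonic, operands, word in generate_tests(specs):
        if (word & FIXED_MASK) != match_bits(specs[mnemonic]):
            failures.append(f"{mnemonic} {operands}: fixed bits corrupted")
        if decode_operands(word) != operands:
            failures.append(f"{mnemonic} {operands}: operand fields wrong")
    return failures


failures = run_tests(SPECS)
print("failures:", failures)
```

In a real flow the expected words would come from an independent reference, such as the simulator generated from the Sail model, rather than from the same encoder that is under test.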
The compiler was generated from the description model and tested with the accelerator implemented in hardware. The experimental results indicate speed-ups of up to 1413x for matrix multiplication compared to an Arm Cortex-A72 core.