From Luis Vega <>
Subject [dmlc/tvm] [RFC] [VTA] [TSIM] Enabling Cycle-Accurate Hardware Simulation for VTA (#3009)
Date Thu, 11 Apr 2019 23:26:19 GMT
The following RFC proposes a new simulation environment called *TSIM* that improves software
and hardware integration, as well as simulation accuracy, compared to functional simulation.
One goal of this RFC is to integrate the hardware development process into the software stack
from the beginning, so that features can be implemented and evaluated incrementally as workloads
evolve over time.

Under this environment, the hardware description is the actual specification. This reduces
the burden of keeping a specification, usually written in a higher-level language such as
C/C++, consistent with the actual hardware design, described in a language such as Verilog.
Moving to TSIM will give us a more fluid hardware-software specification and should invite
more contributions that modify different layers of the stack.

Moreover, this integration provides more accurate performance feedback, i.e. clock cycles,
than the traditional functional model of a hardware accelerator.
This is because TSIM is based on Verilator, an open-source hardware simulator that compiles
Verilog designs down to C++ classes for cycle-accurate simulation.
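
To make the difference concrete, here is a toy Python sketch (not TSIM code) that contrasts a functional model, which only returns results, with a cycle-level model that also reports how many clock cycles the work took. The one-element-per-cycle throughput and the `PIPELINE_LATENCY` value are assumptions chosen purely for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical pipeline depth, chosen only for illustration.
PIPELINE_LATENCY = 3


def functional_add_by_one(xs: List[int]) -> List[int]:
    """Functional model: computes the right answer but says nothing about time."""
    return [x + 1 for x in xs]


def cycle_model_add_by_one(xs: List[int]) -> Tuple[List[int], int]:
    """Cycle-level model: one element enters the pipeline per clock cycle.
    Returns (outputs, total clock cycles); for non-empty input the total is
    len(xs) + PIPELINE_LATENCY."""
    pipeline: List[Optional[int]] = [None] * PIPELINE_LATENCY  # one slot per stage
    outputs: List[int] = []
    pending = list(xs)
    cycles = 0
    while pending or any(v is not None for v in pipeline):
        # On each rising edge the last stage retires and every stage shifts.
        retired = pipeline[-1]
        if retired is not None:
            outputs.append(retired)
        incoming = pending.pop(0) + 1 if pending else None
        pipeline = [incoming] + pipeline[:-1]
        cycles += 1
    return outputs, cycles
```

Both models agree on the data. Only the cycle-level one answers "how long did it take?", which is the kind of feedback TSIM obtains by simulating the actual Verilog with Verilator rather than a hand-written model like this one.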

Lastly, Verilator is already available in many Linux distributions, e.g. Ubuntu, and on OSX via …

## Proposed design

TSIM uses Verilator to integrate VTA designs into TVM while leaving flexibility in the hardware
language used to implement those designs.
For example, one could use OpenCL, C/C++ or Chisel3 to describe a VTA design that is eventually
compiled down to Verilog, the standard input language for FPGA/ASIC tools.
Additionally, Verilator supports the Direct Programming Interface (DPI), which is part of
the SystemVerilog standard (IEEE 1800) and provides a mechanism for calling functions written
in foreign programming languages such as C.

We leverage these Verilator features so that upper layers of the TVM stack, such as drivers
and the runtime, can interface with hardware designs. In fact, we have already developed all
the glue layers needed to make this happen, including:

* **DPI module.** Based on the DSO module located at `tvm/src/runtime/`, the DPI module
is in charge of loading the shared library that contains the hardware accelerator and the
Verilator execution function.
As stated earlier, Verilator is used to compile the hardware accelerator from Verilog to C++.
The DPI module also provides an API that drivers can use to manage the accelerator by reading
and writing registers, and to terminate (exit) the simulation.

* **Verilator execution function.** This function is used by Verilator to instantiate the
accelerator, generate the clock and reset signals, and dump simulation waveforms when waveform
dumping is enabled. It also holds function pointers to the DPI functions, which are implemented
in the DPI module. This adds flexibility because upper layers in the stack can modify the
behavior of the DPI functions.

* **Hardware DPI modules.** A hardware accelerator interface can usually be reduced to
two main components, one for control and another for data. The control interface is driven
by a host CPU, whereas the data interface is connected either to external memories (DRAM)
or to internal memories in the form of scratchpads or caches.
Two hardware modules written in Verilog, `VTAHostDPI.v` and `VTAMemDPI.v`, implement these
two interfaces.
Accelerators implemented in Verilog can use these modules directly, and we also provide Chisel3
`BlackBox` wrappers for accelerators described in Chisel3.

* **Add-by-one accelerator example.** To showcase the interaction between all of these components,
we implemented an add-by-one accelerator in both Chisel3 and Verilog, together with a software
driver.
We also provide CMake scripts that build everything automatically and a `config.json` file
for managing accelerator and simulation options.
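
In TSIM the layers listed above are Verilog, C++ and a compiled driver. As a rough mental model only, the following pure-Python sketch mimics how they fit together for the add-by-one case: a control interface (register reads and writes), a data interface (a DRAM model), a clock loop that calls a caller-supplied per-cycle hook, and a driver that programs the accelerator and polls for completion. The register map (`REG_CTRL`, `REG_STATUS`, `REG_LEN`, `REG_SRC`, `REG_DST`), every class and function name, and the one-element-per-cycle timing are hypothetical and do not come from the TSIM sources.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical register map for the control (host) interface.
REG_CTRL, REG_STATUS, REG_LEN, REG_SRC, REG_DST = range(5)


class Memory:
    """Data interface: a flat, word-addressed DRAM model."""

    def __init__(self, size: int) -> None:
        self.words = [0] * size

    def read(self, addr: int) -> int:
        return self.words[addr]

    def write(self, addr: int, value: int) -> None:
        self.words[addr] = value


class AddByOneAccel:
    """Toy accelerator computing dst[i] = src[i] + 1, one element per cycle."""

    def __init__(self, mem: Memory) -> None:
        self.mem = mem
        self.reset()

    def reset(self) -> None:
        self.regs: Dict[int, int] = {r: 0 for r in range(5)}
        self.busy = False
        self.idx = 0

    # Control interface: register accesses issued by the host.
    def write_reg(self, addr: int, value: int) -> None:
        self.regs[addr] = value

    def read_reg(self, addr: int) -> int:
        return self.regs[addr]

    def tick(self) -> None:
        """Advance the design by one rising clock edge."""
        if not self.busy:
            if self.regs[REG_CTRL] & 1:      # start bit seen: begin a job
                self.busy, self.idx = True, 0
                self.regs[REG_CTRL] = 0      # start bit self-clears
                self.regs[REG_STATUS] = 0
        elif self.idx < self.regs[REG_LEN]:  # stream one element per cycle
            value = self.mem.read(self.regs[REG_SRC] + self.idx)
            self.mem.write(self.regs[REG_DST] + self.idx, value + 1)
            self.idx += 1
        else:                                # all elements processed
            self.busy = False
            self.regs[REG_STATUS] = 1        # raise the done flag


def run_until(accel: AddByOneAccel, done: Callable[[], bool],
              on_cycle: Callable[[int], None],
              max_cycles: int = 10_000) -> int:
    """Stand-in for the execution function: clock the design until `done()`,
    calling a caller-supplied hook every cycle (loosely analogous to DPI
    function pointers that upper layers can swap out)."""
    cycle = 0
    while not done():
        if cycle >= max_cycles:
            raise RuntimeError("simulation timed out")
        accel.tick()
        on_cycle(cycle)
        cycle += 1
    return cycle


def add_by_one_driver(accel: AddByOneAccel, mem: Memory, a: List[int],
                      src: int, dst: int,
                      on_cycle: Callable[[int], None] = lambda c: None
                      ) -> Tuple[List[int], int]:
    """Toy driver: reset, stage input in DRAM, program registers, launch,
    poll the done flag, and read the result back. Returns (output, cycles)."""
    accel.reset()  # in TSIM reset comes from the execution function
    for i, x in enumerate(a):
        mem.write(src + i, x)
    accel.write_reg(REG_LEN, len(a))
    accel.write_reg(REG_SRC, src)
    accel.write_reg(REG_DST, dst)
    accel.write_reg(REG_CTRL, 1)  # launch
    cycles = run_until(accel, lambda: accel.read_reg(REG_STATUS) == 1, on_cycle)
    return [mem.read(dst + i) for i in range(len(a))], cycles
```

For a 3-element input this toy design spends one cycle on the start handshake, one cycle per element and one more to raise the done flag, so the driver observes 5 cycles. With TSIM, such counts would instead come from simulating the real RTL.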

Finally, the following snippet shows how a VTA design simulation, based on the add-by-one
example, is invoked from TVM:

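```python
import tvm

ctx = tvm.cpu(0)
a = tvm.nd.array(...)  # input
b = tvm.nd.array(...)  # output
# Load the simulated accelerator library through the "vta-tsim" loader
tsim = tvm.module.load("", "vta-tsim")
# Look up the add-by-one driver registered as a TVM global function
f = tvm.get_global_func("tvm.vta.driver")
# Run the driver against the simulated accelerator
f(tsim, a, b)
```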
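
The snippet elides how `a` and `b` are built, so here is one hedged way to generate test data and check the result against a NumPy reference. It assumes 1-D `int32` arrays. `check_add_by_one` and its `run` argument are hypothetical helpers: `run` stands in for the TVM call sequence, for example a function that wraps the NumPy arrays with `tvm.nd.array`, calls the driver, and returns `b.asnumpy()`.

```python
from typing import Callable

import numpy as np


def check_add_by_one(run: Callable[[np.ndarray, np.ndarray], np.ndarray],
                     length: int = 16, seed: int = 0) -> np.ndarray:
    """Generate a random input, run the (simulated) accelerator through
    `run(a, b)`, which must return the output array, and compare the result
    against a NumPy reference. Raises AssertionError on mismatch."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 100, size=length, dtype=np.int32)  # input
    b = np.zeros_like(a)                                    # output buffer
    out = run(a, b)
    np.testing.assert_array_equal(out, a + 1)
    return out
```

Keeping the reference check independent of TVM makes it easy to reuse the same test against a functional model and against the TSIM-backed driver.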
