tvm-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Haichen Shen <notificati...@github.com>
Subject [dmlc/tvm] [RFC] Register Relay VM design (#2915)
Date Wed, 27 Mar 2019 17:52:12 GMT
# Register VM Design
Current Relay VM RFC (#2810) proposes stack-based VM design which uses push and pop to maintain
a unified stack. Though the design itself is simple to implement, it is cumbersome in tasks
such as dataflow analysis and enforces certain orders in the execution.

I propose a register-based VM design as an alternative design. The main difference is that
register-based VM uses registers to designate the operands and results instead of using the
stack. Registers in the VM are virtual registers and each is a reference to a `VMObject`.
We assume there are infinite registers so runtime doesn't need to worry about register spilling.
Registers are in SSA form where its index is unique in each VMFunction. 

Calling convention in register VM is also simple. Each function has its own local register
file. The first `k` registers in the register file of callee function are arguments passed
in by caller function, and the VMObject in return register is allocated by callee function
instead of caller function to avoid unnecessary memory copy. Caller function then assigns
the reference to VMObject pointed by return register into its own register (see detail in
`Invoke` instruction).

I summarize some pros and cons for register VM compared to stack VM.

Pros:
- Data dependency analysis is self-explaining as register reveals the data dependency. In
comparison, stack VM requires you to simulate the program and recover the stack index in order
to get data dependency. As a result, it'll be easier to implement a dataflow executor or out-of-order
executor.
- If/else scope is easier to handle. Previously stack VM needs to move the objects to the
right place in the stack given it enters if or else branch. Now register VM only needs a new
`Phi` instruction to select the result from either branch.
- Tail recursion optimization should be easier since we only need to move the register to
the correct slot in the register file.

Cons:
- There is some memory overhead to keep the empty slots for registers that are never used.
- Need to kill the register after its life cycle. This can be done by either annotating the
life cycle of each register or inserting the kill instructions in the program during the life
cycle analysis. Note that life cycle analysis pass is needed by both stack VM and register
VM.

## Instructions
We modify the current stack VM instructions to use registers as operands and results.
Registers are designated by `$k` where `k` is the register index. `*reg` indicates a list
of registers.
### AllocTensor
```
AllocTenosr $1, ndim, *shape, dtype  ; %1 = AllocTensor(ndim, *shape, dtype)
```
Allocate a tensor object and stores to register `$1`.
### AllocDatatype
```
AllocDatatype $1, tag, num_fields, *regs  ; %1 = AllocDatatype(tag, num_fields, *regs)
```
Allocate a datatype object and stores to register `$1`. `*regs` is a list of registers containing
fields in the datatype.
### AllocClosure
```
AllocClosure $1, func_index, num_freevars, *regs  ; %1 = AllocClosure(func_index, num_freevars,
*regs)
```
Allocate a closure object, where there are `num_freevars` in `*regs`.
### LoadConst
```
LoadConst $1, const_index  ; %1 = LoadConst(const_index)
```
Load the constant at `const_index` from the constant pool.

### Mov
```
Mov $2, $1  ; %2 = Mov(%1)
```
Create a reference to VMObject in register `$1` and stores it in `$2`.

Note: No data copy happens in this instruction. The real data copy should be a PackedFunction.
### Phi
```
Phi $3, $1, $2  ; %3 = Phi(%1, %2)
```
Takes the `VMObject` either in `$1` or `$2` and stores in `$3`.

Note: This instruction requires VMObject in register `$1` and `$2` having the same type, and
only one of them should be valid during the runtime.
### Ret
```
Ret $1
```
Returns the register `$1`.
### GetField
```
GetField $2, $1, field_index  ; %2 = GetField(%1, field_index)
```
Get the field at `field_index` in `$1`.
### If
```
If $1, true_offset, false_offset
```
Check if `$1` is true or false. If true relative jump by true_branch, else relative jump by
false_branch.
### Goto
```
Goto pc_offset
```
Relative unconditional jump by `pc_offset`.
### InvokePacked
```
InvokePacked packed_index, arity, output_size, *regs
```
Invoke the packed function denoted by `packed_index`

Note: Number of registers in `*regs` should be `arity + output_size` where first `arity` registers
are arguments and rest are output registers.
### Invoke
```
Invoke $1, func_index, arity, *regs  ; %1 = Invoke(func_index, arity, *regs)
```
Invoke VM function at `func_index`

Note: Number of registers in `*regs` should be `arity`. Register `$1` will be a reference
to the VMObject in the return register.
### InvokeClosure
```
InvokeClosure $2, $1, arity, *regs  ; %2 = InvokeClosure(%1, arity, *regs)
```
Invoke closure function in register `$2` and save the result into register `$2`.

Note: Register `$2` must be a `VMClosureObject`.

## Stack and State
In order to convert to register-based VM, we also need to adjust the data structure in the
VM. We assume there are infinite registers available, and each function has its own register
file. Each time an `Invoke` instruction is executed, the runtime creates a new `VMFrame`.
We pre-allocate max number of registers used in the callee function (this number can be derived
during `VMCompile`) and assigns the arguments to the first `args` slots in the registers.
```
struct VMFrame {
  // Current function
  size_t func_index;
  // Pre-allocate max number of registers used in the functions
  std::vector<VMObject> registers;
  // Number of arguments
  size_t args;
  // Pointer into the current function's instructions.
  const Instruction* code;
  // Current program counter
  size_t pc;
};
```

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2915
Mime
  • Unnamed multipart/alternative (inline, 7-Bit, 0 bytes)
View raw message