tvm-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-tvm] cbalint13 opened a new pull request #5800: [ONNX] Skip multiply with 1.0f constant for GEMM import
Date Sat, 13 Jun 2020 17:31:58 GMT

cbalint13 opened a new pull request #5800:
URL: https://github.com/apache/incubator-tvm/pull/5800


   This small PR optimizes the GEMM (```nn.dense```) import via ONNX. It also leads to much better quantization decisions.
   
   **Description**
   
   A single ```Gemm``` operator from ONNX expands into a series of ```transpose```, ```multiply```, ```dense```, ```bias_add``` layers, in accordance with the formula ```Y = alpha * A * B + beta * C```.
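   
   A minimal sketch of the guarded expansion (my paraphrase in Relay terms; the actual converter lives in ```python/tvm/relay/frontend/onnx.py``` and its structure and helper names may differ):
   
   ```python
   from tvm import relay

   def gemm_to_relay(a, b, c=None, alpha=1.0, beta=1.0,
                     trans_a=0, trans_b=0, units=None):
       """Sketch of the ONNX Gemm expansion: Y = alpha * A * B + beta * C."""
       if trans_a:
           a = relay.transpose(a, axes=(1, 0))
       if not trans_b:
           b = relay.transpose(b, axes=(1, 0))
       a = relay.nn.batch_flatten(a)
       # The fix: only materialize the multiply when alpha != 1.0,
       # instead of always emitting multiply(a, 1f).
       if alpha != 1.0:
           a = relay.multiply(a, relay.const(alpha, "float32"))
       out = relay.nn.dense(a, b, units=units)
       if c is not None:
           # (A matching guard for beta == 1 would be analogous.)
           out = relay.nn.bias_add(
               out, relay.multiply(relay.const(beta, "float32"), c))
       return out
   ```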
   
   **Outcome**
   1. This PR eliminates one ```multiply()``` layer in the case of ```alpha == 1``` (as sketched above).
   2. The omitted layer leads to much better decisions in the final quantization ```realization``` step.
   
   
   **Intermediate Results**
   
   * Before
   ```
     %15 = nn.batch_flatten(%14) /* ty=Tensor[(1, 800), float32] */;
     %16 = multiply(%15, 1f /* ty=float32 */) /* ty=Tensor[(1, 800), float32] */;
      %17 = relay.op.annotation.simulated_quantize(%16, 0.0625f /* ty=float32 */, -127f /* ty=float32 */, 127f /* ty=float32 */, kind=1) /* ty=Tensor[(1, 800), float32] */;
      %18 = nn.dense(%17, meta[relay.Constant][2] /* ty=Tensor[(512, 800), float32] */ /* ty=Tensor[(512, 800), float32] */, units=512) /* ty=Tensor[(1, 512), float32] */;
   ```
   * After
   ```
     %15 = nn.batch_flatten(%14) /* ty=Tensor[(1, 800), float32] */;
      %16 = relay.op.annotation.simulated_quantize(%15, 0.0625f /* ty=float32 */, -127f /* ty=float32 */, 127f /* ty=float32 */, kind=1) /* ty=Tensor[(1, 800), float32] */;
      %17 = nn.dense(%16, meta[relay.Constant][2] /* ty=Tensor[(512, 800), float32] */ /* ty=Tensor[(512, 800), float32] */, units=512) /* ty=Tensor[(1, 512), float32] */;
   ```
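   With the ```multiply(%15, 1f)``` node gone, ```simulated_quantize``` consumes the ```batch_flatten``` output directly, which lets the ```realization``` step below pick a pure integer lowering.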
   **Quantized Results**
   
   * Before
   ```
     %35 = nn.batch_flatten(%34) /* ty=Tensor[(1, 512), int32] */;
     %36 = cast(%35, dtype="float32") /* ty=Tensor[(1, 512), float32] */;
     %37 = multiply(%36, 6.10352e-05f /* ty=float32 */) /* ty=Tensor[(1, 512), float32] */;
     %38 = multiply(%37, 1f /* ty=float32 */) /* ty=Tensor[(1, 512), float32] */;
     %39 = multiply(%38, 16f /* ty=float32 */) /* ty=Tensor[(1, 512), float32] */;
     %40 = round(%39) /* ty=Tensor[(1, 512), float32] */;
     %41 = clip(%40, a_min=-127f, a_max=127f) /* ty=Tensor[(1, 512), float32] */;
   ```
   * After
   ```
     %29 = nn.batch_flatten(%28) /* ty=Tensor[(1, 512), int32] */;
     %30 = add(%29, 512 /* ty=int32 */) /* ty=Tensor[(1, 512), int32] */;
     %31 = right_shift(%30, 10 /* ty=int32 */) /* ty=Tensor[(1, 512), int32] */;
     %32 = clip(%31, a_min=-127f, a_max=127f) /* ty=Tensor[(1, 512), int32] */;
   ```
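   For intuition (my own arithmetic, not from the PR): the three multiplies in the **Before** chain compose to ```6.10352e-05 * 1 * 16 ≈ 2^-14 * 2^4 = 2^-10```, a power-of-two scale, so with the stray ```multiply(..., 1f)``` out of the way the realizer can fold the whole rescale into the integer add-and-shift shown above:
   
   ```python
   # Sketch: check that (x + 512) >> 10 matches round(x * 2**-10) on a few
   # non-negative samples; 512 is the rounding offset (half of 2**10).
   # Tie values (x % 1024 == 512) round differently under Python's
   # banker's rounding, so they are avoided here.
   for x in (0, 1, 12345, 100000):
       assert (x + 512) >> 10 == round(x * 2.0 ** -10)
   ```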

