tvm-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-tvm] FrozenGene edited a comment on pull request #5754: [RFC] Improve quantized convolution performance for armv8 architectures
Date Thu, 11 Jun 2020 02:59:58 GMT

FrozenGene edited a comment on pull request #5754:
URL: https://github.com/apache/incubator-tvm/pull/5754#issuecomment-642374967


   Thanks for the great work! I have some quick questions:
   1. Have you tested various models on different Arm CPUs (e.g. A53, A72, A55, A75)? According
   to the Facebook QNNPACK blog (https://engineering.fb.com/ml-applications/qnnpack/), the
   `umull` / `uadalp` sequence does not always give the best performance compared with the
   `smlal` instruction (which is what we use now). So simply changing the legalization and giving
   up the `smlal` instruction on AArch64 doesn't make sense to me. One piece of evidence: our
   upcoming `Ansor` (auto-scheduler) doesn't support tensorize (at least not yet), and it still
   gets nice performance using the `smlal` instruction, outperforming TFLite by 1.2x on the
   quantized MobileNet V2 model on a Cortex-A53
   (https://discuss.tvm.ai/t/tflite-and-tvm-comparison-for-quantized-models/6577/4).
   I mean here:
   ```python
   @qnn_conv2d_legalize.register('arm_cpu')
   def _qnn_conv2d_legalize_arm_cpu(attrs, inputs, types):
       # ARM prefers the dtypes to be same.
       if is_aarch64_arm():
           return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
       return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.qnn.op.conv2d)
   ```
   This legalization prevents us from using the `SMLAL` instruction.
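
   For illustration, a rough sketch of what I mean (not code from this PR; `prefer_tensorized_schedule` is a hypothetical predicate, e.g. driven by a target attribute or tuning results), which would keep both options open:
   ```python
   @qnn_conv2d_legalize.register('arm_cpu')
   def _qnn_conv2d_legalize_arm_cpu(attrs, inputs, types):
       # Take the same-dtype path (umull/uadalp via the tensorized schedule)
       # only when we expect it to win; otherwise keep the int16
       # legalization so LLVM can still emit smlal.
       if is_aarch64_arm() and prefer_tensorized_schedule():  # hypothetical switch
           return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
       return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.qnn.op.conv2d)
   ```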
   
   2. I suggest we keep both schedules (tensorize and the default spatial pack), rather than
   just checking `aarch64` and using only the tensorize template. I mean here:
   ```python
   is_aarch64 = "aarch64" in str(isa.target)
   if is_aarch64 and data.dtype in ["int8", "uint8"]:
       strategy.add_implementation(
           wrap_compute_conv2d(topi.arm_cpu.compute_conv2d_NHWC_quantized),
           wrap_topi_schedule(topi.arm_cpu.schedule_conv2d_NHWC_quantized),
           name="compute_conv2d_NHWC_quantized.arm_cpu")
   else:
       strategy.add_implementation(
           wrap_compute_conv2d(topi.arm_cpu.conv2d_nhwc_spatial_pack),
           wrap_topi_schedule(topi.arm_cpu.schedule_conv2d_nhwc_spatial_pack),
           name="conv2d_nhwc_spatial_pack.arm_cpu")
   ```
   
   This is the design purpose of the op strategy. I suspect there are workloads where our
   spatial pack performs better. The situation is the same as with Winograd: we register both
   the Winograd and the default templates and choose the better one, e.g. as sketched below.
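
   A rough sketch of that idea, reusing the names from the snippet above (the `plevel` value is only an assumption, to prefer the tensorized candidate when no tuning log is present):
   ```python
   # Always register the default spatial-pack template...
   strategy.add_implementation(
       wrap_compute_conv2d(topi.arm_cpu.conv2d_nhwc_spatial_pack),
       wrap_topi_schedule(topi.arm_cpu.schedule_conv2d_nhwc_spatial_pack),
       name="conv2d_nhwc_spatial_pack.arm_cpu")
   # ...and add the tensorized template as a second candidate on AArch64,
   # so AutoTVM can tune both and pick the faster one per workload.
   if is_aarch64 and data.dtype in ["int8", "uint8"]:
       strategy.add_implementation(
           wrap_compute_conv2d(topi.arm_cpu.compute_conv2d_NHWC_quantized),
           wrap_topi_schedule(topi.arm_cpu.schedule_conv2d_NHWC_quantized),
           name="compute_conv2d_NHWC_quantized.arm_cpu",
           plevel=15)  # assumed priority; the default plevel is 10
   ```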




