tvm-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <>
Subject [GitHub] [incubator-tvm] FrozenGene commented on pull request #5754: [RFC] Improve quantized convolution performance for armv8 architectures
Date Thu, 11 Jun 2020 11:15:51 GMT

FrozenGene commented on pull request #5754:

   > 1. It will be hard to do this. The point is that the legalization is done in Relay
before picking the strategy (thus, it is unaware of the strategy picked). To keep both legalizations
I need somehow to pass information from the strategy (e.g., the name of the algorithm, or
something like that). Are you aware of any other ways I can do it?
   @giuseros I think add the algorithm name could be one way to handle it. For example, we
could add it in the `attr` and query it in the legalization pass, then we could throw it safely.
   > Note that I am targeting NHWC layout. I wasn't able to even compile with conv2d_nhwc_spatial_pack
for uint8 (it just hangs, at least when I tried it without auto-tuning on Armv8-A). I gathered
from various discussions that NHWC support for arm targets is incomplete at the moment. So
for now we might simply agree to leave this as default for NHWC and conv2d_nchw_spatial_pack
as default for NCHW and mirror that in the legalization step which might look like:
       ```if is_aarch64_arm() and attrs.data_layout == "NHWC":
           return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d)
       return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.qnn.op.conv2d)```
   Yes, our NHWC schedule on arm cpu doesn't be complete. After our careful testing, NHWC
is also perform better than NCHW on arm cpu using Ansor (aka auto scheduler) too. So this
prompts us we could improve our AutoTVM NHWC schedule on arm cpu too.  As the result I show
in the post,  we use auto schedule is to leverage NHWC layout and `smlal` instruction,  I
prefer we could leverage `attr[algorithm_name]` mentioned previous to keep `smlal` instruction.
After auto scheduler released (we are working hard to do it, we wish after 2 weeks we could
bring it in), we could see how to improve it (like generating smlal and smlal2 or your tensorize
instruction), they are orthogonal but they share the same legalize pass.
   One background of auto scheduler: In auto scheduler, we only need tvm.compute, then we
could generate schedule automatically, so we could try NHWC / NCHW easily. So there is no
spatial pack schedule template concept in the auto scheduler world in fact.
   > About the Raspberry+mobilenet v2, good to know you are working on Armv8-A (sorry to
have assumed otherwise). However, there is still the point that mobilenet uses shallow convolutions,
while I am addressing deeper and more generic convolutions.
   So we should keep both algorithm better, right?
   > Are you saying that, as things stand now in TVM, the conv2d_nhwc_spatial_pack schedule
might be faster than the gemm approach on smaller CPUs? Unfortunately, for now I don't think
they can be added together because of what I said above about the legalization step. Do you
know any work-around to that? Maybe I can legalize only for specific devices (e.g., only for
   I think add algorithm name mentioned before maybe could help to solve it.
   > Finally, as things stand now we might get this PR in, and later do a more detailed
comparison across different networks + CPUs
   Ok. I buy it in. After legalization pass we discussed is solved, I am glad to do code review
carefully and handle this pr.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:

View raw message