tvm-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-tvm] t-vi edited a comment on pull request #5600: [TOPI] Improve CUDA softmax scheduling
Date Thu, 04 Jun 2020 07:47:41 GMT

t-vi edited a comment on pull request #5600:
URL: https://github.com/apache/incubator-tvm/pull/5600#issuecomment-638622419


   I'm adding shfl intrinsics to the rocm bits (using `tvm.intrin.rule.rocm.tvm_warp_shuffle/-up/-down` definitions).
   I'll probably run into the nvptx bits in the llvm codegen. Is there a reason not to use the intrin.rule mechanism for nvptx?
   I'm not sure whether running `gpu_imagenet_bench.py` (which I'm using as a first check of whether anything works) with the nvptx target works for me. I do get as far as the codegen for that target, but I don't know whether it worked before, either.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


