mxnet-dev mailing list archives

From Kellen Sunderland
Subject Re: [apache/incubator-mxnet] [RFC] GPU performance improvements in MXNet engine (#18951)
Date Tue, 18 Aug 2020 00:28:16 GMT
I really like this proposal. Thanks for the great write-up, Przemyslaw.

I haven't fully thought through the pros and cons, but would it be possible to record a CUDA
event by default after every block of operators is launched, and have any dependent block of ops
wait on that event via cudaStreamWaitEvent? Would this unblock our GPU worker threads, since we
would no longer be calling cudaStreamSynchronize?
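
To make the question concrete, here is a minimal sketch of the pattern I have in mind. This is
not MXNet engine code: the kernels `producer` and `consumer` and the helper `run_two_stream_demo`
are hypothetical stand-ins for two dependent blocks of operators, and I am assuming one block
runs on each of two streams. Only the CUDA runtime calls themselves are real.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the first block of operators.
__global__ void producer(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 1.0f;
}

// Hypothetical stand-in for a dependent block of operators.
__global__ void consumer(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}

// Returns true if the consumer observed the producer's writes.
bool run_two_stream_demo() {
    const int n = 1 << 20;
    float* a = nullptr;
    float* b = nullptr;
    if (cudaMalloc(&a, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&b, n * sizeof(float)) != cudaSuccess) {
        cudaFree(a);
        return false;
    }

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Timing is not needed, so disable it to keep the event cheap.
    cudaEvent_t done;
    cudaEventCreateWithFlags(&done, cudaEventDisableTiming);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    producer<<<blocks, threads, 0, s1>>>(a, n);
    // Mark the point in s1 after which the producer's output is ready.
    cudaEventRecord(done, s1);

    // Device-side dependency: s2 waits for the event, the host thread
    // does not block here and can keep enqueueing work.
    cudaStreamWaitEvent(s2, done, 0);
    consumer<<<blocks, threads, 0, s2>>>(a, b, n);

    // The host synchronizes only when it actually needs the result.
    float first = 0.0f;
    cudaMemcpyAsync(&first, b, sizeof(float), cudaMemcpyDeviceToHost, s2);
    cudaStreamSynchronize(s2);
    bool ok = (cudaGetLastError() == cudaSuccess) && (first == 2.0f);

    cudaEventDestroy(done);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return ok;
}

int main() {
    std::printf("%s\n", run_two_stream_demo() ? "ok" : "failed");
    return 0;
}
```

If both kernels were launched on the same stream, the event would be redundant for correctness,
since work within a single stream already executes in issue order.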

If I'm understanding correctly, that would be equivalent to what you're proposing in your
second scenario (where we have two CUDA streams)? Would it add a lot of overhead in scenario
1, where we use the same stream?
