singa-dev mailing list archives

From GitBox <>
Subject [GitHub] [singa] chrishkchris commented on pull request #716: SINGA-510 Distributed Training Time Profiling
Date Sat, 06 Jun 2020 15:12:46 GMT

chrishkchris commented on pull request #716:

   > You are right. The waiting time cannot be included in the execution time of the operation.
   > But for some operators that use two cuda streams, we determine which stream to record events
   > based on the name of the operator. I think it's not an elegant scheme, any ideas about this?
   For time profiling, the ideal situation is that all the buffered communicator operators use
only one CUDA stream. Two streams are not good, because one stream has to wait for the other,
and that waiting time then ends up in the measured time. So I broke down most of the operations.
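To illustrate why timing an operation that spans two streams inflates the measurement, here is a toy Python timeline model. It is a sketch under stated assumptions, not SINGA or CUDA code: `ToyStream`, `time_two_stream_op`, `time_single_stream_ops` and the copy/allreduce split are all hypothetical. The events only mimic CUDA event semantics on an in-order stream, where an event completes once all work queued earlier on its stream has finished.

```python
from dataclasses import dataclass


@dataclass
class ToyStream:
    """Hypothetical in-order stream (NOT the CUDA or SINGA API): work runs back
    to back, and `clock` is the simulated time (ms) when the stream goes idle."""
    name: str
    clock: float = 0.0

    def record_event(self) -> float:
        # Like an event recorded on a stream: it completes once all work
        # previously queued on that stream has finished.
        return self.clock

    def launch(self, duration_ms: float) -> None:
        # Queue a piece of work that occupies the stream for duration_ms.
        self.clock += duration_ms

    def wait_event(self, event_time: float) -> None:
        # Cross-stream dependency: this stream idles until the event completes.
        self.clock = max(self.clock, event_time)


def time_two_stream_op(compute_busy_ms, copy_ms, comm_ms):
    """Hypothetical op spanning two streams, timed with events on the comm stream."""
    compute, comm = ToyStream("compute"), ToyStream("comm")
    compute.launch(compute_busy_ms)       # earlier work already queued on compute
    start = comm.record_event()           # timer starts on the (idle) comm stream
    compute.launch(copy_ms)               # e.g. copy into a send buffer
    comm.wait_event(compute.record_event())
    comm.launch(comm_ms)                  # the communication itself
    return comm.record_event() - start    # includes the cross-stream wait


def time_single_stream_ops(compute_busy_ms, copy_ms, comm_ms):
    """Same work broken into separate ops on one stream, each timed on its own."""
    stream = ToyStream("single")
    stream.launch(compute_busy_ms)        # earlier work on the same stream
    timings = {}
    for name, duration in (("copy", copy_ms), ("allreduce", comm_ms)):
        start = stream.record_event()     # completes only after earlier work
        stream.launch(duration)
        timings[name] = stream.record_event() - start
    return timings


print(time_two_stream_op(10, 2, 5))       # 17.0
print(time_single_stream_ops(10, 2, 5))   # {'copy': 2.0, 'allreduce': 5.0}
```

In this model the two-stream measurement (17 ms) absorbs the 10 ms the comm stream spent waiting on the compute stream, while timing each broken-down op on a single in-order stream reports only its own execution time.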
   The only kernel I have not broken down yet is the sparse communication kernel. It is too
long, so I did not include breaking it down in this PR.
   My original plan for this PR was to record the fp32/fp16 communication time seamlessly. If
breaking the large sparse communication kernel down provides better time profiling, it can be
included in a future PR.

