singa-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [singa-doc] nudles commented on issue #14: rearrange contents in dist-train.md
Date Sat, 04 Apr 2020 07:30:18 GMT
nudles commented on issue #14: rearrange contents in dist-train.md
URL: https://github.com/apache/singa-doc/pull/14#issuecomment-608989769
 
 
   Can the [DIST](https://github.com/apache/singa/blob/master/examples/autograd/mnist_cnn.py#L153)
variable be inferred from the number of GPUs?
   For MPI, you do not need to pass `num_gpus` explicitly to `DistOpt`, but for
multiprocessing you do?
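
   To make the question concrete, here is a minimal sketch of how I read the two
modes; the parameter names are the ones from the docstring under review, so the
exact signature should be checked against opt.py:

   ```python
   from singa import opt
   from singa import singa_wrap as singa  # used in the commented lines below

   sgd = opt.SGD(lr=0.005, momentum=0.9)

   # MPI mode: the job is launched with mpiexec, so (if I read it right) the
   # communicator discovers the rank and world size itself and no extra
   # arguments are needed.
   sgd = opt.DistOpt(sgd)

   # Multiprocessing mode: the processes are forked on one node, so the shared
   # NCCL id and the GPU counts must be passed in explicitly (argument names
   # assumed from the docstring under review):
   # nccl_id = singa.NcclIdHolder()   # created once, shared by all processes
   # sgd = opt.DistOpt(sgd, nccl_id=nccl_id, num_gpu=4, gpu_per_node=4)
   ```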
   
   The format of the docstring is very good!
   Some arguments may need more explanation:
   1. Is [nccl_id](https://github.com/apache/singa/blob/master/python/singa/opt.py#L191)
compulsory for multiprocessing, and should it be `None` for MPI?
   2. How about `num_gpu` and `gpu_per_node`?
   3. Give a concrete example for `rank_in_local` and `rank_in_global` (see the
sketch after this list).
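
   For item 3, something like the following could serve as the concrete example;
the 2-node, 4-GPU layout and the rank formula are my assumptions about the
intended meaning:

   ```python
   # Assumed layout: 2 nodes, 4 GPUs per node, one process per GPU.
   # rank_in_local:  index of the process within its own node (0..3)
   # rank_in_global: index of the process across all nodes (0..7)
   gpu_per_node = 4

   for node_id in range(2):
       for rank_in_local in range(gpu_per_node):
           rank_in_global = node_id * gpu_per_node + rank_in_local
           print(node_id, rank_in_local, rank_in_global)

   # e.g. the process driving GPU 1 on node 1 has rank_in_local = 1 and
   # rank_in_global = 5.
   ```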
   
   In addition, we may need to introduce the implementation of the distributed
training code in SINGA at the end of this document. We give an overview of the
synchronous training algorithm at the beginning, but what is done on the Python
side versus the C++ side, and when the NCCL and MPI APIs are called, is not
explained. This part is mainly for developers (not for end users).
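
   For example, that section could walk through one synchronous training step
and point out where each call crosses from Python into the C++ communicator.
This is only my rough mental model of the flow, not the actual call chain:

   ```python
   # Sketch of one synchronous step (assuming DistOpt exposes
   # backward_and_update as in the autograd examples).
   for x, y in train_loader:
       out = model(x)              # forward pass on the local GPU
       loss = loss_fn(out, y)
       # Python side: DistOpt runs the backward pass, then hands the gradient
       # tensors to the C++ communicator, which calls ncclAllReduce (or the
       # MPI equivalent) to average them across ranks before applying the
       # SGD update.
       sgd.backward_and_update(loss)
   ```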
   

