singa-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [singa-doc] nudles commented on a change in pull request #16: Add more details in the explanation of dist-train.md
Date Mon, 06 Apr 2020 03:27:42 GMT
nudles commented on a change in pull request #16: Add more details in the explanation of dist-train.md
URL: https://github.com/apache/singa-doc/pull/16#discussion_r403811276
 
 

 ##########
 File path: docs-site/docs/dist-train.md
 ##########
 @@ -145,30 +165,80 @@ if __name__ == '__main__':
     nccl_id = singa.NcclIdHolder()
 
     # Define the number of GPUs to be used in the training process
-    gpu_per_node = int(sys.argv[1])
-    gpu_num = 1
+    num_gpus = int(sys.argv[1])
 
     # Define and launch the multi-processing
     import multiprocessing
     process = []
-    for gpu_num in range(0, gpu_per_node):
+    for gpu_num in range(0, num_gpus):
         process.append(multiprocessing.Process(target=train_mnist_cnn,
-                       args=(nccl_id, gpu_num, gpu_per_node)))
+                       args=(nccl_id, gpu_num, num_gpus)))
 
     for p in process:
         p.start()
 ```
 
+Here are some explanations concerning the variables created above:
+
+(i) `nccl_id`
+
+Note that we need to generate an NCCL ID here to be used for collective communication, and then pass it to all the processes.
+The NCCL ID is like a ticket: only the processes with this ID can join the AllReduce operation.
+(Later, if we use MPI, passing the NCCL ID is not necessary, because the ID is broadcast by MPI in our code automatically.)
+
+(ii) `num_gpus`
 
 Review comment:
   Shall we simplify the terminology by using local_rank and global_rank?
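
   To illustrate the suggested terminology, here is a minimal sketch of how local_rank and global_rank usually relate. It assumes the common convention that every node runs the same number of GPUs and that ranks are numbered node by node. The helper names `global_rank` and `world_size` and their parameters are hypothetical and are not part of the SINGA API.

```python
def world_size(num_nodes: int, gpus_per_node: int) -> int:
    """Total number of training processes across all nodes (assumed one per GPU)."""
    return num_nodes * gpus_per_node


def global_rank(node_index: int, gpus_per_node: int, local_rank: int) -> int:
    """Map a GPU's index within its node (local_rank) to a unique index
    across all nodes (global_rank), assuming node-by-node numbering."""
    if not 0 <= local_rank < gpus_per_node:
        raise ValueError("local_rank must be in [0, gpus_per_node)")
    return node_index * gpus_per_node + local_rank


if __name__ == '__main__':
    # Hypothetical setup: 2 nodes with 4 GPUs each.
    for node in range(2):
        for local in range(4):
            print(node, local, global_rank(node, 4, local))
```

   Under this convention, the single-node script in the diff would have local_rank equal to global_rank (the loop variable `gpu_num`), and `num_gpus` would be the world size.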

