From dev-return-4825-archive-asf-public=cust-asf.ponee.io@singa.apache.org  Sat Apr  4 14:15:01 2020
Return-Path: <dev-return-4825-archive-asf-public=cust-asf.ponee.io@singa.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [207.244.88.153])
	by mx-eu-01.ponee.io (Postfix) with SMTP id 8F74518065C
	for <archive-asf-public@cust-asf.ponee.io>; Sat,  4 Apr 2020 16:15:01 +0200 (CEST)
Received: (qmail 79145 invoked by uid 500); 4 Apr 2020 14:15:01 -0000
Mailing-List: contact dev-help@singa.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:dev-help@singa.apache.org>
List-Unsubscribe: <mailto:dev-unsubscribe@singa.apache.org>
List-Post: <mailto:dev@singa.apache.org>
List-Id: <dev.singa.apache.org>
Reply-To: dev@singa.apache.org
Delivered-To: mailing list dev@singa.apache.org
Received: (qmail 79132 invoked by uid 99); 4 Apr 2020 14:15:00 -0000
Received: from ec2-52-202-80-70.compute-1.amazonaws.com (HELO gitbox.apache.org) (52.202.80.70)
    by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 04 Apr 2020 14:15:00 +0000
From: GitBox <git@apache.org>
To: dev@singa.apache.org
Subject: [GitHub] [singa-doc] chrishkchris edited a comment on issue #14: rearrange
 contents in dist-train.md
Message-ID: <158600970082.30132.17098409022725169853.gitbox@gitbox.apache.org>
References: <infra.14.MDExOlB1bGxSZXF1ZXN0Mzk4NDY0NTAz.gitbox@gitbox.apache.org>
In-Reply-To: <infra.14.MDExOlB1bGxSZXF1ZXN0Mzk4NDY0NTAz.gitbox@gitbox.apache.org>
Date: Sat, 04 Apr 2020 14:15:00 -0000
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

chrishkchris edited a comment on issue #14: rearrange contents in dist-train.md
URL: https://github.com/apache/singa-doc/pull/14#issuecomment-609034750
 
 
   > Then would it be better to design the APIs in this way:
   > 
   > 1. For training with MPI
   > 
   > ```python
   >  # in mnist_mpi.py
   > if __name__  == '__main__':
   >   sgd = ...
   >   sgd = DistOpt(sgd)
   >   train_mnist(sgd, sparse, topK)
   > ```
   > 
   > 2. For training via multiprocessing
   > 
   > ```python
   >  # in mnist_multiprocessing.py
   > if __name__  == '__main__':
   >   nccl_id = ...
   >   sgd = ...
   >   sgd = DistOpt(sgd, num_gpu, nccl_id)
   >   train_mnist(sgd, sparse, topK)
   > ```
   > 
   > Even if you use socket, multiprocessing can only run on a single node, hence num_gpu = gpu_per_node.
   
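   Sorry, I have updated my comment. The two arguments mean different things:
   
   - `num_gpu` is the local rank of a specific process (the example linked below calls it `gpu_num`)
   - `gpu_per_node` is the total number of ranks on a single node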
   
   See https://github.com/apache/singa/blob/master/examples/autograd/mnist_multiprocess.py#L42
   ```python
   # Create one process per GPU; gpu_num is that process's local rank.
   for gpu_num in range(0, gpu_per_node):
       process.append(multiprocessing.Process(
           target=train_mnist_cnn,
           args=(sgd, max_epoch, batch_size, True, data_partition,
                 gpu_num, gpu_per_node, nccl_id)))
   ```
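
To make the difference between the two values concrete, here is a minimal sketch that runs without SINGA or a GPU. The names `build_worker_args`, `report_rank` and `launch` are hypothetical and exist only for illustration; `report_rank` is a stand-in for `train_mnist_cnn`, and the sketch assumes only that the local rank and `gpu_per_node` are the trailing positional arguments, as in the loop above.

```python
import multiprocessing


def build_worker_args(gpu_per_node, shared_args=()):
    """Return one argument tuple per worker process (hypothetical helper).

    Every tuple ends with (local_rank, gpu_per_node): the local rank is
    different for each process, gpu_per_node is the same for all of them.
    """
    return [tuple(shared_args) + (local_rank, gpu_per_node)
            for local_rank in range(gpu_per_node)]


def report_rank(local_rank, gpu_per_node, queue):
    # Hypothetical stand-in for train_mnist_cnn: it only reports what it got.
    queue.put((local_rank, gpu_per_node))


def launch(gpu_per_node):
    """Start one process per local rank and collect what each one received."""
    queue = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=report_rank, args=args + (queue,))
             for args in build_worker_args(gpu_per_node)]
    for p in procs:
        p.start()
    # Drain the queue before joining so no worker blocks on a full pipe.
    results = sorted(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return results
```

Here `build_worker_args(2)` returns `[(0, 2), (1, 2)]`: two processes with local ranks 0 and 1, both told that the node has 2 ranks. Call `launch(...)` from under an `if __name__ == '__main__':` guard, as the example scripts do, so that it also works with the spawn start method.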

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services