mxnet-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] threeleafzerg commented on issue #10696: [MXNET-366]Extend MXNet Distributed Training by AllReduce
Date Mon, 02 Jul 2018 07:54:27 GMT
threeleafzerg commented on issue #10696: [MXNET-366]Extend MXNet Distributed Training by AllReduce
URL: https://github.com/apache/incubator-mxnet/pull/10696#issuecomment-401702058
 
 
   @eric-haibin-lin 
   1. dist allreduce only supports mpirun, just like Horovod. I have documented this in the
design doc. Do I need to add it anywhere else?
   2. It's not easy to support cluster=mpi through the launcher, because the version of mpirun
must strictly match the MPI library that MXNet is linked against (e.g. MPICH's mpirun cannot
work with the Intel MPI library's barrier). The parameter server is different: it can use many
versions of mpirun, because it only relies on mpirun to fork processes across multiple machines.
   3. I will modify the code according to your review comment.
   4. I will add it to the jenkins file in tests/nightly with macro's help.
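A launcher wrapper could at least reject the kind of mismatch described in point 2 (MPICH's mpirun paired with the Intel MPI library) before starting a job. Below is a minimal Python sketch. It assumes the wrapper already has the text printed by `mpirun --version` and the version string reported by the MPI library MXNet is linked against (for example via MPI-3's `MPI_Get_library_version`). The function names and vendor patterns are hypothetical, not part of MXNet's launcher, and the check compares vendors only, not exact versions.

```python
import re

# Hypothetical vendor signatures, checked in order. Intel MPI is derived from
# MPICH, so it is tested first. The patterns are illustrative assumptions,
# not an exhaustive list of what each implementation prints.
VENDOR_PATTERNS = [
    ("intel", re.compile(r"Intel\(R\) MPI", re.IGNORECASE)),
    ("openmpi", re.compile(r"Open MPI", re.IGNORECASE)),
    ("mpich", re.compile(r"MPICH|HYDRA", re.IGNORECASE)),
]


def detect_vendor(version_text):
    """Return the first vendor whose pattern appears in version_text, else None."""
    for vendor, pattern in VENDOR_PATTERNS:
        if pattern.search(version_text):
            return vendor
    return None


def launcher_matches_library(launcher_version, library_version):
    """True only if both strings are recognized and name the same MPI vendor."""
    launcher = detect_vendor(launcher_version)
    library = detect_vendor(library_version)
    return launcher is not None and launcher == library


if __name__ == "__main__":
    # MPICH's Hydra launcher paired with the Intel MPI library: rejected.
    print(launcher_matches_library("HYDRA build details:",
                                   "Intel(R) MPI Library 2018 for Linux* OS"))
```

A vendor check like this catches only the coarsest incompatibility. A strict version match, as point 2 requires, would need the version numbers compared as well.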

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
