mxnet-dev mailing list archives

From Haibin Lin <>
Subject Re: [apache/incubator-mxnet] [RFC] Unified API for Distributed Data Parallel Training (#16795)
Date Tue, 12 Nov 2019 20:00:45 GMT
I did mean use cases 2, 3, and 4.
Initialization is done in the constructor `kv.__init__()`; for Horovod it could simply be
a `hvd.init()` call.
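
A rough sketch of what I mean by initializing in the constructor. The class name `HorovodKVStore` and the `FakeHvd` stand-in are hypothetical and only for illustration; they are not the API proposed in the RFC. The stand-in mimics the `init()`, `rank()`, and `size()` calls of `horovod.mxnet` so the snippet runs without Horovod installed.

```python
class FakeHvd:
    """Hypothetical stand-in for `horovod.mxnet` (single-process only)."""

    def __init__(self):
        self.initialized = False

    def init(self):
        # Real Horovod would set up communication between workers here.
        self.initialized = True

    def rank(self):
        return 0

    def size(self):
        return 1


class HorovodKVStore:
    """Hypothetical wrapper: backend setup happens in the constructor."""

    def __init__(self, backend):
        self._backend = backend
        # With real Horovod this line would be `hvd.init()`.
        self._backend.init()

    @property
    def rank(self):
        return self._backend.rank()

    @property
    def num_workers(self):
        return self._backend.size()


hvd = FakeHvd()
kv = HorovodKVStore(hvd)
print(kv.rank, kv.num_workers)
```

With the real library, one would presumably pass `horovod.mxnet` itself (imported as `hvd`) instead of the stand-in, so that user code never has to call `hvd.init()` directly.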

I have not discussed problem 1 in much detail. Horovod uses mpirun to set up connections
and launch processes, while BytePS/P3 and the native kvstore currently use the `dmlc/launcher`
script. I do see that `dmlc/launcher` has MPI support, but I need to experiment with it more to
see whether it fits the existing use cases. That said, I don't see any fundamental blockers for (1).
