mxnet-dev mailing list archives

From Carl Yang <>
Subject Re: Single-Machine Topology-aware Communication
Date Mon, 25 Jun 2018 17:46:23 GMT
I added a few more figures showing how I arrived at the
MXNET_KVSTORE_GPUARRAY_BOUND value [Figures 7(b) and 7(c)]. I ran a
microbenchmark measuring runtime in seconds vs. message size sent
using MXNet's KVStore. Figure 7(b) shows a crossover point around 1M.
Above this point, multi-tree shows higher bandwidth; below it, single
tree shows higher bandwidth.
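
A crossover point of this kind can be read off two timing curves
programmatically. The sketch below is only an illustration: the function
name and its inputs are hypothetical, it is not the script used for
Figure 7, and it assumes both curves were measured at the same message
sizes.

```python
def find_crossover(sizes, single_tree_secs, multi_tree_secs):
    """Return the smallest measured size from which multi-tree is at least
    as fast as single-tree at every larger measured size, or None.

    All three arguments are equal-length sequences; entry i of each
    timing list is the runtime in seconds at message size sizes[i].
    """
    crossover = None
    rows = sorted(zip(sizes, single_tree_secs, multi_tree_secs))
    for size, t_single, t_multi in rows:
        if t_multi <= t_single:
            # Multi-tree wins here; remember where the winning run began.
            if crossover is None:
                crossover = size
        else:
            # Single tree wins again, so any earlier candidate is discarded.
            crossover = None
    return crossover
```

Requiring multi-tree to stay ahead at every larger size avoids reporting
a crossover caused by a single noisy measurement at a small size.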

However, the microbenchmark that issues 150 push-pulls before waiting
[Figure 7(c)] puts the crossover point around 10M if we extrapolate its
behaviour to the right. Larger message sizes could not be plotted
because memory consumption was too high: I am using 150 push-pulls of
fairly large size as a proxy for neural network parameters. Combined
with the parameter sweep over MXNET_KVSTORE_GPUARRAY_BOUND on VGG shown
in Figure 7(a), this suggests that 10M is preferable to 1M.
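
How such a bound might be applied can be sketched in a few lines. This
is an assumption-laden illustration rather than MXNet's implementation:
`choose_strategy` and `DEFAULT_BOUND` are hypothetical names, the units
of the bound are assumed to match the message size in the benchmark, and
treating sizes strictly above the bound as "large" is also an assumption.

```python
import os

# Assumed default: the 10M value suggested by the measurements above.
DEFAULT_BOUND = 10_000_000


def choose_strategy(message_size, bound=None):
    """Pick single-tree or multi-tree communication for one message.

    If no bound is passed, it is read from the environment variable
    MXNET_KVSTORE_GPUARRAY_BOUND, falling back to DEFAULT_BOUND.
    """
    if bound is None:
        bound = int(os.environ.get("MXNET_KVSTORE_GPUARRAY_BOUND", DEFAULT_BOUND))
    # Assumption: messages strictly larger than the bound use multiple trees.
    return "multi_tree" if message_size > bound else "single_tree"
```

For example, `choose_strategy(1_000_000, bound=10_000_000)` selects the
single tree, while a 50M message with the same bound selects multi-tree.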

For the multiple-root case, I currently generate 8 trees, one rooted
at each GPU. For the single-tree Reduce and Broadcast, I use only the
first tree. This performed better than using different roots in the
single-tree case.
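
The one-tree-per-root layout can be illustrated with a simplified
sketch. It deliberately ignores the NVLink/PCI-E topology that the
actual proposal takes into account and just builds a balanced binary
tree by rotating GPU ids; `build_tree` and `build_all_trees` are
hypothetical helpers, not functions from MXNet.

```python
def build_tree(root, num_gpus):
    """Return a binary tree over GPU ids as {parent: [children]}.

    GPU ids are rotated so that `root` sits at heap position 0; the
    node at position p then has the node at (p - 1) // 2 as its parent.
    This is NOT topology-aware: link types are ignored.
    """
    order = [(root + i) % num_gpus for i in range(num_gpus)]
    tree = {gpu: [] for gpu in order}
    for pos in range(1, num_gpus):
        parent = order[(pos - 1) // 2]
        tree[parent].append(order[pos])
    return tree


def build_all_trees(num_gpus=8):
    """Build one tree rooted at each GPU (the multiple-root case)."""
    return [build_tree(root, num_gpus) for root in range(num_gpus)]


trees = build_all_trees(8)
# The single-tree Reduce and Broadcast would use only the first tree.
single_tree = trees[0]
```

With all 8 trees, each GPU is the root of exactly one tree, which is
what lets a large message be split across roots in the multi-tree case.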


On 6/25/18, Pedro Larroy <> wrote:
> Nice design document. Where does the default value come from?
> Do you generate a tree for each GPU?
> Pedro.
> On Mon, Jun 18, 2018 at 2:30 PM Carl Yang <> wrote:
>> Hi,
>> Currently, we have two methods for single-machine communication:
>> parameter server and NCCL ring reduction. Both of these methods have
>> some downsides. Parameter server does not differentiate between NVLink
>> connections and PCI-E, so it ends up using the higher latency and
>> slower PCI-E connections as frequently as it does NVLink. NCCL uses
>> the ring reduce algorithm, which has higher theoretical latency than
>> other algorithms. I am working on a topology-aware approach that can
>> address these limitations. Design proposal is on cwiki:
>> Please feel free to let me know if you have any suggestions.
>> Regards,
>> Carl
