mxnet-dev mailing list archives

From Taliesin Beynon
Subject trouble with foreach operator in conjunction with multiple GPUs
Date Wed, 28 Nov 2018 14:11:39 GMT
Hello fellow MXNetters

We've seen that the subgraph execution mechanism that is used to run things like the foreach
operator causes MXExecutorForward to block, instead of just issuing the ops in the normal
asynchronous way (
On its own this is a surprising fact that can lead to some issues if you're not expecting
it, like your time being spent in MXExecutorForward instead of WaitAll / WaitRead. Is there
a reason that this process isn't just automatically done on a separate thread for you? Is
it to ensure that subsequent ops on the original thread are correctly serialized wrt the ops
produced by the foreach? 

More importantly, this has the unfortunate implication that if you are using multi-device
parallelism with foreach, by just looping over your executors and calling Forward on them,
you will inadvertently serialize much of the computation: you can't call Forward on the second
executor until Forward on the first executor has returned, and the foreach causes that first
Forward call to block until the forward pass is (mostly) done!

So it kills multi-device parallelism unless one starts making thread pools so that one
can 'unblock' Forward (and probably the subsequent Backward) and have each device's Forward
run in a separate thread.
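
To make the thread-pool workaround concrete, here is a minimal sketch in Python. It does not
call MXNet at all: blocking_forward is a hypothetical stand-in that just sleeps, playing the
role of a Forward call that blocks until the forward pass is mostly done. The device list and
the 0.2 second duration are likewise made up for illustration. It assumes the real blocking
call would release the GIL (as time.sleep does here), otherwise threads would not help.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_forward(device_id, seconds=0.2):
    """Hypothetical stand-in for a blocking Forward call (not an MXNet API)."""
    time.sleep(seconds)  # sleep releases the GIL while it waits
    return device_id


def forward_serial(device_ids):
    # Plain loop over executors: each call must return before the next is issued,
    # so the per-device work is serialized.
    return [blocking_forward(d) for d in device_ids]


def forward_threaded(device_ids):
    # One worker thread per device, so the blocking calls overlap.
    with ThreadPoolExecutor(max_workers=len(device_ids)) as pool:
        return list(pool.map(blocking_forward, device_ids))


def timed(fn, *args):
    # Return the function's result together with its wall-clock duration.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


devices = [0, 1, 2, 3]  # made-up device ids
serial_out, serial_time = timed(forward_serial, devices)
threaded_out, threaded_time = timed(forward_threaded, devices)
print(f"serial: {serial_time:.2f}s, threaded: {threaded_time:.2f}s")
```

With a blocking call, the serial loop takes roughly the sum of the per-device times, while
the threaded version takes roughly the time of the slowest device. The same wrapping would
presumably be needed around Backward.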

Is this intended? Are we missing something about how you are supposed to use subgraphs in
conjunction with multi-device parallelism? It seems like a weakness in the current design
of subgraph execution. It also appears that the Python API doesn't have any strategy to deal
with this issue, as you can see on,
it's not making separate threads or anything there.

Tali + Sebastian