From: Taliesin Beynon
To: dev@mxnet.incubator.apache.org
Cc: sebastianb
Subject: trouble with foreach operator in conjunction with multiple GPUs
Date: Wed, 28 Nov 2018 16:11:39 +0200

Hello fellow MXNetters,

We've seen that the subgraph execution mechanism that is used to run things like the foreach operator causes MXExecutorForward to block, instead of just issuing the ops in the normal asynchronous way (https://github.com/apache/incubator-mxnet/blob/212364b0cba28aeda989378f6e630f7a61749bf3/src/executor/graph_executor.cc#L1352). On its own this is a surprising fact that can lead to some issues if you're not expecting it, like your time being spent in MXExecutorForward instead of WaitAll / WaitRead. Is there a reason that this process isn't just automatically done on a separate thread for you? Is it to ensure that subsequent ops on the original thread are correctly serialized wrt the ops produced by the foreach?
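For concreteness, this is roughly how we are seeing it from the Python API (a rough, untested sketch; the loop body, shapes and step count are arbitrary placeholders, and it assumes one GPU is available): with the foreach subgraph, most of the wall-clock time ends up inside forward() rather than in waitall().

    import time
    import mxnet as mx

    def step(data, states):
        # trivial loop body: one elementwise op per time step
        out = data + states[0]
        return out, [out]

    data = mx.sym.var('data')   # (num_steps, batch, hidden)
    init = mx.sym.var('init')   # (batch, hidden)
    outs, _ = mx.sym.contrib.foreach(step, data, [init])

    exe = outs.simple_bind(ctx=mx.gpu(0),
                           data=(100, 128, 512), init=(128, 512))

    t0 = time.time()
    exe.forward(is_train=False,
                data=mx.nd.ones((100, 128, 512), ctx=mx.gpu(0)),
                init=mx.nd.ones((128, 512), ctx=mx.gpu(0)))
    t1 = time.time()   # with foreach, most of the time shows up here ...
    mx.nd.waitall()
    t2 = time.time()   # ... rather than here
    print('forward %.3fs  waitall %.3fs' % (t1 - t0, t2 - t1))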
More importantly, this has the unfortunate implication that if you are using multi-device parallelism with foreach, by just looping over your executors and calling Forward on them, you will inadvertently serialize much of the computation: you can't call Forward on the second executor until Forward on the first executor has returned, and the foreach causes that first Forward call to block until the forward pass is (mostly) done! So it kills multi-device parallelism unless one starts making thread pools so that one can 'unblock' Forward (and probably the subsequent Backward) and have each device's Forward run in a separate thread (see the rough sketch in the P.S. below).

Is this intended? Are we missing something about how you are supposed to use subgraphs in conjunction with multi-device parallelism? It seems like a weakness in the current design of subgraph execution. It also appears that the Python API doesn't have any strategy to deal with this issue; as you can see at https://github.com/apache/incubator-mxnet/blob/2276bb0e30b1fe601eb288cb4f1b673484892d4b/python/mxnet/executor_manager.py#L281, it's not making separate threads or anything there.

Thanks!
Tali + Sebastian
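P.S. For reference, below is roughly the kind of thread-pool workaround we mean (an untested sketch; `executors` stands for one already-bound Executor per GPU, each with its own copy of the arguments, as executor_manager would normally set up). The idea is just to let the blocking Forward calls issued by the per-device foreach subgraphs overlap across GPUs instead of running back to back.

    from concurrent.futures import ThreadPoolExecutor

    def parallel_forward(executors, is_train=True):
        # run each device's (blocking) Forward in its own thread so the
        # per-device foreach subgraphs execute concurrently
        with ThreadPoolExecutor(max_workers=len(executors)) as pool:
            futures = [pool.submit(exe.forward, is_train=is_train)
                       for exe in executors]
            for f in futures:
                f.result()   # propagate any exception from the worker threads
        # Backward (also dispatched per device) could be handled the same way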