Date: Mon, 17 Jul 2017 19:11:01 +0000 (UTC)
From: "Fei Hu (JIRA)"
To: issues@systemml.apache.org
Reply-To: dev@systemml.apache.org
Subject: [jira] [Comment Edited] (SYSTEMML-1774) Improve Parfor parallelism for deep learning

    [ https://issues.apache.org/jira/browse/SYSTEMML-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090338#comment-16090338 ]

Fei Hu edited comment on SYSTEMML-1774 at 7/17/17 7:10 PM:
-----------------------------------------------------------

cc [~mboehm7] and [~dusenberrymw] Could you help check if my understanding about this issue is right?


was (Author: tenma):
cc [~mboehm7] Could you help check if my understanding about this issue is right?


> Improve Parfor parallelism for deep learning
> --------------------------------------------
>
>                 Key: SYSTEMML-1774
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Algorithms, Compiler, ParFor
>    Affects Versions: SystemML 1.0
>            Reporter: Fei Hu
>              Labels: deeplearning
>
> When running the [distributed MNIST LeNet example | https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml], each mini-batch could ideally run in parallel without interaction.
> We tried to force the {{parfor (j in 1:parallel_batches)}} loop at line 137 of {{nn/examples/mnist_lenet_distrib_sgd.dml}} to use {{REMOTE_SPARK}} mode by changing it to {{parfor (j in 1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED)}}, but got errors such as {{org.apache.sysml.runtime.DMLRuntimeException: Not supported: Instructions of type other than CP instructions}}. More log information can be found in the following comments. For example, the convolutional layer needs to randomly generate some matrices, but SystemML chooses {{RandSPInstruction}} instead of {{DataGenCPInstruction}}, which may be because SystemML could not determine the number of rows of the matrix. For this distributed MNIST LeNet example, using CP instructions may achieve better performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
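
To make the hypothesis in the description concrete, here is a toy Python model of the suspected interaction. It is not SystemML code. The function names, the memory budget, and the selection rule are all assumptions made up for illustration. Only the two instruction names and the error text come from the report. The sketch shows how an unknown row count could push a rand call onto a Spark instruction, and how a remote parfor body that accepts only CP instructions would then reject it.

```python
# Toy model only: NOT SystemML's optimizer. Every name and the budget are hypothetical.
from typing import List, Optional

LOCAL_BUDGET_BYTES = 512 * 1024 * 1024  # assumed local memory budget


def estimate_dense_bytes(rows: Optional[int], cols: Optional[int]) -> Optional[int]:
    """Dense double-precision size estimate, or None if a dimension is unknown."""
    if rows is None or cols is None:
        return None
    return rows * cols * 8


def choose_rand_instruction(rows: Optional[int], cols: Optional[int]) -> str:
    """Pick a single-node (CP) or Spark instruction for a rand() call."""
    size = estimate_dense_bytes(rows, cols)
    # Assumption: an unknown size is treated conservatively as too large for local memory.
    if size is None or size > LOCAL_BUDGET_BYTES:
        return "RandSPInstruction"
    return "DataGenCPInstruction"


def check_remote_parfor_body(instructions: List[str]) -> None:
    """Mimic the reported restriction: a remote parfor body may hold only CP instructions."""
    for inst in instructions:
        if not inst.endswith("CPInstruction"):
            raise RuntimeError(
                "Not supported: Instructions of type other than CP instructions")


# Dimensions known: the small matrix fits locally, so a CP instruction is chosen.
known = choose_rand_instruction(rows=32, cols=25)
# Row count unknown: the toy rule falls back to the Spark instruction.
unknown = choose_rand_instruction(rows=None, cols=25)
```

In this model, check_remote_parfor_body([known]) passes, while check_remote_parfor_body([unknown]) raises the same kind of error as in the report. If the real cause is similar, making the matrix dimensions known before the parfor body is compiled would be one direction to investigate.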