From: "Matthias Boehm"
To: dev@systemml.incubator.apache.org
Cc: "Ethan Xu"
Subject: Re: parfor fails
Date: Thu, 14 Apr 2016 20:14:46 -0700
Message-Id: <201604150314.u3F3EtKu027308@d03av04.boulder.ibm.com>
In-Reply-To: <201604150253.u3F2rUCS004743@d03av04.boulder.ibm.com>
Reply-To: dev@systemml.incubator.apache.org

just for completeness, this issue is tracked with https://issues.apache.org/jira/browse/SYSTEMML-635 and the fix will be available tomorrow.

Regards,
Matthias


From: Matthias Boehm/Almaden/IBM@IBMUS
To: dev@systemml.incubator.apache.org
Cc: "Ethan Xu"
Date: 04/14/2016 07:53 PM
Subject: Re: parfor fails

Hi Ethan,

thanks for catching this issue. The parfor script itself is perfectly fine but you encountered an interesting runtime bug. Usually, you can find the actual cause at the bottom of the stacktrace or in previous exceptions. I was able to reproduce this issue if NO systemml config file is provided (fails on parsing this non-existing config in the parfor mr job task setup). So the workaround is to put a SystemML-config.xml into the same directory. Interestingly, the issue did not show up in our testsuite because we always specify a default configuration there (which was until recently mandatory).

As a side note, we strongly recommend parfor over for loops here because it runs the entire loop in 1 instead of 2396 MR jobs due to automatic data partitioning. However, for the specific example at hand, a data-parallel formulation (with "s = colSums(x==0)") would be even better as it allows for partial aggregation and hence reduces shuffle.

Regards,
Matthias


From: Ethan Xu
To: dev@systemml.incubator.apache.org
Date: 04/14/2016 01:34 PM
Subject: parfor fails

Hello,

I have a quick question. The following script fails with this error:

org.apache.sysml.runtime.DMLRuntimeException: PARFOR: Failed to execute loop in parallel.

Here is the dml script:

x=read($X);

print("number of rows of x = " + nrow(x));
print("number of cols of x = " + ncol(x));

parfor(i in 1:ncol(x), check=0){
  a = x[,i];
  print("number of 0's in col " + i + " = " + sum(a == 0));
}

where X is a 35 million by 2396 matrix (coded and dummy coded numerical matrix) on HDFS. The script runs fine with regular 'for' loops.

Could someone explain why this script cannot run in parallel? Was it a wrong way to code parfor?

Thanks,

Ethan
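
Editor's sketch (not part of the original messages): the data-parallel formulation suggested in the reply could look roughly like the DML below. Only the expression s = colSums(x==0) comes from the thread. The loop that prints the counts is an assumption, added to mirror the output of the original parfor script, and it has not been run against the data set discussed here.

```
# Sketch only: everything except the colSums expression is an assumption.
x = read($X);

# One column-wise aggregation over the whole matrix instead of one
# sum() per column. s is a 1 x ncol(x) row vector of zero counts.
s = colSums(x == 0);

# Print the counts in the same form as the original script.
for (i in 1:ncol(s)) {
  print("number of 0's in col " + i + " = " + as.scalar(s[1,i]));
}
```

Here the loop only reads the small result vector s, so the pass over the 35 million by 2396 matrix happens once, inside colSums.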
