Date: Wed, 12 Jul 2017 19:11:00 +0000 (UTC)
From: "Fei Hu (JIRA)"
To: issues@systemml.apache.org
Reply-To: dev@systemml.apache.org
Subject: [jira] [Comment Edited] (SYSTEMML-1762) Improve the robustness of sparse matrix reshape function for the Spark mode

[ https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084521#comment-16084521 ]

Fei Hu edited comment on SYSTEMML-1762 at 7/12/17 7:10 PM:
-----------------------------------------------------------

The error messages are as follows:

{code:java}
17/07/12 12:04:47 ERROR TaskSetManager: Task 1 in stage 177.0 failed 1 times; aborting job
17/07/12 12:04:47 INFO TaskSetManager: Lost task 3.0 in stage 177.0 (TID 528) on localhost, executor driver: java.lang.NullPointerException (null) [duplicate 1]
17/07/12 12:04:47 INFO TaskSchedulerImpl: Cancelling stage 177
17/07/12 12:04:47 INFO TaskSchedulerImpl: Stage 177 was cancelled
17/07/12 12:04:47 INFO Executor: Executor is trying to kill task 2.0 in stage 177.0 (TID 527)
17/07/12 12:04:47 INFO Executor: Executor is trying to kill task 0.0 in stage 177.0 (TID 525)
17/07/12 12:04:47 INFO DAGScheduler: ShuffleMapStage 177 (flatMapToPair at MatrixReshapeSPInstruction.java:106) failed in 0.016 s due to Job aborted due to stage failure: Task 1 in stage 177.0 failed 1 times, most recent failure: Lost task 1.0 in stage 177.0 (TID 526, localhost, executor driver): java.lang.NullPointerException
	at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeSparse(LibMatrixReorg.java:1591)
	at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:504)
	at org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
	at org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
17/07/12 12:04:47 INFO DAGScheduler: Job 139 failed: fold at RDDAggregateUtils.java:137, took 0.018972 s
17/07/12 12:04:47 INFO Executor: Executor killed task 0.0 in stage 177.0 (TID 525)
17/07/12 12:04:47 ERROR ParWorker: Failed to execute task (type=SET, iterations={[j=3]}), retry:0
org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 0 and 0 -- Error evaluating instruction: SPARK°uark+°_mVar3618·MATRIX·DOUBLE°_mVar3619·MATRIX·DOUBLE°SINGLE_BLOCK
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:316)
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:217)
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:163)
	at org.apache.sysml.runtime.controlprogram.parfor.ParWorker.executeSetTask(ParWorker.java:167)
	at org.apache.sysml.runtime.controlprogram.parfor.ParWorker.executeTask(ParWorker.java:136)
	at org.apache.sysml.runtime.controlprogram.parfor.LocalParWorker.run(LocalParWorker.java:122)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 177.0 failed 1 times, most recent failure: Lost task 1.0 in stage 177.0 (TID 526, localhost, executor driver): java.lang.NullPointerException
	at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeSparse(LibMatrixReorg.java:1591)
	at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:504)
	at org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
	at org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1988)
	at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1089)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
	at org.apache.spark.rdd.RDD.fold(RDD.scala:1083)
	at org.apache.spark.api.java.JavaRDDLike$class.fold(JavaRDDLike.scala:414)
	at org.apache.spark.api.java.AbstractJavaRDDLike.fold(JavaRDDLike.scala:45)
	at org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils.aggStable(RDDAggregateUtils.java:137)
	at org.apache.sysml.runtime.instructions.spark.AggregateUnarySPInstruction.processInstruction(AggregateUnarySPInstruction.java:102)
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:286)
	... 6 more
Caused by: java.lang.NullPointerException
	at {color:#f79232}org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeSparse(LibMatrixReorg.java:1591){color}
	at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:504)
	at org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
	at org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	... 1 more
17/07/12 12:04:47 INFO Executor: Executor killed task 2.0 in stage 177.0 (TID 527)
17/07/12 12:04:47 ERROR ParWorker: Error executing task: 
org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 0 and 0 -- Error evaluating instruction: SPARK°uark+°_mVar3618·MATRIX·DOUBLE°_mVar3619·MATRIX·DOUBLE°SINGLE_BLOCK
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:316)
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:217)
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:163)
	at org.apache.sysml.runtime.controlprogram.parfor.ParWorker.executeSetTask(ParWorker.java:167)
	at org.apache.sysml.runtime.controlprogram.parfor.ParWorker.executeTask(ParWorker.java:136)
	at org.apache.sysml.runtime.controlprogram.parfor.LocalParWorker.run(LocalParWorker.java:122)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 177.0 failed 1 times, most recent failure: Lost task 1.0 in stage 177.0 (TID 526, localhost, executor driver): java.lang.NullPointerException
	at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeSparse(LibMatrixReorg.java:1591)
	at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:504)
	at org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
	at org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1988)
	at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1089)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
	at org.apache.spark.rdd.RDD.fold(RDD.scala:1083)
	at org.apache.spark.api.java.JavaRDDLike$class.fold(JavaRDDLike.scala:414)
	at org.apache.spark.api.java.AbstractJavaRDDLike.fold(JavaRDDLike.scala:45)
	at org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils.aggStable(RDDAggregateUtils.java:137)
	at org.apache.sysml.runtime.instructions.spark.AggregateUnarySPInstruction.processInstruction(AggregateUnarySPInstruction.java:102)
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:286)
	... 6 more
Caused by: java.lang.NullPointerException
	at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshapeSparse(LibMatrixReorg.java:1591)
	at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reshape(LibMatrixReorg.java:504)
	at org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:138)
	at org.apache.sysml.runtime.instructions.spark.MatrixReshapeSPInstruction$RDDReshapeFunction.call(MatrixReshapeSPInstruction.java:114)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	... 1 more
17/07/12 12:04:47 ERROR ParWorker: Stopping LocalParWorker.
{code}
{code}

> Improve the robustness of sparse matrix reshape function for the Spark mode
> ---------------------------------------------------------------------------
>
>                 Key: SYSTEMML-1762
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
>             Project: SystemML
>          Issue Type: Bug
>          Components: Algorithms, ParFor, Runtime
>            Reporter: Fei Hu
>         Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example|https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml], a {{java.lang.NullPointerException}} is thrown while reshaping the sparse matrix. The involved function is {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}}. The reason is that the output matrix index computed by {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}} does not exist in the {{HashMap rix}}.
> To reproduce the error, the attached Scala file {{MNIST_Distrib_Sgd.scala}} can be used to run the distributed MNIST example.
> If code is added to ignore the null output matrix block returned by {{MatrixBlock out = rix.get(ixtmp)}}, the distributed MNIST example runs in Spark mode, but the result may not be correct.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
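The null-guard workaround mentioned in the report (skip lookups whose computed block index is missing from the map, instead of dereferencing null) can be sketched standalone. This is a hedged illustration only: {{ReshapeNullGuard}}, {{lookupBlock}}, the {{Long}} keys, and the {{String}} "blocks" are hypothetical stand-ins, not SystemML's {{MatrixIndexes}}/{{MatrixBlock}} API.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the workaround described above: when the computed target
// index is absent from the block map, return null and let the caller skip the
// cell rather than triggering a NullPointerException on the missing block.
public class ReshapeNullGuard {

    // Look up the output block for a computed index; null means "index absent",
    // the situation that currently crashes LibMatrixReorg.reshapeSparse.
    static String lookupBlock(Map<Long, String> rix, long computedIndex) {
        String out = rix.get(computedIndex);
        if (out == null) {
            // Skipping avoids the NPE, but silently dropping cells is exactly
            // why the reporter warns the result "may not be correct".
            return null;
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Long, String> rix = new HashMap<>();
        rix.put(1L, "block-1");
        System.out.println(lookupBlock(rix, 1L)); // existing index
        System.out.println(lookupBlock(rix, 2L)); // missing index: null, no exception
    }
}
```

The guard only masks the symptom; the issue title asks for the real fix, i.e. making the Spark reshape compute indices that are guaranteed to exist in the map.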