systemml-issues mailing list archives

From "Mike Dusenberry (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SYSTEMML-1774) Improve Parfor parallelism for deep learning
Date Tue, 18 Jul 2017 22:09:00 GMT

    [ https://issues.apache.org/jira/browse/SYSTEMML-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092262#comment-16092262
] 

Mike Dusenberry edited comment on SYSTEMML-1774 at 7/18/17 10:08 PM:
---------------------------------------------------------------------

[~mboehm7]  Okay, thanks.  I would have thought that Spark execution mode would force the
parfor op to run as a distributed Spark operation, and thus the bodies would be forced
to be CP operations running on each worker.  Sounds like it is the opposite: the parfor runs
as a local CP parfor, and the bodies consist of Spark ops if possible.  That should probably
be noted somewhere.

Do you have any thoughts as to why the SPARK execution mode is 3x faster than the HYBRID_SPARK
execution mode?


was (Author: mwdusenb@us.ibm.com):
[~mboehm7]  Okay, thanks.  I would have thought that Spark execution mode would force the
parfor op to run as a distributed Spark operation, and thus the bodies would be forced
to be CP operations running on each worker.  Sounds like it is the opposite: the parfor runs
as a local CP parfor, and the bodies consist of Spark ops if possible.

Do you have any thoughts as to why the SPARK execution mode is 3x faster than the HYBRID_SPARK
execution mode?

> Improve Parfor parallelism for deep learning
> --------------------------------------------
>
>                 Key: SYSTEMML-1774
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Algorithms, Compiler, ParFor
>    Affects Versions: SystemML 1.0
>            Reporter: Fei Hu
>              Labels: deeplearning
>         Attachments: Explain_For_HYBRID_SPARK_Mode_With_ErrorInfo.txt, Explain_For_Spark_Mode.txt,
MNIST_Distrib_Sgd.scala, mnist_lenet_distrib_sgd.dml
>
>
> When running the [distributed MNIST LeNet example | https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
each mini-batch could ideally run in parallel without interaction. We tried to force {{parfor
(j in 1:parallel_batches)}} at line 137 of {{nn/examples/mnist_lenet_distrib_sgd.dml}} to
use {{REMOTE_SPARK}} mode by changing it to {{parfor (j in 1:parallel_batches, mode=REMOTE_SPARK,
opt=CONSTRAINED)}}. This failed with {{org.apache.sysml.runtime.DMLRuntimeException: Not supported:
Instructions of type other than CP instructions}} in the {{SPARK}} execution mode, and with
{{java.lang.NullPointerException}} in the {{HYBRID_SPARK}} execution mode. More log information
can be found in the following comments.
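
For orientation, the change described above can be sketched in DML roughly as follows. Only
the parfor header is taken from this issue; the variable R and the loop body are hypothetical
stand-ins for the mini-batch work in the real script, and the sketch assumes a SystemML
runtime with a Spark context available. It is an illustration of the header change, not a
reproduction of the failing script.

```
# Hypothetical stand-in for the loop at line 137 of mnist_lenet_distrib_sgd.dml.
# R and the loop body are illustrative only; they are not from the original script.
parallel_batches = 4
R = matrix(0, rows=parallel_batches, cols=1)

# Header as described in this issue: request a remote Spark parfor and have the
# parfor optimizer respect the user-specified parameters.
parfor (j in 1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED) {
  # Each iteration writes to its own row, so iterations do not interact.
  R[j, 1] = as.matrix(j * 2)
}

print(sum(R))
```

Whether the body can run inside a remote parfor depends on how its operations are compiled:
the {{DMLRuntimeException}} quoted above indicates that remote parfor bodies were expected
to contain only CP instructions.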



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
