systemml-issues mailing list archives

From "Matthias Boehm (JIRA)"
Subject [jira] [Commented] (SYSTEMML-1774) Improve Parfor parallelism for deep learning
Date Wed, 19 Jul 2017 01:39:00 GMT


Matthias Boehm commented on SYSTEMML-1774:

Here are a couple of guesses: (1) the expensive operations still run in CP, because distributed
operations are globally disabled for all convolution ops (they are experimental),
(2) running concurrent Spark operations exploits your entire cluster rather than just a single
node, and (3) there are potentially fewer evictions, given the very small driver and Spark's lazy evaluation.
I can spend a couple of hours later this week to profile this.

> Improve Parfor parallelism for deep learning
> --------------------------------------------
>                 Key: SYSTEMML-1774
>                 URL:
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Algorithms, Compiler, ParFor
>    Affects Versions: SystemML 1.0
>            Reporter: Fei Hu
>              Labels: deeplearning
>         Attachments: Explain_For_HYBRID_SPARK_Mode_With_ErrorInfo.txt, Explain_For_Spark_Mode.txt,
MNIST_Distrib_Sgd.scala, mnist_lenet_distrib_sgd.dml
> When running the [distributed MNIST LeNet example |],
each mini-batch could ideally run in parallel without interaction. We tried changing {{parfor
(j in 1:parallel_batches)}} at line 137 of {{nn/examples/mnist_lenet_distrib_sgd.dml}} to
{{parfor (j in 1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED)}} to force {{REMOTE_SPARK}}
mode. However, this fails with {{org.apache.sysml.runtime.DMLRuntimeException: Not supported:
Instructions of type other than CP instructions}} in {{SPARK}} execution mode, and with
{{java.lang.NullPointerException}} in {{HYBRID_SPARK}} execution mode. More log information
can be found in the following comments.
