systemml-issues mailing list archives

From "Fei Hu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SYSTEMML-1774) Improve Parfor parallelism for deep learning
Date Mon, 17 Jul 2017 19:11:01 GMT

    [ https://issues.apache.org/jira/browse/SYSTEMML-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090338#comment-16090338 ]

Fei Hu edited comment on SYSTEMML-1774 at 7/17/17 7:10 PM:
-----------------------------------------------------------

cc [~mboehm7] and [~dusenberrymw] Could you help check whether my understanding of this issue
is correct?


was (Author: tenma):
cc [~mboehm7] Could you help check whether my understanding of this issue is correct?

> Improve Parfor parallelism for deep learning
> --------------------------------------------
>
>                 Key: SYSTEMML-1774
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Algorithms, Compiler, ParFor
>    Affects Versions: SystemML 1.0
>            Reporter: Fei Hu
>              Labels: deeplearning
>
> When running the [distributed MNIST LeNet example | https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
each mini-batch could ideally run in parallel without interacting with the others. We tried to force
{{parfor (j in 1:parallel_batches)}} at line 137 of {{nn/examples/mnist_lenet_distrib_sgd.dml}} to use
{{REMOTE_SPARK}} mode by changing it to {{parfor (j in 1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED)}},
but got errors such as {{org.apache.sysml.runtime.DMLRuntimeException: Not supported:
Instructions of type other than CP instructions}}. More log information can be found in the
following comments. For example, the convolutional layer needs to randomly generate some
matrices, but SystemML chooses {{RandSPInstruction}} instead of {{DataGenCPInstruction}},
possibly because SystemML cannot determine the number of rows of the matrix. For this distributed
MNIST LeNet example, using CP instructions may achieve better performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
