systemml-issues mailing list archives

From "Matthias Boehm (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SYSTEMML-2446) Paramserv adagrad ASP batch disjoint_continuous failing
Date Tue, 17 Jul 2018 05:05:00 GMT

    [ https://issues.apache.org/jira/browse/SYSTEMML-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546026#comment-16546026 ]

Matthias Boehm edited comment on SYSTEMML-2446 at 7/17/18 5:04 AM:
-------------------------------------------------------------------

[~Guobao] could you please have a look into this issue? It only shows up with ASP and adagrad -
the special characteristic is that the sizes of the model and gradients are different (16
vs 8). It might be an issue of how and when we call {{ParamservUtils.cleanupListObject}} -
if I disable this method, it runs, but very slowly, because the missing cleanup leads to evictions.



was (Author: mboehm7):
[~Guobao] could you please have a look into this issue? It only shows up with ASP and adagrad
- the special characteristics is that the sizes of the model and gradients are different (16
vs 8). It might be an issue of how and when we all {{ParamservUtils.cleanupListObject}} -
if I disable this it runs but very slowly because the missing cleanup leads to evictions.


> Paramserv adagrad ASP batch disjoint_continuous failing
> -------------------------------------------------------
>
>                 Key: SYSTEMML-2446
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2446
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: Matthias Boehm
>            Priority: Major
>
> {code}
> Caused by: java.io.IOException: File scratch_space/_p152255_9.1.44.68/_t0/temp10100_7141 does not exist on HDFS/LFS.
>         at org.apache.sysml.runtime.io.MatrixReader.checkValidInputFile(MatrixReader.java:120)
>         at org.apache.sysml.runtime.io.ReaderBinaryCell.readMatrixFromHDFS(ReaderBinaryCell.java:51)
>         at org.apache.sysml.runtime.util.DataConverter.readMatrixFromHDFS(DataConverter.java:197)
>         at org.apache.sysml.runtime.util.DataConverter.readMatrixFromHDFS(DataConverter.java:164)
>         at org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromHDFS(MatrixObject.java:434)
>         at org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromHDFS(MatrixObject.java:59)
>         at org.apache.sysml.runtime.controlprogram.caching.CacheableData.readBlobFromHDFS(CacheableData.java:886)
>         at org.apache.sysml.runtime.controlprogram.caching.CacheableData.acquireReadIntern(CacheableData.java:434)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
