systemml-issues mailing list archives

From "Glenn Weidner (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SYSTEMML-455) OOM CP transpose in Spark hybrid mode
Date Sat, 09 Sep 2017 04:23:00 GMT

     [ https://issues.apache.org/jira/browse/SYSTEMML-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Glenn Weidner updated SYSTEMML-455:
-----------------------------------
    Fix Version/s:     (was: SystemML 1.0)
                   SystemML 0.14

> OOM CP transpose in Spark hybrid mode 
> --------------------------------------
>
>                 Key: SYSTEMML-455
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-455
>             Project: SystemML
>          Issue Type: Bug
>          Components: Compiler
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 0.14
>
>
> The following data generation script failed with OOM in hybrid_spark execution mode (config: 20GB driver memory), whereas the same script runs fine with the same memory budget in hybrid_mr execution mode.
> {code}
> n = 30000;
> B = Rand (rows = n, cols = n, min = -1, max = 1, pdf = "uniform", seed = 1234);
> v = exp (Rand (rows = n, cols = 1, min = -3, max = 3, pdf = "uniform", seed = 5678));
> A = t(B) %*% (B * v);
> write(A, "./tmp/A", format="binary");
> {code}
> The resulting hop explain output is as follows:
> {code}
> # Memory Budget local/remote = 13739MB/184320MB/8602MB
> # Degree of Parallelism (vcores) local/remote = 16/120
> PROGRAM
> --MAIN PROGRAM
> ----GENERIC (lines 4-12) [recompile=true]
> ------(10) dg(rand) [30000,30000,1000,1000,900000000] [0,0,6866 -> 6866MB], CP
> ------(21) r(t) (10) [30000,30000,1000,1000,900000000] [6866,0,6866 -> 13733MB], CP
> ------(19) dg(rand) [30000,1,1000,1000,30000] [0,0,0 -> 0MB], CP
> ------(20) u(exp) (19) [30000,1,1000,1000,-1] [0,0,0 -> 0MB], CP
> ------(22) b(*) (10,20) [30000,30000,1000,1000,-1] [6867,0,6866 -> 13733MB], CP
> ------(23) ba(+*) (21,22) [30000,30000,1000,1000,-1] [13733,6866,6866 -> 27466MB], SPARK
> ------(28) PWrite A (23) [30000,30000,1000,1000,-1] [6866,0,0 -> 6866MB], CP
> {code}
> The script fails at the CP transpose with
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>         at org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:414)
>         at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.transposeDenseToDense(LibMatrixReorg.java:752)
>         at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.transpose(LibMatrixReorg.java:136)
>         at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reorg(LibMatrixReorg.java:105)
>         at org.apache.sysml.runtime.matrix.data.MatrixBlock.reorgOperations(MatrixBlock.java:3458)
>         at org.apache.sysml.runtime.instructions.cp.ReorgCPInstruction.processInstruction(ReorgCPInstruction.java:129)
> {code}
> It's noteworthy that the failing CP instruction requires 13733MB at a memory budget of 13739MB. The current guess is that Spark itself incurs substantial memory overhead, which eventually leads to the OOM; we should adjust our memory budget in Spark execution modes to account for this overhead.
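The arithmetic behind this diagnosis can be sketched as follows. This is an illustrative check only, using the estimates reported in the hop explain above; the 20% Spark overhead fraction is a hypothetical value for illustration, not a figure from SystemML.

```python
# Values in MB, taken from the hop explain output above.
budget_mb = 13739           # reported local CP memory budget
transpose_total_mb = 13733  # r(t) estimate: dense input + dense output

# The CP transpose fits the nominal budget by only a few MB:
fits_nominal = transpose_total_mb <= budget_mb

# Reserving a hypothetical fraction of the heap for Spark's own
# overhead (0.2 is an assumed value, not from the issue) makes the
# same operation exceed the adjusted budget:
spark_overhead_fraction = 0.2
adjusted_budget_mb = budget_mb * (1 - spark_overhead_fraction)
fits_adjusted = transpose_total_mb <= adjusted_budget_mb

print(fits_nominal, fits_adjusted)
```

Under any plausible overhead fraction, the 6MB of slack between estimate and budget disappears, which is consistent with the instruction failing in hybrid_spark but not hybrid_mr.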



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
