systemml-issues mailing list archives

From "Matthias Boehm (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SYSTEMML-2170) Remote parfor fails on reading ultra-sparse matrix with dims > 2G
Date Wed, 07 Mar 2018 07:23:00 GMT
Matthias Boehm created SYSTEMML-2170:
----------------------------------------

             Summary: Remote parfor fails on reading ultra-sparse matrix with dims > 2G
                 Key: SYSTEMML-2170
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2170
             Project: SystemML
          Issue Type: Bug
            Reporter: Matthias Boehm


The parfor optimizer has a rewrite that selects the remote Spark execution type even if the
original program contains Spark operations, provided these operations fit into the memory budget
of the executors. However, this rewrite does not check for valid integer dimensions, and hence
the remote parfor job fails with:

{code}
Caused by: org.apache.sysml.runtime.DMLRuntimeException: Matrix dimensions too large for CP runtime: 3 x 5129281161
        at org.apache.sysml.runtime.io.MatrixReader.createOutputMatrixBlock(MatrixReader.java:80)
        at org.apache.sysml.runtime.io.ReaderBinaryBlockParallel.readMatrixFromHDFS(ReaderBinaryBlockParallel.java:59)
        at org.apache.sysml.runtime.util.DataConverter.readMatrixFromHDFS(DataConverter.java:207)
{code}
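To illustrate the missing check, here is a minimal sketch of the kind of guard the rewrite would need. It is not SystemML code: the class `RemoteParforDimCheck` and the methods `hasValidIntDims` and `allowRemoteExec` are hypothetical names, and the sketch assumes that the CP runtime limit corresponds to `Integer.MAX_VALUE` per dimension. The dimensions are taken from the exception above; the memory numbers (109MB estimate, 4638MB budget) are taken from the optimizer output in this report, and reading 4638MB as the executor-side budget is an assumption.

```java
// Hypothetical sketch, not SystemML code: all names here are illustrative.
final class RemoteParforDimCheck {
    private RemoteParforDimCheck() {}

    /** True if both dimensions fit into a Java int (assumed CP runtime limit). */
    static boolean hasValidIntDims(long rows, long cols) {
        return rows <= Integer.MAX_VALUE && cols <= Integer.MAX_VALUE;
    }

    /**
     * Assumed guard: allow forcing an operation to CP inside a remote parfor
     * only if it fits the memory budget AND its dimensions are valid ints.
     */
    static boolean allowRemoteExec(long estMemMB, long budgetMB, long rows, long cols) {
        return estMemMB <= budgetMB && hasValidIntDims(rows, cols);
    }

    public static void main(String[] args) {
        long rows = 3L, cols = 5129281161L; // dimensions from the exception
        // The memory check alone passes (109MB <= 4638MB) ...
        System.out.println(109 <= 4638);
        // ... but the column count exceeds Integer.MAX_VALUE (2147483647).
        System.out.println(hasValidIntDims(rows, cols));
        System.out.println(allowRemoteExec(109, 4638, rows, cols));
    }
}
```

Because the matrix is ultra-sparse, its memory estimate is small even though its column count exceeds 2G, so a memory-only condition accepts it; the additional dimension test is what would reject it in this sketch.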

Here is the related optimizer output:
{code}
----------------------------
 EXPLAIN OPT TREE (type=ABSTRACT_PLAN, size=22)
----------------------------
--PARFOR, exec=CP, k=16, dp=NONE, tp=FIXED, rm=LOCAL_AUTOMATIC
----GENERIC (lines 122-126), exec=CP, k=1
------lix, exec=CP, k=1
------b(-), exec=CP, k=1
------b(*), exec=CP, k=1
------r(t), exec=CP, k=16
------ba(+*), exec=CP, k=16
------rix, exec=CP, k=1
------r(rshape), exec=CP, k=16
------ba(+*), exec=CP, k=16
------r(rshape), exec=CP, k=16
------rix, exec=CP, k=1
------r(rshape), exec=SPARK, k=1
------rix, exec=SPARK, k=1
------b(/), exec=CP, k=1
------u(exp), exec=CP, k=16
------b(-), exec=CP, k=1
------rix, exec=CP, k=1
------ua(maxRC), exec=CP, k=16
------ua(+RC), exec=CP, k=16
------b(*), exec=CP, k=1
------ua(+RC), exec=CP, k=16
----------------------------

18/03/06 23:17:33 DEBUG Optimizer: --- RULEBASED OPTIMIZER -------
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: Optimize w/ max_mem=24271MB/4638MB/4638MB, max_k=16/144/144).
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: Optimize w/ SparkClusterConfig:
-- legacyVersion    = false (2.2.0)
-- confOnly         = true
-- numExecutors     = 6
-- defaultPar       = 144
-- memExecutor      = 69478645760
-- memDataMinFrac   = 0.5
-- memDataMaxFrac   = 0.6
-- memBroadcastFrac = 0.21

18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated mem (serial exec) M=109MB
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set data partitioner' - result=NONE ()
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'remove unnecessary compare matrix' - result=false ()
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set result partitioning' - result=false
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated new mem (serial exec) M=109MB
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated new mem (serial exec, all CP) M=109MB
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated new mem (cond partitioning) M=109MB
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set execution strategy' - result=REMOTE_SPARK (recompile=true)
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set operation exec type CP' - result=2
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'enable data colocation' - result=false
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set partition replication factor' - result=false
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set export replication factor' - result=true (3)
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set degree of parallelism' - result=(see EXPLAIN)
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set task partitioner' - result=STATIC
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set fused data partitioning and execution' - result=false
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set transpose sparse vector operations' - result=false
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set in-place result indexing' - result=true ([delta_b_softmax], M=160MB)
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'disable CP caching' - result=false (M=160MB)
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set result merge' - result=LOCAL_MEM
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set recompile memory budget' - result=24271MB
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'remove recursive parfor' - result=0/0
18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'remove unnecessary parfor' - result=0
18/03/06 23:17:33 DEBUG OptimizationWrapper: ParFOR Opt: Optimized plan (after optimization):

----------------------------
 EXPLAIN OPT TREE (type=ABSTRACT_PLAN, size=22)
----------------------------
--PARFOR, exec=SPARK, k=3, dp=NONE, tp=STATIC, rm=LOCAL_MEM
----GENERIC (lines 122-126), exec=CP, k=1
------lix, exec=CP, k=1
------b(-), exec=CP, k=1
------b(*), exec=CP, k=1
------r(t), exec=CP, k=1
------ba(+*), exec=CP, k=1
------rix, exec=CP, k=1
------r(rshape), exec=CP, k=1
------ba(+*), exec=CP, k=1
------r(rshape), exec=CP, k=1
------rix, exec=CP, k=1
------r(rshape), exec=CP, k=1
------rix, exec=CP, k=1
------b(/), exec=CP, k=1
------u(exp), exec=CP, k=1
------b(-), exec=CP, k=1
------rix, exec=CP, k=1
------ua(maxRC), exec=CP, k=1
------ua(+RC), exec=CP, k=1
------b(*), exec=CP, k=1
------ua(+RC), exec=CP, k=1
----------------------------

{code}



