systemml-issues mailing list archives

From "Matthias Boehm (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (SYSTEMML-2170) Remote parfor fails on reading ultra-sparse matrix with dims > 2G
Date Thu, 08 Mar 2018 20:45:00 GMT

     [ https://issues.apache.org/jira/browse/SYSTEMML-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm closed SYSTEMML-2170.
------------------------------------
       Resolution: Fixed
         Assignee: Matthias Boehm
    Fix Version/s: SystemML 1.1

> Remote parfor fails on reading ultra-sparse matrix with dims > 2G
> -----------------------------------------------------------------
>
>                 Key: SYSTEMML-2170
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2170
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>            Priority: Major
>             Fix For: SystemML 1.1
>
>
> The parfor optimizer has a rewrite that selects the remote Spark execution type even if the
> original program contains Spark operations, as long as these operations fit into the memory
> budget of the executors. However, this rewrite does not check for valid integer dimensions
> and hence fails with
> {code}
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Matrix dimensions too large for CP runtime: 3 x 5129281161
>         at org.apache.sysml.runtime.io.MatrixReader.createOutputMatrixBlock(MatrixReader.java:80)
>         at org.apache.sysml.runtime.io.ReaderBinaryBlockParallel.readMatrixFromHDFS(ReaderBinaryBlockParallel.java:59)
>         at org.apache.sysml.runtime.util.DataConverter.readMatrixFromHDFS(DataConverter.java:207)
> {code}
> Here is the related optimizer output:
> {code}
> ----------------------------
>  EXPLAIN OPT TREE (type=ABSTRACT_PLAN, size=22)
> ----------------------------
> --PARFOR, exec=CP, k=16, dp=NONE, tp=FIXED, rm=LOCAL_AUTOMATIC
> ----GENERIC (lines 122-126), exec=CP, k=1
> ------lix, exec=CP, k=1
> ------b(-), exec=CP, k=1
> ------b(*), exec=CP, k=1
> ------r(t), exec=CP, k=16
> ------ba(+*), exec=CP, k=16
> ------rix, exec=CP, k=1
> ------r(rshape), exec=CP, k=16
> ------ba(+*), exec=CP, k=16
> ------r(rshape), exec=CP, k=16
> ------rix, exec=CP, k=1
> ------r(rshape), exec=SPARK, k=1
> ------rix, exec=SPARK, k=1
> ------b(/), exec=CP, k=1
> ------u(exp), exec=CP, k=16
> ------b(-), exec=CP, k=1
> ------rix, exec=CP, k=1
> ------ua(maxRC), exec=CP, k=16
> ------ua(+RC), exec=CP, k=16
> ------b(*), exec=CP, k=1
> ------ua(+RC), exec=CP, k=16
> ----------------------------
> 18/03/06 23:17:33 DEBUG Optimizer: --- RULEBASED OPTIMIZER -------
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: Optimize w/ max_mem=24271MB/4638MB/4638MB, max_k=16/144/144).
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: Optimize w/ SparkClusterConfig:
> -- legacyVersion    = false (2.2.0)
> -- confOnly         = true
> -- numExecutors     = 6
> -- defaultPar       = 144
> -- memExecutor      = 69478645760
> -- memDataMinFrac   = 0.5
> -- memDataMaxFrac   = 0.6
> -- memBroadcastFrac = 0.21
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated mem (serial exec) M=109MB
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set data partitioner' - result=NONE ()
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'remove unnecessary compare matrix' - result=false ()
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set result partitioning' - result=false
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated new mem (serial exec) M=109MB
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated new mem (serial exec, all CP) M=109MB
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: estimated new mem (cond partitioning) M=109MB
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set execution strategy' - result=REMOTE_SPARK (recompile=true)
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set operation exec type CP' - result=2
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'enable data colocation' - result=false
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set partition replication factor' - result=false
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set export replication factor' - result=true (3)
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set degree of parallelism' - result=(see EXPLAIN)
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set task partitioner' - result=STATIC
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set fused data partitioning and execution' - result=false
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set transpose sparse vector operations' - result=false
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set in-place result indexing' - result=true ([delta_b_softmax], M=160MB)
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'disable CP caching' - result=false (M=160MB)
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set result merge' - result=LOCAL_MEM
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'set recompile memory budget' - result=24271MB
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'remove recursive parfor' - result=0/0
> 18/03/06 23:17:33 DEBUG Optimizer: RULEBASED OPT: rewrite 'remove unnecessary parfor' - result=0
> 18/03/06 23:17:33 DEBUG OptimizationWrapper: ParFOR Opt: Optimized plan (after optimization):
> ----------------------------
>  EXPLAIN OPT TREE (type=ABSTRACT_PLAN, size=22)
> ----------------------------
> --PARFOR, exec=SPARK, k=3, dp=NONE, tp=STATIC, rm=LOCAL_MEM
> ----GENERIC (lines 122-126), exec=CP, k=1
> ------lix, exec=CP, k=1
> ------b(-), exec=CP, k=1
> ------b(*), exec=CP, k=1
> ------r(t), exec=CP, k=1
> ------ba(+*), exec=CP, k=1
> ------rix, exec=CP, k=1
> ------r(rshape), exec=CP, k=1
> ------ba(+*), exec=CP, k=1
> ------r(rshape), exec=CP, k=1
> ------rix, exec=CP, k=1
> ------r(rshape), exec=CP, k=1
> ------rix, exec=CP, k=1
> ------b(/), exec=CP, k=1
> ------u(exp), exec=CP, k=1
> ------b(-), exec=CP, k=1
> ------rix, exec=CP, k=1
> ------ua(maxRC), exec=CP, k=1
> ------ua(+RC), exec=CP, k=1
> ------b(*), exec=CP, k=1
> ------ua(+RC), exec=CP, k=1
> ----------------------------
> {code}
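The issue does not include the actual fix. The following is only a minimal Java sketch of the missing check, assuming that in-memory (CP) matrix blocks are indexed with Java ints, which is consistent with the error for a 3 x 5129281161 matrix (5129281161 > Integer.MAX_VALUE = 2147483647) and with the "dims > 2G" title. The `ParforDimCheckSketch` class, the `MatrixDims` holder, the `ExecMode` enum, and `chooseExecMode` are hypothetical names, not SystemML APIs, and treating the 4638MB value from the log's `max_mem` line as the relevant remote budget is also an assumption.

```java
import java.util.List;

// Hypothetical sketch (not SystemML's actual code): guard the parfor
// "set execution strategy" decision with an integer-dimension check.
final class ParforDimCheckSketch {

    /** Hypothetical stand-in for the metadata of a matrix input. */
    static final class MatrixDims {
        final long rows;
        final long cols;
        MatrixDims(long rows, long cols) { this.rows = rows; this.cols = cols; }
    }

    /** Hypothetical execution modes, mirroring the LOCAL vs. REMOTE_SPARK choice. */
    enum ExecMode { LOCAL, REMOTE_SPARK }

    /** Assumes CP matrix blocks address rows and columns with Java ints. */
    static boolean hasValidCPDimensions(MatrixDims d) {
        return d.rows <= Integer.MAX_VALUE && d.cols <= Integer.MAX_VALUE;
    }

    /**
     * Chooses remote Spark execution only if the estimated memory fits the
     * budget AND every input can be read into a CP matrix block.
     */
    static ExecMode chooseExecMode(List<MatrixDims> inputs, long estMemMB, long budgetMB) {
        boolean fitsMemory = estMemMB <= budgetMB;
        boolean validDims = inputs.stream().allMatch(ParforDimCheckSketch::hasValidCPDimensions);
        return (fitsMemory && validDims) ? ExecMode.REMOTE_SPARK : ExecMode.LOCAL;
    }

    public static void main(String[] args) {
        // Dimensions from the stack trace; memory figures from the optimizer log.
        MatrixDims ultraSparse = new MatrixDims(3, 5_129_281_161L);
        System.out.println(hasValidCPDimensions(ultraSparse));                           // false
        System.out.println(chooseExecMode(List.of(ultraSparse), 109, 4638));             // LOCAL
        System.out.println(chooseExecMode(List.of(new MatrixDims(3, 1_000)), 109, 4638)); // REMOTE_SPARK
    }
}
```

In this sketch, rejecting the rewrite for such inputs would leave the two operators that the initial plan marked exec=SPARK (`r(rshape)` and `rix`) on Spark, instead of forcing them into CP as the 'set operation exec type CP' rewrite did (result=2).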



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
