systemml-dev mailing list archives

From "Matthias Boehm1" <Matthias.Boe...@ibm.com>
Subject Re: Parfor optimizer getting stuck
Date Mon, 04 Sep 2017 07:46:43 GMT

Thanks for sharing, Rajarshi. This is definitely a bug and needs fixing
before our 0.15 release.

Looking over the output, there are really two different issues here:

1) The remote memory is Infinity, which causes the optimizer to go for
REMOTE_SPARK despite the unknown sizes (and max memory estimates) and
smaller degree of parallelism. This is because it mistakenly assumes all
operations could be compiled to CP as they would "fit" into remote executor
memory. We need to (a) make this decision more resilient and (b) find the
root cause of the messed up memory budget.

2) Looking over the trace of the parfor optimizer, it seems to get stuck in
"rewriteSetInPlaceResultIndexing". @Arvind: I remember that we had a
similar issue when we introduced/extended this rewrite more than a year
ago. I'll have a look into this tomorrow unless you want to handle it.
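
As a rough illustration of point 1(a), here is a minimal Java sketch of a
more defensive execution-mode decision. It is not the actual optimizer code:
the class, enum, method, and parameter names are all hypothetical, and the
rule is deliberately simplified. The idea is only that a non-finite remote
budget is treated as "unknown" rather than "everything fits", and that
unknown sizes or a smaller remote degree of parallelism also rule out
REMOTE_SPARK.

```java
/** Illustrative sketch only; all names are hypothetical, not SystemML's API. */
final class ExecModeSketch {
    enum Mode { LOCAL, REMOTE_SPARK }

    /** A memory budget is trusted only if it is a finite, positive number. */
    static boolean isValidBudget(double budgetMB) {
        return Double.isFinite(budgetMB) && budgetMB > 0;
    }

    /**
     * Picks remote execution only if the remote budget is trustworthy,
     * sizes are known, the all-CP estimate fits into the remote budget,
     * and remote execution does not reduce the degree of parallelism.
     */
    static Mode choose(double memAllCpMB, boolean sizesKnown,
                       double remoteBudgetMB, int localK, int remoteK) {
        boolean remoteOk = isValidBudget(remoteBudgetMB)
            && sizesKnown
            && memAllCpMB <= remoteBudgetMB
            && remoteK >= localK;
        return remoteOk ? Mode.REMOTE_SPARK : Mode.LOCAL;
    }

    public static void main(String[] args) {
        // Values taken from the trace: all-CP estimate 273068MB, unknown
        // sizes, remote budget Infinity, max_k=32 locally vs. 1 remotely.
        System.out.println(choose(273068, false, Double.POSITIVE_INFINITY, 32, 1));
    }
}
```

With the values from the trace, this sketch falls back to LOCAL because the
remote budget is not finite. Whether the real fix should reject the budget
at this point or repair it at its source is part of (b).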

Regards,
Matthias



From:	Rajarshi Bhadra <bhadrarajarshi9@gmail.com>
To:	dev@systemml.apache.org
Cc:	Matthias Boehm1 <matthias.boehm1@ibm.com>
Date:	09/03/2017 11:52 PM
Subject:	Parfor optimizer getting stuck



Hi,

I am working on a custom tree-based algorithm that I am developing with
SystemML, using version 1.0.0-SNAPSHOT. My issue is that the parfor
statement gets stuck somewhere, and since I get no error report or output,
I am unable to determine what the issue might be. However, I was able to
capture a log of the parfor optimizer, which is as follows:


17/09/04 06:59:12 DEBUG Optimizer: --- RULEBASED OPTIMIZER -------
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: Optimize w/ max_mem=63716MB/InfinityMB/InfinityMB, max_k=32/1/1).
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: estimated mem (serial exec) M=52MB
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: rewrite 'set data partitioner' - result=NONE ()
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: rewrite 'remove unnecessary compare matrix' - result=false ()
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: rewrite 'set result partitioning' - result=false
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: estimated new mem (serial exec) M=52MB
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: estimated new mem (serial exec, all CP) M=273068MB
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: estimated new mem (cond partitioning) M=52MB
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: rewrite 'set execution strategy' - result=REMOTE_SPARK (recompile=true)
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: rewrite 'set operation exec type CP' - result=198
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: rewrite 'enable data colocation' - result=false
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: rewrite 'set partition replication factor' - result=false
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: rewrite 'set export replication factor' - result=true (3)
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: rewrite 'enable nested parallelism' - result=false
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: rewrite 'set degree of parallelism' - result=(see EXPLAIN)
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: rewrite 'set task partitioner' - result=NAIVE
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: rewrite 'set fused data partitioning and execution' - result=false
17/09/04 06:59:12 DEBUG Optimizer: RULEBASED OPT: rewrite 'set transpose sparse vector operations' - result=false

It would be great if someone could help me out with this issue.

Thank you

