systemml-issues mailing list archives

From "Mike Dusenberry (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation
Date Wed, 03 May 2017 21:36:04 GMT

    [ https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995732#comment-15995732
] 

Mike Dusenberry commented on SYSTEMML-1561:
-------------------------------------------

Just FYI, I'm making some progress on this.  Essentially, by rerunning static rewrites + IPA
immediately after the initial IPA pass as a kind of "second chance", we're able to apply
the constant folding rewrite in this scenario.  This makes sense because during the initial
static rewrite pass, we can't apply constant folding to the {{Hout}}, {{Wout}}, etc. DAGs
due to the leaf nodes being scalar transient reads.  After IPA with the new scalar replacement,
these DAGs will become entirely operations on literal leaf nodes, and thus eligible for constant
folding.  Then, after that second pass of static rewrites, we can benefit from IPA again by
being able to now perform scalar replacement for functions/other DAGs that consume the {{Hout}},
{{Wout}}, etc. DAGs, which are now literals.  In terms of performance, I'm seeing the execution
time cut in half (~500s faster) for SYSTEMML-1566.  I can open a PR soon.
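
For illustration only, here is a minimal Python sketch of the "second chance" idea described above. It is a toy model, not SystemML code: the `Lit`, `Read`, and `BinOp` classes, the `fold` and `replace_scalars` helpers, the output-size formula, and all numeric values are assumptions made up for this example. `Read` stands in for a scalar transient read, and `replace_scalars` stands in for IPA scalar replacement.

```python
from dataclasses import dataclass
from typing import Dict, Union
import operator


@dataclass(frozen=True)
class Lit:
    """A literal leaf node."""
    value: int


@dataclass(frozen=True)
class Read:
    """Stand-in for a scalar transient read (value unknown to the rewrite)."""
    name: str


@dataclass(frozen=True)
class BinOp:
    """A binary scalar operation over two sub-DAGs."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[Lit, Read, BinOp]

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "//": operator.floordiv}


def fold(node: Node) -> Node:
    """Constant folding: collapse an operation only if both inputs are literals."""
    if isinstance(node, BinOp):
        left, right = fold(node.left), fold(node.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(OPS[node.op](left.value, right.value))
        return BinOp(node.op, left, right)
    return node


def replace_scalars(node: Node, known: Dict[str, int]) -> Node:
    """Toy scalar replacement: swap reads of known scalars for literals."""
    if isinstance(node, Read) and node.name in known:
        return Lit(known[node.name])
    if isinstance(node, BinOp):
        return BinOp(node.op,
                     replace_scalars(node.left, known),
                     replace_scalars(node.right, known))
    return node


# Assumed output-size equation: Hout = (Hin + 2*pad - Hf) // stride + 1
hout = BinOp("+",
             BinOp("//",
                   BinOp("-",
                         BinOp("+", Read("Hin"), BinOp("*", Lit(2), Read("pad"))),
                         Read("Hf")),
                   Read("stride")),
             Lit(1))

# Pass 1: static rewrites alone cannot fold, because the leaves are reads.
first_pass = fold(hout)

# Scalar replacement, then the second pass of static rewrites: folds to a literal.
known = {"Hin": 28, "pad": 2, "Hf": 5, "stride": 1}  # hypothetical values
second_pass = fold(replace_scalars(first_pass, known))

# A consumer DAG can now benefit from replacement again, since Hout is a literal.
cols = BinOp("*", BinOp("*", Read("F1"), Read("Hout")), Read("Wout"))
cols_folded = fold(replace_scalars(
    cols, {"F1": 32, "Hout": second_pass.value, "Wout": 28}))
```

In this sketch `first_pass` is unchanged from `hout`, while `second_pass` and `cols_folded` are both plain `Lit` nodes, which mirrors the ordering argument above: folding only pays off once replacement has turned the leaves into literals.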

> Improve constant folding during compilation
> -------------------------------------------
>
>                 Key: SYSTEMML-1561
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1561
>             Project: SystemML
>          Issue Type: Improvement
>            Reporter: Mike Dusenberry
>             Fix For: SystemML 1.0
>
>         Attachments: scenario1_plan.txt, scenario1.py, scenario2_plan.txt, scenario2.py
>
>
> In our `nn` library, our convolution and pooling layers have to pass around the spatial
dimensions (height and width) of the images that are stretched out into rows of the input/output
matrices.  These output dimensions are computed within the forward functions of the above
layers as small scalar equations.  From a mathematical standpoint, these sizes can be determined
at compile time, and it is nice to have these size equations in DML (vs. hiding them inside
the engine within built-in functions).  However, we do not currently evaluate these expressions
during compilation, and thus we are left with unknown sizes even during recompilation.  This
naturally leads to max memory estimates and thus often leads to unnecessary distributed runtime
ops rather than simple CP ones.
> I have two related scenarios for which this is a problem.  They both involve the {{Houtc1}}
& {{Woutc1}} values that are returned from a `conv2d::forward(...)` function.  These represent
the spatial dimensions of the volume within each of the rows of the output {{outc1}} of the
function, and the third dimension is {{F1}}.  Thus, {{outc1}} has a number of columns equal
to {{F1*Houtc1*Woutc1}}.
> In the first scenario ({{scenario1.py}}), a random matrix {{doutc1}} is created that
should have the same dimensions as {{outc1}}.  For the columns, if I use {{cols=ncol(outc1)}}
in this rand statement, the size will be propagated and CP ops will be compiled and run. 
If I instead use {{cols=F1*Houtc1*Woutc1}}, the size will forever be unknown, even during recompilation,
and thus Spark ops will be compiled and run.  I have included the recompile hops plan ({{scenario1_plan.txt}}).
> In the second scenario ({{scenario2.py}}), a {{max_pool2d::forward(...)}} function is
inserted after the {{conv2d::forward(...)}} function that requires the {{Houtc1}} and {{Woutc1}}
variables to be supplied as arguments.  Since those latter variables are not evaluated at
compile time, the max pooling sizes remain unknown, even during recompilation, and thus
Spark ops will be compiled and run.  I have included the recompile hops plan ({{scenario2_plan.txt}}).
> We should improve or fix our constant folding rewrites so that these scenarios
are handled, as they are necessary for performant deep learning applications.  Note too that
this issue will be present in other non-deep-learning scenarios as well.
> Mailing list thread: https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01657.html
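
As a rough illustration of why the unknown sizes in the issue description above lead to distributed ops, the following Python sketch models the decision in a deliberately simplified way. It is not SystemML's actual memory estimator or operator selection logic: the function names, the dense 8-bytes-per-cell estimate, the memory budget, and the matrix sizes are all assumptions for this example.

```python
import math
from typing import Optional

BYTES_PER_DOUBLE = 8  # assumption: dense matrix of doubles


def dense_mem_estimate(rows: Optional[int], cols: Optional[int]) -> float:
    """Worst-case estimate: an unknown dimension (None) yields an unbounded size."""
    if rows is None or cols is None:
        return math.inf
    return float(rows * cols * BYTES_PER_DOUBLE)


def choose_exec_type(rows: Optional[int], cols: Optional[int],
                     budget_bytes: int) -> str:
    """Pick a single-node (CP) op only if the estimate fits the memory budget."""
    if dense_mem_estimate(rows, cols) <= budget_bytes:
        return "CP"
    return "SPARK"


budget = 2 * 1024**3  # hypothetical 2 GB budget

# Size propagated (e.g. cols=ncol(outc1)): a small matrix, so CP is chosen.
with_known_cols = choose_exec_type(64, 25088, budget)

# Size unknown (e.g. cols=F1*Houtc1*Woutc1 never folded): falls back to Spark.
with_unknown_cols = choose_exec_type(64, None, budget)
```

Under these assumptions the same small matrix is planned as a CP op when its column count is known and as a Spark op when it is not, which is the behavior both scenarios describe.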



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
