systemml-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dylan Hutchison <>
Subject Re: SystemML optimizer design
Date Tue, 17 Jan 2017 19:19:46 GMT
Wonderful explanations Matthias! Thank you for the history.

On Jan 17, 2017 7:04 AM, "Matthias Boehm" <> wrote:

Hi Dylan,

these are very interesting questions - let me answer them one by one:

0. SPOOF: We developed the SPOOF compiler framework in a separate fork that
will be integrated back into SystemML master soon. Initially, we will add
the code generation part as an experimental feature, likely in our SystemML
1.0 release. The sum-product part will follow later because it's still in a
very early stage.

1a. Rewrites: At a high-level, there are two types of rewrites: static and
dynamic. Static rewrites are size-independent while dynamic rewrites depend
on sizes in terms of constraints or costs. During initial compilation,
intra- and inter-procedural analysis only propagates sizes that are valid
over the entire program lifetime. The rewrites are then indeed applied in
an in-place manner (i.e., "destructively"), which is ok because sizes are
guaranteed not to change. However, during dynamic recompilation, we use
exact sizes and recompile HOP DAGs very aggressively. In order to allow for
non-reversible rewrites, we keep the original HOP DAG, create a deep copy,
rewrite the copied HOP DAG and finally generate LOPs and executable
instructions. You'll find the details here:

1b. Rewrite Phase Ordering: Determining the order or rewrites, which is
often called phase ordering in compilers, is currently done manually with
the context-knowledge of side effects between individual rewrites. This
usually works very well in SystemML but gets more complicated as we add
more rewrites and we've already seen a couple of cases were phase ordering
problems led to suboptimal plans. As far as I know, there doesn't exist a
principled approach to phase ordering in other compilers like GCC or LLVM

1c. Cost-based Optimization: Right now, different components use different
cost functions and heuristics. For example, matrix multiplication chain
optimization uses the number of floating point operations, operator
selection of distributed matrix multiplications uses the I/O and shuffle
costs weighted by the degree of parallelism, other decisions use simply the
estimated size, and our resource optimizer uses a full-fledged time-based
cost model regarding generated runtime plans (see
For SPOOF, we extended this time-based cost model.

2. Explain: Yes partially, we provide a flag -explain that allows
investigating the generated plans at HOP level (-explain hops), at runtime
level (-explain runtime), and during dynamic recompilation (-explain
recompile_hops, -explain recompile_runtime). However, the HOP explain
already shows the rewritten plans. As workarounds, you can (1) set the
optimization level in SystemML-config.xml to 1 in order to see the initial
plans without rewrites, or (2) set ProgramRewriter.LDEBUG=true (and rebuild
SystemML) to see the applied rewrites. Furthermore, for task-parallel
parfor programs you can add log=DEBUG in the parfor header to see the the
plan before recompilation, after recompilation, and after rewrites along
with some details on the individually applied rewrites.

3. Relationship to Apache Calcite: Well, Calcite is a cost-based optimizer
for relational algebra. As mentioned in (0), our sum-product optimization
is still in a very early stage. In SystemML master, we purely focus on
linear algebra and statistical functions - hence, there is not much
similarity. However, it is indeed an interesting question to build our
sum-product optimizer on top of an existing rewrite framework such as
Calcite, Spark's Catalyst optimizer, or the Columbia optimizer, etc. So far
we tend to build it from scratch as our restricted linear algebra actually
simplifies a couple of rewrites.

I hope this gives a general overview - if you have further questions with
regard to a specific topic, please just ask.


On 1/17/2017 4:05 AM, Dylan Hutchison wrote:

> Hi there,
> I learned about SystemML and its optimizer from the recent SPOOF paper
> <>.  The gist I
> absorbed is that SystemML translates linear algebra expressions given by
> its DML to relational algebra, then applies standard relational algebra
> optimizations, and then re-recognizes the result in linear algebra kernels,
> with an attempt to fuse them.
> I think I found the SystemML rewrite rules here
> <
> src/main/java/org/apache/sysml/hops/rewrite>.
> A couple questions:
>    1. It appears that SystemML rewrites HOP expressions destructively,
>    i.e., by throwing away the old expression.  In this case, how does
> SystemML
>    determine the order of rewrites to apply?  Where does cost-based
>    optimization come into play?
>    2. Is there a way to "debug/visualize" the optimization process?  That
>    is, when I start with a DML program, can I view (a) the DML program
> parsed
>    into HOPs; (b) what rules fire and where in the plan, as well as the
> plan
>    after each rule fires; and (c) the lowering and fusing of operators to
> LOPs?
>    I know this is a lot to ask for; I'm curious how far SystemML has gone
>    in this direction.
>    3. Is there any relationship between the SystemML optimizer and Apache
>    Calcite <>?  If not, I'd love to understand
>    the design decisions that differentiate the two.
> Thanks, Dylan Hutchison

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message