flink-dev mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: [jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
Date Sat, 07 Feb 2015 23:02:24 GMT
Timo, thanks for picking up this very cool feature!
I also think that an integrated approach would be the better solution,
if it can be done with reasonable effort.

+1 for implementing a prototype using ASM.
Let me know if I can help somehow.

Cheers, Fabian

2015-02-05 14:31 GMT+01:00 Timo Walther (JIRA) <jira@apache.org>:

>
>     [
> https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307203#comment-14307203
> ]
>
> Timo Walther commented on FLINK-1319:
> -------------------------------------
>
> Actually, I don't like the "drop-in" approach. I think it would be much
> better if the code analysis can be included in the release. Especially once
> the code is stable enough, it would be great to enable it by default and
> speed up jobs automatically.
>
> I did some research about other frameworks we could use instead. Soot is
> the best framework, however, I think we can also build the code analysis on
> top of the ObjectWeb ASM library[1]. It provides some functionality for
> data flow analysis[2]. The examples for BasicInterpreter and BasicVerifier
> look promising. Other projects use it to determine types[3].
>
> Using ASM requires us to implement more but it gives us full flexibility
> for further analysis use cases.
>
> I would try to implement a simple proof-of-concept prototype. What do you
> think?
>
> [1] http://asm.ow2.org/
> [2] http://download.forge.objectweb.org/asm/asm4-guide.pdf, 115ff
> [3]
> https://github.com/hraberg/enumerable/blob/master/src/main/java/org/enumerable/lambda/support/expression/ExpressionInterpreter.java
>
> > Add static code analysis for UDFs
> > ---------------------------------
> >
> >                 Key: FLINK-1319
> >                 URL: https://issues.apache.org/jira/browse/FLINK-1319
> >             Project: Flink
> >          Issue Type: New Feature
> >          Components: Java API, Scala API
> >            Reporter: Stephan Ewen
> >            Assignee: Timo Walther
> >            Priority: Minor
> >
> > Flink's Optimizer takes information that tells it for UDFs which fields
> of the input elements are accessed, modified, or forwarded/copied. This
> information frequently helps to reuse partitionings, sorts, etc. It may
> speed up programs significantly, as it can frequently eliminate sorts and
> shuffles, which are costly.
> > Right now, users can add lightweight annotations to UDFs to provide this
> information (such as adding {{@ConstantFields("0->3, 1, 2->1")}}).
> > We worked with static code analysis of UDFs before, to determine this
> information automatically. This is an incredible feature, as it "magically"
> makes programs faster.
> > For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross),
> this works surprisingly well in many cases. We used the "Soot" toolkit for
> the static code analysis. Unfortunately, Soot is LGPL licensed and thus we
> did not include any of the code so far.
> > I propose to add this functionality to Flink, in the form of a drop-in
> addition, to work around the LGPL incompatibility with ASL 2.0. Users could
> simply download a special "flink-code-analysis.jar" and drop it into the
> "lib" folder to enable this functionality. We may even add a script to
> "tools" that downloads that library automatically into the lib folder. This
> should be legally fine, since we do not redistribute LGPL code and only
> dynamically link it (the incompatibility with ASL 2.0 is mainly in the
> patentability, if I remember correctly).
> > Prior work on this has been done by [~aljoscha] and [~skunert], which
> could provide a code base to start with.
> > *Appendix*
> > Homepage of the Soot static analysis toolkit:
> http://www.sable.mcgill.ca/soot/
> > Papers on static analysis for optimization:
> http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf
> and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
> > Quick introduction to the Optimizer:
> http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf
> (Section 6)
> > Optimizer for Iterations:
> http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf
> (Sections 4.3 and 5.3)
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
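For reference, the annotation string quoted in the issue description ("0->3, 1, 2->1") can be read as a list of field mappings. The following is only a minimal sketch of that reading, not Flink's actual parser: the class name ForwardedFieldsSketch and its parse method are hypothetical, and it assumes that "a->b" means input field a is copied unchanged to output position b, while a bare index means the field keeps its position.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: parses a forwarded-field
// expression such as "0->3, 1, 2->1" into source -> target positions.
final class ForwardedFieldsSketch {

    static Map<Integer, Integer> parse(String expr) {
        Map<Integer, Integer> forwarded = new LinkedHashMap<>();
        for (String part : expr.split(",")) {
            String entry = part.trim();
            if (entry.isEmpty()) {
                continue; // tolerate empty input and stray commas
            }
            int arrow = entry.indexOf("->");
            if (arrow < 0) {
                // Bare index: assumed to mean the field keeps its position.
                int field = Integer.parseInt(entry);
                forwarded.put(field, field);
            } else {
                // "a->b": assumed to mean input field a lands at output position b.
                int source = Integer.parseInt(entry.substring(0, arrow).trim());
                int target = Integer.parseInt(entry.substring(arrow + 2).trim());
                forwarded.put(source, target);
            }
        }
        return forwarded;
    }
}
```

Under these assumptions, ForwardedFieldsSketch.parse("0->3, 1, 2->1") returns a map with the entries 0 to 3, 1 to 1, and 2 to 1. A static analysis as discussed in this thread would aim to derive such a mapping from the UDF bytecode instead of requiring the user to write it by hand.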
