flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Proposal for Flink Issue/Subproject - Static Code Analysis
Date Wed, 10 Dec 2014 16:52:39 GMT
Hi all!

If anyone is interested in leading a very cool issue (almost a
standalone subproject), have a look below:

Stephan

-----

https://issues.apache.org/jira/browse/FLINK-1319
Description

Flink's optimizer can use information about which fields of a UDF's input
elements are accessed, modified, or forwarded/copied. This information
frequently lets it reuse existing partitionings, sort orders, etc. It may
speed up programs significantly, as it can often eliminate costly sorts and
shuffles.
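As a rough illustration of why this information matters, the following hypothetical Java sketch decides whether a shuffle can be skipped after a record-at-a-time UDF: if the input is hash-partitioned on field k and the UDF copies k unchanged into output field k', the output is still partitioned on k', so a downstream operator keyed on k' needs no repartitioning. The class, method, and map layout are illustrative assumptions, not Flink's optimizer API.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not Flink's optimizer API): decide whether a downstream
// operator keyed on `downstreamKey` needs a reshuffle after a record-at-a-time UDF.
public final class PartitioningReuse {

    /**
     * @param inputPartitionKey field on which the UDF's input is hash-partitioned
     * @param forwarded         input field -> output fields it is copied to unchanged
     * @param downstreamKey     output field the next operator groups/joins on
     * @return true if the data must be repartitioned before the next operator
     */
    public static boolean needsShuffle(int inputPartitionKey,
                                       Map<Integer, Set<Integer>> forwarded,
                                       int downstreamKey) {
        // The partitioning survives only if the key's value is copied unchanged
        // into the field the downstream operator uses as its key.
        Set<Integer> targets = forwarded.getOrDefault(inputPartitionKey, Set.of());
        return !targets.contains(downstreamKey);
    }

    public static void main(String[] args) {
        // Hypothetical UDF that copies input field 0 to output field 3 and keeps field 1 in place.
        Map<Integer, Set<Integer>> forwarded = Map.of(0, Set.of(3), 1, Set.of(1));
        System.out.println(needsShuffle(0, forwarded, 3)); // false: partitioning carries over to field 3
        System.out.println(needsShuffle(0, forwarded, 0)); // true: output field 0 is not a copy of input field 0
        System.out.println(needsShuffle(2, forwarded, 2)); // true: field 2 is not known to be forwarded
    }
}
```

Without any field information the optimizer has to assume the worst case (every field modified), which is exactly the situation where it must reshuffle.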

Right now, users can add lightweight annotations to UDFs to provide this
information (such as adding @ConstantFields("0->3, 1, 2->1")).
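To make the annotation syntax concrete, here is a small, hypothetical Java sketch that parses a constant-fields string like the one above into a map from each input field to the output field(s) it is copied to. It assumes the usual reading of the syntax: "a->b" means input field a is copied unchanged into output field b, and a bare "a" is shorthand for "a->a". The class and method names are made up; this is not Flink's own annotation parser.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical parser for a constant-fields string such as "0->3, 1, 2->1".
// Assumed semantics: "a->b" = input field a copied unchanged to output field b;
// a bare "a" is shorthand for "a->a".
public final class ConstantFieldsParser {

    public static Map<Integer, Set<Integer>> parse(String spec) {
        // TreeMap/TreeSet keep the result sorted, so printing is deterministic.
        Map<Integer, Set<Integer>> forwarded = new TreeMap<>();
        for (String entry : spec.split(",")) {
            String e = entry.trim();
            if (e.isEmpty()) {
                continue;
            }
            String[] parts = e.split("->");
            if (parts.length > 2) {
                throw new IllegalArgumentException("Malformed entry: " + e);
            }
            int source = Integer.parseInt(parts[0].trim());
            int target = parts.length == 2 ? Integer.parseInt(parts[1].trim()) : source;
            // One input field may be copied to several output fields.
            forwarded.computeIfAbsent(source, k -> new TreeSet<>()).add(target);
        }
        return forwarded;
    }

    public static void main(String[] args) {
        System.out.println(parse("0->3, 1, 2->1")); // {0=[3], 1=[1], 2=[1]}
    }
}
```

Mapping each input field to a set of targets (rather than a single one) is a deliberate choice, since a UDF may copy the same input field into more than one output position.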

We have previously worked on static code analysis of UDFs to determine this
information automatically. This is an incredible feature, as it "magically"
makes programs faster.

For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this
works surprisingly well in many cases. We used the "Soot" toolkit for the
static code analysis. Unfortunately, Soot is LGPL licensed, so we have not
included any of that code so far.
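The real analysis works on JVM bytecode via Soot and is far more involved. Purely as a toy illustration of the underlying idea, the Java sketch below describes each output field of a UDF by a small, made-up expression tree and reports an output field as forwarded only when it is a direct copy of an input field. The `Expr`, `InputField`, and `Call` types are hypothetical stand-ins for what a bytecode analysis would have to recover; the sketch needs Java 16+ (records, pattern matching for `instanceof`).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy illustration only (not Soot, not the earlier prototype): infer forwarded
// fields from a hypothetical description of what each output field is computed from.
public final class ForwardedFieldInference {

    /** Hypothetical expression language for the value of one output field. */
    public interface Expr {}

    /** The output field is exactly input field `index`, copied unchanged. */
    public record InputField(int index) implements Expr {}

    /** The output field is computed by some operation over other expressions. */
    public record Call(String op, List<Expr> args) implements Expr {}

    /** Maps each input field to the output positions it is copied to unchanged. */
    public static Map<Integer, Set<Integer>> inferForwarded(List<Expr> outputFields) {
        Map<Integer, Set<Integer>> forwarded = new TreeMap<>();
        for (int out = 0; out < outputFields.size(); out++) {
            // Only a direct copy counts; anything computed (e.g. a + c) is "modified".
            if (outputFields.get(out) instanceof InputField f) {
                forwarded.computeIfAbsent(f.index(), k -> new TreeSet<>()).add(out);
            }
        }
        return forwarded;
    }

    public static void main(String[] args) {
        // Hypothetical UDF: (a, b, c) -> (b, a + c, a)
        List<Expr> outputs = List.of(
                new InputField(1),
                new Call("+", List.of(new InputField(0), new InputField(2))),
                new InputField(0));
        System.out.println(inferForwarded(outputs)); // {0=[2], 1=[0]}
    }
}
```

The hard part in practice is building this kind of description from arbitrary UDF bytecode (branches, object reuse, method calls), which is what a toolkit like Soot helps with.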

I propose to add this functionality to Flink in the form of a drop-in
addition, to work around the incompatibility between the LGPL and ASL 2.0.
Users could simply download a special "flink-code-analysis.jar" and drop it
into the "lib" folder to enable this functionality. We might even add a
script to "tools" that downloads that library into the lib folder
automatically. This should be legally fine, since we would not redistribute
LGPL code and would only link it dynamically (if I remember correctly, the
incompatibility with ASL 2.0 lies mainly in patentability).
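One way the core could notice such a drop-in jar is sketched below in Java: at runtime it checks whether an analyzer class is on the classpath (as it would be once a jar is placed in lib/) and otherwise falls back to user annotations. The class name `org.example.flink.analysis.UdfAnalyzer` is a made-up placeholder, and this is only a sketch of the detection mechanism, not a proposed implementation.

```java
// Hypothetical sketch: enable automatic UDF analysis only if an optional analyzer
// class is present on the classpath (e.g. because its jar was dropped into lib/).
public final class OptionalAnalyzerLoader {

    /** Returns true if the named class can be found, without initializing it. */
    public static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, OptionalAnalyzerLoader.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder name chosen for illustration only.
        String analyzerClass = "org.example.flink.analysis.UdfAnalyzer";
        if (isAvailable(analyzerClass)) {
            System.out.println("Analyzer found: deriving field information automatically.");
        } else {
            System.out.println("No analyzer on the classpath: using user annotations only.");
        }
    }
}
```

Checking by name keeps the core free of any compile-time dependency on the optional library; only code inside the drop-in jar would reference it.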

Prior work on this has been done by Aljoscha Krettek
<https://issues.apache.org/jira/secure/ViewProfile.jspa?name=aljoscha>
and Sebastian Kunert
<https://issues.apache.org/jira/secure/ViewProfile.jspa?name=skunert>,
which could provide a code base to start with.

*Appendix*

Homepage for the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/

Papers on static code analysis for optimization:
http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf
and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf

Quick introduction to the Optimizer:
http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf
(Section 6)

Optimizer for Iterations:
http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf
(Sections 4.3 and 5.3)
