flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
Date Tue, 02 Jun 2015 14:51:18 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569202#comment-14569202 ]

ASF GitHub Bot commented on FLINK-1319:
---------------------------------------

Github user uce commented on a diff in the pull request:

    https://github.com/apache/flink/pull/729#discussion_r31530742
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/SingleInputUdfOperator.java ---
    @@ -54,8 +54,11 @@
     
     	private Map<String, DataSet<?>> broadcastVariables;
     
    +	// NOTE: only set this variable via setSemanticProperties()
    --- End diff --
    
    I think this refactoring is quite fragile. The semantic properties utility returns null
rather than an empty properties object, and you take care of setting it correctly here depending
on whether the forwarded fields have been set manually or not.
    
    If optimize is enabled and there are manual annotations, they will be overridden. I am
wondering whether it would be better to have manual annotations trump optimizer annotations.
What's your opinion on this?
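
To make the suggestion concrete, here is a minimal sketch of the precedence rule in plain Java, not the actual Flink code. The class name `ForwardedFieldsPrecedence`, the method `resolve`, and the use of a `Map<Integer, Integer>` (source field to target field) as a stand-in for the semantic properties object are all assumptions made for illustration. Manual annotations win when present, the analyzer result is used otherwise, and the method never returns null.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Flink.
public class ForwardedFieldsPrecedence {

    /**
     * Picks the forwarded-field mapping to use.
     * Manual annotations take precedence over analyzer results.
     * Returns an empty map instead of null when neither is available.
     */
    static Map<Integer, Integer> resolve(Map<Integer, Integer> manual,
                                         Map<Integer, Integer> analyzed) {
        if (manual != null && !manual.isEmpty()) {
            return manual;              // user-provided annotations trump the analyzer
        }
        if (analyzed != null) {
            return analyzed;            // fall back to the static code analysis result
        }
        return Collections.emptyMap();  // callers never have to null-check
    }

    public static void main(String[] args) {
        Map<Integer, Integer> manual = new TreeMap<>();
        manual.put(0, 3);               // field 0 is forwarded to position 3

        Map<Integer, Integer> analyzed = new TreeMap<>();
        analyzed.put(1, 1);             // analyzer found that field 1 is unchanged

        System.out.println(resolve(manual, analyzed));
        System.out.println(resolve(null, analyzed));
        System.out.println(resolve(null, null));
    }
}
```

With a rule like this in one place, the operator would not need to track separately whether the forwarded fields were set manually.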


> Add static code analysis for UDFs
> ---------------------------------
>
>                 Key: FLINK-1319
>                 URL: https://issues.apache.org/jira/browse/FLINK-1319
>             Project: Flink
>          Issue Type: New Feature
>          Components: Java API, Scala API
>            Reporter: Stephan Ewen
>            Assignee: Timo Walther
>            Priority: Minor
>
> Flink's Optimizer takes information that tells it for UDFs which fields of the input
elements are accessed, modified, or forwarded/copied. This information frequently helps to
reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently
eliminate sorts and shuffles, which are costly.
> Right now, users can add lightweight annotations to UDFs to provide this information
(such as adding {{@ConstantFields("0->3, 1, 2->1")}}).
> We worked with static code analysis of UDFs before, to determine this information automatically.
This is an incredible feature, as it "magically" makes programs faster.
> For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly
well in many cases. We used the "Soot" toolkit for the static code analysis. Unfortunately,
Soot is LGPL licensed and thus we did not include any of the code so far.
> I propose to add this functionality to Flink, in the form of a drop-in addition, to work
around the LGPL incompatibility with ASL 2.0. Users could simply download a special "flink-code-analysis.jar"
and drop it into the "lib" folder to enable this functionality. We may even add a script to
"tools" that downloads that library automatically into the lib folder. This should be legally
fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility
with ASL 2.0 is mainly in the patentability, if I remember correctly).
> Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a
code base to start with.
> *Appendix*
> Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
> Papers on static analysis for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf
and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
> Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf
(Section 6)
> Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf
(Sections 4.3 and 5.3)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
