Date: Mon, 1 Jun 2015 12:15:18 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Commented] (FLINK-1319) Add static code analysis for UDFs

    [ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567225#comment-14567225 ]

ASF GitHub Bot commented on FLINK-1319:
---------------------------------------

Github user uce commented on the pull request:

    https://github.com/apache/flink/pull/729#issuecomment-107419997

    OK, I will review this. I vote to stick to Stephan's suggested approach instead of package-based exclusions: analyze everything and allow exclusions with a `@SkipCodeAnalysis` annotation. Any further opinions on the output of the analysis (stdout vs. logging question)?

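For readers of the archive, the annotation-based opt-out mentioned in the comment could look roughly like the sketch below. Only the annotation name `@SkipCodeAnalysis` comes from the comment; its definition, the `shouldAnalyze` helper, and the two stand-in UDF classes are hypothetical and are not taken from the pull request.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class SkipAnalysisSketch {

    // Hypothetical marker annotation. The name follows the comment above;
    // this definition is an assumption, not the code from the pull request.
    @Retention(RetentionPolicy.RUNTIME) // must be visible via reflection at runtime
    @Target(ElementType.TYPE)
    public @interface SkipCodeAnalysis {}

    // Stand-in UDF that the analyzer would inspect by default.
    public static class PlainMapper {
        public String map(String value) {
            return value.trim();
        }
    }

    // Stand-in UDF whose author opted out of the analysis.
    @SkipCodeAnalysis
    public static class OptedOutMapper {
        public String map(String value) {
            return value.toUpperCase();
        }
    }

    // "Analyze everything" is the default; the annotation is the only exclusion.
    public static boolean shouldAnalyze(Class<?> udfClass) {
        return !udfClass.isAnnotationPresent(SkipCodeAnalysis.class);
    }

    public static void main(String[] args) {
        System.out.println(shouldAnalyze(PlainMapper.class));    // prints true
        System.out.println(shouldAnalyze(OptedOutMapper.class)); // prints false
    }
}
```

Compared with package-based exclusions, a per-class marker keeps the opt-out next to the UDF it affects, at the cost of requiring a source change for every excluded class.
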
> Add static code analysis for UDFs
> ---------------------------------
>
>                 Key: FLINK-1319
>                 URL: https://issues.apache.org/jira/browse/FLINK-1319
>             Project: Flink
>          Issue Type: New Feature
>          Components: Java API, Scala API
>            Reporter: Stephan Ewen
>            Assignee: Timo Walther
>            Priority: Minor
>
> Flink's Optimizer uses information about which fields of the input elements a UDF accesses, modifies, or forwards/copies. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly.
> Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields("0->3, 1, 2->1")}}).
> We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it "magically" makes programs faster.
> For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the "Soot" toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far.
> I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special "flink-code-analysis.jar" and drop it into the "lib" folder to enable this functionality. We may even add a script to "tools" that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly).
> Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with.
> *Appendix*
> Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
> Papers on static analysis for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
> Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6)
> Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)