Date: Mon, 15 Dec 2014 11:01:13 +0000 (UTC)
From: "Fabian Hueske (JIRA)"
To: issues@flink.incubator.apache.org
Reply-To: dev@flink.incubator.apache.org
Subject: [jira] [Commented] (FLINK-1259) FilterFunction can modify data

    [ https://issues.apache.org/jira/browse/FLINK-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246554#comment-14246554 ]

Fabian Hueske commented on FLINK-1259:
--------------------------------------

Yes, let's add it to the documentation for now. If we find that many users run into this problem, we can integrate it into the object non-reuse mode.

> FilterFunction can modify data
> ------------------------------
>
> Key: FLINK-1259
> URL: https://issues.apache.org/jira/browse/FLINK-1259
> Project: Flink
> Issue Type: Bug
> Components: Java API, Optimizer, Scala API
> Affects Versions: 0.7.0-incubating
> Reporter: Fabian Hueske
>
> The FilterFunction returns a boolean for an input record which determines whether the record is filtered or not.
> However, the function can also modify the input record which has effects if the record is not filtered.
> The optimizer assumes that the data is not changed by a FilterFunction, i.e., it assumes that a Filter preserves physical data properties (orders, partitionings, etc.) and might also be pushed down in the future. These assumptions can result in semantically incorrect programs, if the function actually changes its incoming records.
> Possible solutions are:
> - document the requirements (and hope that users read it and behave nicely)
> - hand a copy to the function which can be modified but is not passed on. This has major performance implications and might confuse users as changes are invalidated. However, this could also be integrated with the mutable/immutable runtime switch (FLINK-1005)
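
To make the problem described in the issue concrete, here is a minimal sketch in plain Java with no Flink dependency. The FilterFunction interface below is a hypothetical stand-in that only mirrors the boolean-returning shape described in the issue. Rec, filterShared, filterWithCopy, isSortedByKey and MUTATING are illustrative names, not Flink APIs. The sketch assumes the input is sorted by key, as an optimizer might rely on, and contrasts handing the original record to the function with handing it a copy (the second proposed solution).

```java
import java.util.ArrayList;
import java.util.List;

final class FilterMutationDemo {

    /** Hypothetical stand-in: same shape as a filter function, not the Flink interface. */
    interface FilterFunction<T> {
        boolean filter(T value);
    }

    /** A mutable record with a single key field (illustrative only). */
    static final class Rec {
        int key;
        Rec(int key) { this.key = key; }
        Rec copy() { return new Rec(key); }
    }

    /** Hands the original record to the function, so mutations leak downstream. */
    static List<Rec> filterShared(List<Rec> in, FilterFunction<Rec> f) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : in) {
            if (f.filter(r)) {
                out.add(r);
            }
        }
        return out;
    }

    /** Hands a copy to the function; the untouched original is passed on. */
    static List<Rec> filterWithCopy(List<Rec> in, FilterFunction<Rec> f) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : in) {
            if (f.filter(r.copy())) {   // one extra copy per record
                out.add(r);
            }
        }
        return out;
    }

    /** Checks the physical property the optimizer is assumed to rely on. */
    static boolean isSortedByKey(List<Rec> recs) {
        for (int i = 1; i < recs.size(); i++) {
            if (recs.get(i - 1).key > recs.get(i).key) {
                return false;
            }
        }
        return true;
    }

    /** Input with keys 1..5, already sorted in ascending order. */
    static List<Rec> sortedInput() {
        List<Rec> in = new ArrayList<>();
        for (int k = 1; k <= 5; k++) {
            in.add(new Rec(k));
        }
        return in;
    }

    /** A badly behaved filter: keeps every record but negates its key. */
    static final FilterFunction<Rec> MUTATING = r -> {
        r.key = -r.key;
        return true;
    };

    public static void main(String[] args) {
        // Order is destroyed when the function sees the original records.
        System.out.println(isSortedByKey(filterShared(sortedInput(), MUTATING)));   // false
        // Order survives when the function only sees copies.
        System.out.println(isSortedByKey(filterWithCopy(sortedInput(), MUTATING))); // true
    }
}
```

In this sketch the copy variant keeps the sort order intact but silently discards the function's changes and pays for one copy per record, which matches the performance and usability concerns raised in the issue.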