flink-issues mailing list archives

From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1259) FilterFunction can modify data
Date Mon, 15 Dec 2014 11:01:13 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246554#comment-14246554 ]

Fabian Hueske commented on FLINK-1259:

Yes, let's add it to the documentation for now.
If we find that many users run into this problem, we can integrate it into the object non-reuse

> FilterFunction can modify data
> ------------------------------
>                 Key: FLINK-1259
>                 URL: https://issues.apache.org/jira/browse/FLINK-1259
>             Project: Flink
>          Issue Type: Bug
>          Components: Java API, Optimizer, Scala API
>    Affects Versions: 0.7.0-incubating
>            Reporter: Fabian Hueske
> The FilterFunction returns a boolean for an input record which determines whether the record is filtered or not.
> However, the function can also modify the input record which has effects if the record is not filtered.
> The optimizer assumes that the data is not changed by a FilterFunction, i.e., it assumes that a Filter preserves physical data properties (orders, partitionings, etc.) and might also be pushed down in the future. These assumptions can result in semantically incorrect programs, if the function actually changes its incoming records.
> Possible solutions are:
> - document the requirements (and hope that users read it and behave nicely)
> - hand a copy to the function which can be modified but is not passed on. This has major performance implications and might confuse users as changes are invalidated. However, this could also be integrated with the mutable/immutable runtime switch (FLINK-1005)
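
To illustrate the problem and the second proposed solution, here is a minimal sketch in plain Java. It does not use Flink itself: the Filter interface is a simplified local stand-in for Flink's org.apache.flink.api.common.functions.FilterFunction, and Rec, applyDirect, applyWithCopy, and NEGATING are hypothetical names made up for this example. It only shows how a filter that mutates its input can break an ordering the optimizer assumes is preserved, and how handing the function a copy avoids that.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterMutationSketch {

    /** Simplified stand-in for Flink's FilterFunction (assumption: no checked exception). */
    interface Filter<T> {
        boolean filter(T value);
    }

    /** Hypothetical mutable record type with a single sort key. */
    static final class Rec {
        int key;
        Rec(int key) { this.key = key; }
        Rec copy() { return new Rec(key); }
    }

    /** Passes the original record to the function, so mutations leak downstream. */
    static List<Rec> applyDirect(List<Rec> input, Filter<Rec> f) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : input) {
            if (f.filter(r)) {
                out.add(r);
            }
        }
        return out;
    }

    /** Hands a copy to the function and forwards the untouched original. */
    static List<Rec> applyWithCopy(List<Rec> input, Filter<Rec> f) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : input) {
            if (f.filter(r.copy())) {
                out.add(r);
            }
        }
        return out;
    }

    /** Checks the physical property the optimizer would assume is preserved. */
    static boolean isSortedByKey(List<Rec> recs) {
        for (int i = 1; i < recs.size(); i++) {
            if (recs.get(i - 1).key > recs.get(i).key) {
                return false;
            }
        }
        return true;
    }

    /** Fresh input sorted ascending by key: 1, 2, 3, 4. */
    static List<Rec> sortedInput() {
        List<Rec> in = new ArrayList<>();
        for (int k = 1; k <= 4; k++) {
            in.add(new Rec(k));
        }
        return in;
    }

    /** A misbehaving filter: keeps every record but negates its key. */
    static final Filter<Rec> NEGATING = r -> {
        r.key = -r.key;
        return true;
    };

    public static void main(String[] args) {
        // Mutation leaks: keys become -1, -2, -3, -4, so the order is broken.
        System.out.println(isSortedByKey(applyDirect(sortedInput(), NEGATING)));   // prints false
        // Mutation is discarded with the copy: keys stay 1, 2, 3, 4.
        System.out.println(isSortedByKey(applyWithCopy(sortedInput(), NEGATING))); // prints true
    }
}
```

The copy in applyWithCopy is made for every record, which is the performance cost mentioned above, and the function's changes are silently thrown away, which is the possible source of user confusion.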

This message was sent by Atlassian JIRA
