flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3974) enableObjectReuse fails when an operator chains to multiple downstream operators
Date Wed, 01 Nov 2017 11:46:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233964#comment-16233964
] 

ASF GitHub Bot commented on FLINK-3974:
---------------------------------------

Github user XuPingyong commented on the issue:

    https://github.com/apache/flink/pull/2110
  
    I do not agree with this pr as it always copy  StreamRecord to downstream operator.
    
    StreamMap change the input StreamRecord, so this pr works well. But many operators do
not change/reuse the input StreamRecord, like StreamFlatMap.
    
    The following code no not need the extra copy.
    `DataStream<A> input = ...
    input
        .flatmap(FlatMapFunction<A,B>...)
        .addSink(...);
    
    input
        .flatmap(FlatMapFunction<A,C>...)
        ​.addSink(...);`
    
        So I think we can change StreamMap to not reuse the input StreamRecord. And directly
send the StreamRecord if objectReuse is set true.  
        What do you think?


> enableObjectReuse fails when an operator chains to multiple downstream operators
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-3974
>                 URL: https://issues.apache.org/jira/browse/FLINK-3974
>             Project: Flink
>          Issue Type: Bug
>          Components: DataStream API
>    Affects Versions: 1.0.3
>            Reporter: B Wyatt
>            Assignee: Aljoscha Krettek
>            Priority: Major
>             Fix For: 1.1.0
>
>         Attachments: ReproFLINK3974.java, bwyatt-FLINK3974.1.patch
>
>
> Given a topology that looks like this:
> {code:java}
> DataStream<A> input = ...
> input
>     .map(MapFunction<A,B>...)
>     .addSink(...);
> input
>     .map(MapFunction<A,C>...)
>     ​.addSink(...);
> {code}
> enableObjectReuse() will cause an exception in the form of {{"java.lang.ClassCastException:
B cannot be cast to A"}} to be thrown.
> It looks like the input operator calls {{Output<StreamRecord<A>>.collect}}
which attempts to loop over the downstream operators and process them.
> However, the first map operation will call {{StreamRecord<>.replace}} which mutates
the value stored in the StreamRecord<>.  
> As a result, when the {{Output<StreamRecord<A>>.collect}} call passes the
{{StreamRecord<A>}} to the second map operation it is actually a {{StreamRecord<B>}}
and behaves as if the two map operations were serial instead of parallel.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message