flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chesnay Schepler <ches...@apache.org>
Subject Re: Maintaining Stream Partitioning after Mapping?
Date Sat, 15 Apr 2017 11:22:14 GMT

I think if you have multiple keyBy() transformations with identical 
parallelism the partitioning should
be "preserved". The second keyBy() will still go through the 
partitioning process, but since both the key
and parallelism are identical the resulting partition should be 
identical as well. resulting in no data being
shuffled around. We aren't really preserving the partitioning, but 
re-creating the original one.


On 12.04.2017 21:37, Ryan Conway wrote:
> Greetings,
> Is there a means of maintaining a stream's partitioning after running 
> it through an operation such as map or filter?
> I have a pipeline stage S that operates on a stream partitioned by an 
> ID field. S flat maps objects of type A to type B, which both have an 
> "ID" field, and where each instance of B that S outputs has the same 
> ID as its input instance of A. I hope to add a pipeline stage T 
> immediately after S that operates using the same partitioning as S, so 
> that I can avoid the expense of re-keying the instances of type B.
> If I am understanding the DataStream API correctly this is not 
> feasible with Flink, as map(), filter() etc. all 
> output SingleOutputStreamOperator. But I am hoping that I am missing 
> something.
> Thank you,
> Ryan

View raw message