flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ryan Conway <ryanmackenziecon...@gmail.com>
Subject Re: Maintaining Stream Partitioning after Mapping?
Date Mon, 17 Apr 2017 22:10:46 GMT
Thank you, Chesnay. My hope is to keep things computationally inexpensive,
and if I understand you correctly, that is satisfied even with this
rekeying.

Ryan

On Sat, Apr 15, 2017 at 4:22 AM, Chesnay Schepler <chesnay@apache.org>
wrote:

> Hello,
>
> I think if you have multiple keyBy() transformations with identical
> parallelism the partitioning should
> be "preserved". The second keyBy() will still go through the partitioning
> process, but since both the key
> and parallelism are identical the resulting partition should be identical
> as well. resulting in no data being
> shuffled around. We aren't really preserving the partitioning, but
> re-creating the original one.
>
> Regards,
> Chesnay
>
>
> On 12.04.2017 21:37, Ryan Conway wrote:
>
>> Greetings,
>>
>> Is there a means of maintaining a stream's partitioning after running it
>> through an operation such as map or filter?
>>
>> I have a pipeline stage S that operates on a stream partitioned by an ID
>> field. S flat maps objects of type A to type B, which both have an "ID"
>> field, and where each instance of B that S outputs has the same ID as its
>> input instance of A. I hope to add a pipeline stage T immediately after S
>> that operates using the same partitioning as S, so that I can avoid the
>> expense of re-keying the instances of type B.
>>
>> If I am understanding the DataStream API correctly this is not feasible
>> with Flink, as map(), filter() etc. all output SingleOutputStreamOperator.
>> But I am hoping that I am missing something.
>>
>> Thank you,
>> Ryan
>>
>
>
>

Mime
View raw message