flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bart Wyatt <bart.wy...@dsvolition.com>
Subject stream keyBy without repartition
Date Tue, 24 May 2016 14:22:37 GMT
(migrated from IRC)

Hello All,

My situation is this:

I have a large amount of data partitioned in kafka by "session" (natural partitioning).  After
I read the data, I would like to do as much as possible before incurring re-serialization
or network traffic due to the size of the data.  I am on 1.0.3 in the java API.

What I'd like to do is:

while maintaining the natural partitioning (so that a single thread can perform this) read
data from kafka, perform a window'd fold over the incoming data keyed by a _different_ field("key")
then take the product of that window'd fold and allow re-partitioning to colocate data with
equivalent keys in a new partitioning scheme where they can be reduced into a final product.
 The hope is that the products of such a windowed fold are orders of magnitude smaller than
the data that would be serialized/sent if we re-partitioned before the window'd fold.

Is there a way to .keyBy(...) such that it will act within the physical partitioning of the
data and not force a  re-partitioning of the data by that key?



COMMUNICATION intended solely for the recipient and, therefore, may not be retransmitted to
any party outside of the recipient's organization without the prior written consent of the
sender. If you have received this e-mail in error please notify the sender immediately by
telephone or reply e-mail and destroy the original message without making a copy. Deep Silver,
Inc. accepts no liability for any losses or damages resulting from infected e-mail transmissions
and viruses in e-mail attachments.

View raw message