flink-user mailing list archives

From Elias Levy <fearsome.lucid...@gmail.com>
Subject Re-keying / sub-keying a stream without repartitioning
Date Sat, 22 Apr 2017 05:15:44 GMT
This is something that has come up before on the list, but in a different
context.  I have a need to rekey a stream but would prefer the stream to
not be repartitioned.  There is no gain to repartitioning, as the new
partition key is a composite of the stream key, going from a key of A to a
key of (A, B), so all values for the resulting stream are already being
routed to the same node, and repartitioning them to other nodes would
simply generate unnecessary network traffic and serde overhead.

Unlike previous use cases, I am not trying to perform aggregate
operations.  Instead I am executing CEP patterns.  Some patterns apply to
the stream keyed by A and some to the stream keyed by (A, B).

The API does not appear to have an obvious solution to this situation.
keyBy() will repartition, and there isn't something like subKey() to
subpartition a stream without repartitioning (e.g. keyBy(A).subKey(B)).
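
To make the distinction concrete, here is a minimal, Flink-free sketch in plain Java.  Everything in it is an assumption for illustration: partitionFor() is a toy hash partitioner and not Flink's actual key-group assignment, and subKeyPartitionFor() is a hypothetical stand-in for what a subKey() operation might do.  It only shows that routing a composite key (A, B) by A alone keeps a record on the partition that keyBy(A) already chose, whereas hashing the whole composite key is free to pick a different one.

```java
import java.util.Arrays;
import java.util.Objects;

public class SubKeyRouting {
    // Toy hash partitioner (assumption: NOT Flink's real key-group logic).
    public static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(Objects.hashCode(key), numPartitions);
    }

    // Hypothetical sub-key routing: the composite key (a, b) is routed
    // using only a, so the record stays where keyBy(a) placed it.
    public static int subKeyPartitionFor(String a, String b, int numPartitions) {
        return partitionFor(a, numPartitions);
    }

    // Plain re-keying: hash the whole composite key. The result is a valid
    // partition, but nothing ties it to the partition chosen for a alone.
    public static int rekeyPartitionFor(String a, String b, int numPartitions) {
        return partitionFor(Arrays.asList(a, b), numPartitions);
    }

    public static void main(String[] args) {
        int n = 8;
        String a = "user-1";
        for (String b : new String[] {"x", "y", "z"}) {
            System.out.printf("A=%s B=%s keyBy(A)=%d subKey(A,B)=%d rekey(A,B)=%d%n",
                    a, b,
                    partitionFor(a, n),
                    subKeyPartitionFor(a, b, n),
                    rekeyPartitionFor(a, b, n));
        }
    }
}
```

In this sketch the subKey column always matches the keyBy(A) column, while the rekey column may or may not, which is the source of the unnecessary network traffic described above.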

I suppose I could accomplish it by using partitionCustom(), ignoring the
second element in the key, and delegating to the default partitioner
passing it only the first element, thus resulting in no change of task

