flink-user mailing list archives

From Marta Paes Moreira <ma...@ververica.com>
Subject Re: Task Assignment
Date Mon, 27 Apr 2020 07:28:27 GMT
Sorry — I didn't understand you were dealing with multiple keys.

In that case, I'd recommend you read about key-group assignment [1] and
check the KeyGroupRangeAssignment class [2].
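For illustration, the two-step mapping that KeyGroupRangeAssignment performs can be sketched in plain Java without a Flink dependency. A key is first hashed into one of `maxParallelism` key groups, and the key group is then mapped to a parallel subtask index. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, and the murmur-style mixing function is only assumed to resemble Flink's internal hash, so the indices it prints may not match a real job. Inside Flink, the runtime-internal `KeyGroupRangeAssignment.assignKeyToParallelOperator(key, maxParallelism, parallelism)` performs this computation.

```java
import java.util.Objects;

/** Hypothetical sketch of key -> key group -> subtask assignment (no Flink dependency). */
final class KeyGroupSketch {

    private KeyGroupSketch() {}

    /** Murmur3-style 32-bit mix, ASSUMED to resemble Flink's internal hash; never negative. */
    static int murmurStyleHash(int code) {
        code *= 0xcc9e2d51;
        code = Integer.rotateLeft(code, 15);
        code *= 0x1b873593;
        code = Integer.rotateLeft(code, 13);
        code = code * 5 + 0xe6546b64;
        code ^= 4;
        // fmix32-style finalizer to spread the bits
        code ^= code >>> 16;
        code *= 0x85ebca6b;
        code ^= code >>> 13;
        code *= 0xc2b2ae35;
        code ^= code >>> 16;
        if (code >= 0) {
            return code;
        }
        return code != Integer.MIN_VALUE ? -code : 0;
    }

    /** Step 1: key -> key group in [0, maxParallelism). */
    static int keyGroupFor(Object key, int maxParallelism) {
        return murmurStyleHash(Objects.hashCode(key)) % maxParallelism;
    }

    /** Step 2: key group -> subtask index in [0, parallelism). */
    static int operatorIndexForKeyGroup(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    /** Both steps combined. */
    static int operatorIndexForKey(Object key, int maxParallelism, int parallelism) {
        return operatorIndexForKeyGroup(keyGroupFor(key, maxParallelism), maxParallelism, parallelism);
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // assumed value for illustration
        int parallelism = 4;
        for (String key : new String[] {"sensor-1", "sensor-2", "sensor-3"}) {
            int keyGroup = keyGroupFor(key, maxParallelism);
            System.out.printf("%s -> key group %d -> subtask %d%n",
                    key, keyGroup, operatorIndexForKeyGroup(keyGroup, maxParallelism, parallelism));
        }
    }
}
```

Because the result depends only on the key, `maxParallelism` and the current parallelism, an upstream operator could in principle predict where a key lands, but that prediction changes whenever the job is rescaled.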

Key-groups are assigned to parallel tasks as ranges before the job is
started — this is also a well-defined behaviour in Flink, with implications
in state reassignment on rescaling. I'm afraid that if you try to hardwire
this behaviour into your code, the job might not be transparently
rescalable anymore.
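The range-based assignment, and the reason rescaling works without rehashing every key, can be sketched with plain arithmetic. Each subtask owns a contiguous, inclusive range of key groups, and changing the parallelism only moves whole key groups between subtasks while every key keeps its key group. The formulas are written to mirror the range computation in KeyGroupRangeAssignment; the class name and the `maxParallelism` of 128 are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: contiguous key-group ranges per subtask, before and after rescaling. */
final class KeyGroupRanges {

    private KeyGroupRanges() {}

    /** Returns {start, end} (both inclusive) of the key groups owned by operatorIndex. */
    static int[] rangeForOperator(int maxParallelism, int parallelism, int operatorIndex) {
        // start = ceil(i * maxP / p), end = floor(((i + 1) * maxP - 1) / p)
        int start = (operatorIndex * maxParallelism + parallelism - 1) / parallelism;
        int end = ((operatorIndex + 1) * maxParallelism - 1) / parallelism;
        return new int[] {start, end};
    }

    /** Ranges for every subtask at the given parallelism. */
    static List<int[]> ranges(int maxParallelism, int parallelism) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            result.add(rangeForOperator(maxParallelism, parallelism, i));
        }
        return result;
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // assumed; stays fixed while the job's state is kept
        for (int parallelism : new int[] {2, 3}) {
            System.out.println("parallelism " + parallelism + ":");
            List<int[]> rs = ranges(maxParallelism, parallelism);
            for (int i = 0; i < rs.size(); i++) {
                System.out.printf("  subtask %d owns key groups %d..%d%n", i, rs.get(i)[0], rs.get(i)[1]);
            }
        }
    }
}
```

Run as-is, it reports the ranges 0..63 and 64..127 at parallelism 2, and 0..42, 43..85 and 86..127 at parallelism 3. Any scheme that pins particular keys to particular TaskManagers would have to be recomputed against the new ranges after every rescale, which is the loss of transparent rescalability mentioned above.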

[1] https://flink.apache.org/features/2017/07/04/flink-rescalable-state.html

On Fri, Apr 24, 2020 at 7:10 AM Navneeth Krishnan <reachnavneeth2@gmail.com>
wrote:

> Hi Marta,
> Thanks for your response. What I'm looking for is something like data
> locality. If I have one TM that is processing a set of keys, I want to
> ensure all keys of the same type go to the same TM rather than using
> hashing to find the downstream slot. I could use a common key to do this,
> but I need to parallelize as much as possible: the number of incoming
> messages is too large to narrow down to a single key and process there.
> Thanks
> On Thu, Apr 23, 2020 at 2:02 AM Marta Paes Moreira <marta@ververica.com>
> wrote:
>> Hi, Navneeth.
>> If you *key* your stream using stream.keyBy(…), this will logically
>> split your input and all the records with the same key will be processed in
>> the same operator instance. This is the default behavior in Flink for keyed
>> streams and is handled transparently.
>> You can read more about it in the documentation [1].
>> [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#keyed-state-and-operator-state
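The guarantee that all records with the same key reach the same operator instance can be illustrated without a cluster: route each record through a deterministic key-to-subtask function and check that no key ends up in two subtasks. This is a simplified, hypothetical model, not Flink API. The `Event` class and the `route`/`partition` methods are invented for the example, and `Math.floorMod(key.hashCode(), maxParallelism)` stands in for Flink's murmur-hashed key-group computation. In a real job, `keyBy` performs this routing for you.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of keyed routing; not Flink API. */
final class KeyedRoutingSketch {

    /** A record carrying the field you would pass to keyBy. */
    static final class Event {
        final String key;
        final long value;

        Event(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    private KeyedRoutingSketch() {}

    /** Simplified key -> subtask: hash into a key group, then map the key group to a subtask. */
    static int route(String key, int maxParallelism, int parallelism) {
        int keyGroup = Math.floorMod(key.hashCode(), maxParallelism); // stand-in for Flink's hash
        return keyGroup * parallelism / maxParallelism;
    }

    /** Splits events into one inbox per subtask, as a keyed exchange would. */
    static List<List<Event>> partition(List<Event> events, int maxParallelism, int parallelism) {
        List<List<Event>> inboxes = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            inboxes.add(new ArrayList<>());
        }
        for (Event e : events) {
            inboxes.get(route(e.key, maxParallelism, parallelism)).add(e);
        }
        return inboxes;
    }

    /** True if every key appears in at most one inbox. */
    static boolean eachKeyInOneSubtask(List<List<Event>> inboxes) {
        Map<String, Integer> owner = new HashMap<>();
        for (int i = 0; i < inboxes.size(); i++) {
            for (Event e : inboxes.get(i)) {
                Integer previous = owner.putIfAbsent(e.key, i);
                if (previous != null && previous != i) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            events.add(new Event("user-" + (i % 5), i));
        }
        List<List<Event>> inboxes = partition(events, 128, 3);
        for (int i = 0; i < inboxes.size(); i++) {
            Set<String> keys = new HashSet<>();
            for (Event e : inboxes.get(i)) {
                keys.add(e.key);
            }
            System.out.println("subtask " + i + " receives keys " + keys);
        }
        System.out.println("each key in one subtask: " + eachKeyInOneSubtask(inboxes));
    }
}
```

The routing looks only at the key, so two keys that share some other attribute (a "type", say) can still land on different subtasks unless that attribute is itself what the stream is keyed by.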
>> On Thu, Apr 23, 2020 at 7:44 AM Navneeth Krishnan <
>> reachnavneeth2@gmail.com> wrote:
>>> Hi All,
>>> Is there a way for an upstream operator to know how the downstream
>>> operator tasks are assigned? Basically I want to group my messages to be
>>> processed on slots in the same node based on some key.
>>> Thanks
