flink-user mailing list archives

From: Rong Rong <walter...@gmail.com>
Subject: Re: What's the advantage of using BroadcastState?
Date: Sun, 19 Aug 2018 15:30:07 GMT
Hi Paul,

To add to Hequn's answer: Broadcast State is typically used for "a
low-throughput stream containing a set of rules which we want to evaluate
against all elements coming from another stream" [1].
So one more item for the difference list is whether the state is
"broadcast" across all keys when processing a keyed stream. This is
typically needed when it is not possible to derive the same key field
with a KeySelector on both inputs of a CoStream.
Another difference is performance: Broadcast State is "stored locally and
is used to process all incoming elements on the other stream", so every
parallel instance holds a full copy, and its size has to be managed
carefully.
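A plain-Java model of the rules pattern above may help. It is only a sketch under stated assumptions, not Flink's API: the class name `BroadcastPatternSketch`, the rule type (a named threshold) and the parallelism of 3 are all hypothetical. Each parallel instance keeps its own full copy of the rules (the "broadcast state"). A rule update is delivered to every instance, while each element of the other stream goes to exactly one instance and is evaluated against all rules held there. The element side only reads through an unmodifiable view, mirroring the read-only restriction on the non-broadcast side.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of the broadcast state pattern; NOT Flink API.
public class BroadcastPatternSketch {

    // One parallel instance of the operator, holding its own copy of the rules.
    static final class Instance {
        // Local "broadcast state": rule name -> threshold (hypothetical rule type).
        private final Map<String, Integer> rules = new HashMap<>();

        // Broadcast side: read-write access to the local copy.
        void onRule(String name, int threshold) {
            rules.put(name, threshold);
        }

        // Element side: reads through an unmodifiable view only.
        List<String> onElement(int value) {
            Map<String, Integer> view = Collections.unmodifiableMap(rules);
            List<String> matched = new ArrayList<>();
            for (Map.Entry<String, Integer> rule : view.entrySet()) {
                if (value > rule.getValue()) {
                    matched.add(rule.getKey());
                }
            }
            Collections.sort(matched); // deterministic output order
            return matched;
        }

        int stateSize() {
            return rules.size();
        }
    }

    final List<Instance> instances = new ArrayList<>();

    BroadcastPatternSketch(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            instances.add(new Instance());
        }
    }

    // A rule is delivered to every parallel instance.
    void broadcastRule(String name, int threshold) {
        for (Instance instance : instances) {
            instance.onRule(name, threshold);
        }
    }

    // An element goes to exactly one instance (here: value modulo parallelism).
    List<String> processElement(int value) {
        int target = Math.floorMod(value, instances.size());
        return instances.get(target).onElement(value);
    }

    // Total number of rule copies held across all instances.
    int totalRuleCopies() {
        int total = 0;
        for (Instance instance : instances) {
            total += instance.stateSize();
        }
        return total;
    }

    public static void main(String[] args) {
        BroadcastPatternSketch op = new BroadcastPatternSketch(3);
        op.broadcastRule("high", 100);
        op.broadcastRule("medium", 50);
        System.out.println(op.processElement(120)); // [high, medium]
        System.out.println(op.processElement(70));  // [medium]
        System.out.println(op.processElement(10));  // []
        System.out.println(op.totalRuleCopies());   // 6
    }
}
```

The last line shows the memory point: two rules at parallelism 3 means six stored copies, so the state footprint grows with both the rule set and the parallelism.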

[1]:
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/broadcast_state.html

On Sun, Aug 19, 2018 at 1:40 AM Hequn Cheng <chenghequn@gmail.com> wrote:

> Hi Paul,
>
> There are some differences:
> 1. The BroadcastStream broadcasts the data for you, i.e., data is
> automatically sent to all downstream tasks.
> 2. To guarantee that the contents of the Broadcast State are the same
> across all parallel instances of the operator, read-write access is only
> given to the broadcast side; the non-broadcast side gets read-only access.
> 3. For BroadcastState, Flink guarantees that upon restoring/rescaling
> there will be no duplicates and no missing data. In case of recovery with
> the same or smaller parallelism, each task reads its checkpointed state.
> Upon scaling up, each task reads its own state, and the remaining tasks
> (p_new - p_old) read the checkpoints of previous tasks in a round-robin
> manner. MapState does not offer these guarantees.
>
> Best, Hequn
>
> On Sun, Aug 19, 2018 at 11:18 AM, Paul Lam <paullin3280@gmail.com> wrote:
>
>> Hi,
>>
>> AFAIK, the difference between a BroadcastStream and a normal DataStream
>> is that the BroadcastStream comes with a BroadcastState, but it seems that
>> the functionality of BroadcastState can also be achieved with MapState in
>> a CoMapFunction or something similar, since the control stream can still
>> be broadcast without being turned into a BroadcastStream. So, I’m
>> wondering: what’s the advantage of using BroadcastState? Thanks a lot!
>>
>> Best Regards,
>> Paul Lam
>>
>
>
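Hequn's third point describes how BroadcastState is reassigned when a job is restored with a different parallelism. The following is a hypothetical plain-Java sketch of that rule as described in the thread, not Flink's actual source code; it assumes old checkpoints are indexed 0..p_old-1 and new tasks 0..p_new-1, and the names `BroadcastRescaleSketch` and `assign` are illustrative only.

```java
import java.util.Arrays;

// Hypothetical model of the restore rule described for BroadcastState;
// it follows the prose description, not Flink's implementation.
public class BroadcastRescaleSketch {

    // Returns, for each new task index, the index of the old task whose
    // checkpointed broadcast state it reads on restore.
    static int[] assign(int oldParallelism, int newParallelism) {
        if (oldParallelism <= 0 || newParallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        int[] source = new int[newParallelism];
        for (int task = 0; task < newParallelism; task++) {
            if (task < oldParallelism) {
                // Same or smaller parallelism, or one of the first p_old tasks
                // when scaling up: read its own checkpoint.
                source[task] = task;
            } else {
                // The remaining p_new - p_old tasks read the checkpoints of
                // the previous tasks in round-robin order.
                source[task] = (task - oldParallelism) % oldParallelism;
            }
        }
        return source;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(assign(3, 3))); // [0, 1, 2]
        System.out.println(Arrays.toString(assign(3, 2))); // [0, 1]
        System.out.println(Arrays.toString(assign(2, 5))); // [0, 1, 0, 1, 0]
    }
}
```

When scaling down, the checkpoint of a dropped task (index 2 in the second call) is simply not read. That is only safe because every copy of the broadcast state is identical, which is what the read-write restriction in point 2 is meant to ensure.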
