flink-user mailing list archives

From Chesnay Schepler <ches...@apache.org>
Subject Re: stream of large objects
Date Sun, 10 Feb 2019 09:56:50 GMT
The Broadcast State pattern
<https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/state/broadcast_state.html#the-broadcast-state-pattern>
may be interesting to you.

On 08.02.2019 15:57, Aggarwal, Ajay wrote:
>
> Yes, another keyBy will be used. The “small size” messages will be 
> strings of length 500 to 1000.
>
> Is there a concept of “global” state in Flink? Is it possible to keep 
> these lists in global state and only pass the list reference (by 
> name?) in the LargeMessage?
>
> *From: *Chesnay Schepler <chesnay@apache.org>
> *Date: *Friday, February 8, 2019 at 8:45 AM
> *To: *"Aggarwal, Ajay" <Ajay.Aggarwal@netapp.com>, 
> "user@flink.apache.org" <user@flink.apache.org>
> *Subject: *Re: stream of large objects
>
> Whether a LargeMessage is serialized depends on how the job is structured.
> For example, if you were to only apply map/filter functions after the 
> aggregation it is likely they wouldn't be serialized.
> If you were to apply another keyBy they will be serialized again.
>
> When you say "small size" messages, what are we talking about here?
>
> On 07.02.2019 20:37, Aggarwal, Ajay wrote:
>
>     In my use case the source stream contains small messages, but
>     as part of Flink processing I will be aggregating them into large
>     messages, and further processing will happen on these large
>     messages. The structure of a large message will be something
>     like this:
>
>        import java.util.List;
>
>        class LargeMessage {
>
>            String key;
>
>            // the smaller messages are aggregated into this list
>            List<String> messages;
>
>        }
>
>     In some cases the list field of a LargeMessage can get very large
>     (thousands of messages). Is it OK to create an intermediate stream of
>     these LargeMessages? What should I be concerned about while
>     designing the Flink job, specifically with parallelism in mind? As
>     these LargeMessages flow from one Flink subtask to another, do
>     they get serialized/deserialized?
>
>     Thanks.
>
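
For a sense of scale, the figures quoted in the thread (strings of length 500 to 1000, lists holding thousands of messages) can be turned into a rough per-record payload estimate. The sketch below is plain Java with no Flink dependency. The class name LargeMessageSizeEstimate, the helpers build and payloadBytes, the all-ASCII content, and the counts 1,000 and 5,000 are assumptions made for illustration; the LargeMessage class is a stand-in for the one described above. It counts raw UTF-8 bytes only, so the size actually written by whichever serializer Flink selects will differ somewhat.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LargeMessageSizeEstimate {

    /** Stand-in for the LargeMessage class described in the thread. */
    static class LargeMessage {
        String key;
        List<String> messages = new ArrayList<>();
    }

    /** Builds a message holding `count` ASCII strings of `length` characters each (hypothetical helper). */
    static LargeMessage build(String key, int count, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append('x');
        }
        String payload = sb.toString();

        LargeMessage m = new LargeMessage();
        m.key = key;
        for (int i = 0; i < count; i++) {
            m.messages.add(payload);
        }
        return m;
    }

    /** Sums the UTF-8 byte length of the key and of every message (raw payload, no framing overhead). */
    static long payloadBytes(LargeMessage m) {
        long total = m.key.getBytes(StandardCharsets.UTF_8).length;
        for (String s : m.messages) {
            total += s.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed sample points: the thread only says "1000's of messages" of length 500 to 1000.
        for (int count : new int[] {1_000, 5_000}) {
            for (int length : new int[] {500, 1_000}) {
                LargeMessage m = build("k", count, length);
                System.out.printf("%d messages x %d chars = %d payload bytes%n",
                        count, length, payloadBytes(m));
            }
        }
    }
}
```

Under these assumptions a single LargeMessage carries roughly 0.5 MB to 5 MB of string data, and that is the amount that would have to be serialized and shipped each time a record crosses a keyBy.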

