flink-user mailing list archives

From Philippe CAPARROY <philippe.capar...@orange.fr>
Subject re: Elegantly sharing state in a streaming environment
Date Mon, 30 May 2016 12:31:55 GMT

Just transform the list into a DataStream. A DataStream can be finite.

One solution, in the context of a streaming environment, is to use Kafka or any other distributed
broker; Flink ships with a Kafka source connector.


1) Create a Kafka topic dedicated to your list of key/value pairs. Inject your values into this
topic, partitioned by key, so that you recover the keys in Flink.
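As a rough illustration of what "partitioned by key" buys you: hash-based assignment sends a given key to the same partition every time, so all values for that key are recovered together on one source subtask. This is a sketch in the spirit of Kafka's default partitioner, not its actual implementation (Kafka uses murmur2 internally); the class and method names are made up.

```java
// Illustrative sketch: deterministic key-to-partition assignment by hashing.
// A given key always maps to the same partition, so the Flink Kafka source
// sees all values for that key on the same subtask.
class KeyPartitioner {
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int first = partitionFor("user-42", 8);
        int again = partitionFor("user-42", 8);
        System.out.println(first == again); // same key -> same partition
    }
}
```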


2) Create a source for the stream of tuples you are analysing -> output1 (tuples).


3) Create a Kafka source and parse/recover your key/value pairs from it (e.g. in a first
map operator): map1 -> output2 (K,V). Then:




                 a) If you need all key/value pairs at each operator instance:
broadcast all partitions of output2 to the analysis operator.


                  b) If you don't need all key/value pairs, just connect output2
to the analysis operator as-is. Partitioning of the K,V pairs will then depend on Kafka's
partitioning strategy, and can be controlled in Flink anyway.


4) The analysis operator: implement it as a RichCoFlatMapFunction; its state can be checkpointed.

When receiving K,V pairs from output2, store them in local state.

When receiving a tuple, filter it with the help of the local state, and propagate it
downstream or not.
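The core of step 4 can be sketched as follows. This is not the actual Flink API: a plain HashMap stands in for Flink's checkpointed operator state, and the two methods mirror the flatMap1/flatMap2 callbacks of a RichCoFlatMapFunction on the two connected streams. The class and method names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of the analysis operator's logic: K,V pairs from output2 populate
// local state; tuples from output1 are filtered against that state.
class KvStateFilter {
    private final Map<String, String> state = new HashMap<>();

    // Mirrors flatMap2: a K,V pair arrives from output2 -> store it locally.
    void onKeyValue(String key, String value) {
        state.put(key, value);
    }

    // Mirrors flatMap1: a tuple arrives from output1 -> propagate it
    // downstream only if its key is present in the local state.
    Optional<String> onTuple(String key, String payload) {
        return state.containsKey(key) ? Optional.of(payload) : Optional.empty();
    }

    public static void main(String[] args) {
        KvStateFilter filter = new KvStateFilter();
        filter.onKeyValue("k1", "v1");
        System.out.println(filter.onTuple("k1", "t-a").isPresent()); // key known -> propagate
        System.out.println(filter.onTuple("k2", "t-b").isPresent()); // key unknown -> drop
    }
}
```

In real Flink code you would connect the two streams (`output1.connect(output2)`), apply the co-flatmap, and keep the state in a member field so it participates in checkpointing.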












> Message of 30/05/16 13:41
> From: leon_mclare@tutanota.com
> To: "User"
> Cc:
> Subject: Elegantly sharing state in a streaming environment
>
> Hello Flink team,

How can I partition and share static state among instances of a streaming operator?

I have a huge list of keys and values which is used to filter tuples in a stream. The list
does not change. Currently I share the list with each operator instance via the constructor,
although only a subset of the list is required per operator (the assignment of subsets to operator
instances is known). I cannot use DataSet-based functions in a streaming execution environment
to assign sublists. I also cannot use DataStream-based partitioning functions, as the list
is static, i.e. not a DataStream. The dilemma exists because I am mixing static (DataSet-type)
content with streaming content. Is there any other approach aside from using an additional
tool (e.g. a distributed cache)?

Thanks in advance.

