flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gyula Fóra <gyf...@apache.org>
Subject Streaming KV store abstraction
Date Tue, 08 Sep 2015 11:35:05 GMT
Hey All,

The last couple of days I have been playing around with the idea of
building a streaming key-value store abstraction using stateful streaming
operators that can be used within Flink Streaming programs seamlessly.

Operations executed on this KV store would be fault tolerant as it
integrates with the checkpointing mechanism, and if we add timestamps to
each put/get/... operation we can use the watermarks to create fully
deterministic results. This functionality is very useful for many
applications, and is very hard to implement properly with some dedicates kv

The KVStore abstraction could look as follows:

KVStore<K,V> store = new KVStore<>;


store.get(DataStream<K>) -> DataStream<KV<K,V>>
store.remove(DataStream<K>) -> DataStream<KV<K,V>>
store.multiGet(DataStream<K[]>) -> DataStream<KV<K,V>[]>
store.getWithKeySelector(DataStream<X>, KeySelector<X,K>) ->

For the resulting streams I used a special KV abstraction which let's us
return null values.

The implementation uses a simple streaming operator for executing most of
the operations (for multi get there is an additional merge operator) with
either local or partitioned states for storing the kev-value pairs (my
current prototype uses local states). And it can either execute operations
eagerly (which would not provide deterministic results), or buffer up
operations and execute them in order upon watermarks.

As for use cases you can probably come up with many I will save that for
now :D

I have a prototype implementation here that can execute the operations
described above (does not handle watermarks and time yet):

And also an example job:


What do you think?
If you like it I will work on writing tests and it still needs a lot of
tweaking and refactoring. This might be something we want to include with
the standard streaming libraries at one point.


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message