flink-user mailing list archives

From "Marchant, Hayden " <hayden.march...@citi.com>
Subject In-memory cache
Date Mon, 02 Oct 2017 10:46:08 GMT
We have an operator in our streaming application that needs to access 'reference data' that
is updated by another Flink streaming application. This reference data has about 10,000 entries
and a small footprint, and it needs to be updated roughly every 100 ms. The required latency
for this application is extremely low (a couple of milliseconds), so we are wary of paying
the cost of I/O to access the reference data remotely. We are currently examining three
options for accessing this reference data:

1. Expose the reference data as QueryableState and access it directly from the 'client' streaming
operator using the QueryableState API.
2. Same as #1, but keep an in-memory Java cache of the reference data within the operator
that is asynchronously updated at a scheduled frequency using the QueryableState API.
3. Output the reference data to Redis, and keep an in-memory Java cache of the reference
data within the operator that is asynchronously updated at a scheduled frequency using the
Redis API.

My understanding is that one of the cons of using QueryableState is that if the Flink application
that generates the reference data is unavailable, the queryable state will not exist. Is
that correct?

If we were to use an asynchronously scheduled 'read' of the remote reference data (options 2
and 3), where should it be done? I was thinking of starting a ScheduledExecutorService from
within the open method of the Flink operator.
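
To make the scheduled-refresh idea concrete, here is a minimal sketch in plain Java. It is
only an illustration: RefreshingCache and its loader are hypothetical names, and the loader
stands in for whatever actually fetches the reference data (a QueryableState query or a Redis
read). The assumption is that start() would be called from the operator's open method and
close() from its close method.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Flink): holds an immutable snapshot of the
 * reference data and replaces it from a background thread.
 */
class RefreshingCache<K, V> implements AutoCloseable {
    // Readers always see a complete snapshot; the refresh swaps the reference atomically.
    private final AtomicReference<Map<K, V>> snapshot =
            new AtomicReference<>(Collections.<K, V>emptyMap());
    // Assumed to wrap the remote read (QueryableState or Redis).
    private final Supplier<Map<K, V>> loader;
    private final ScheduledExecutorService scheduler;

    RefreshingCache(Supplier<Map<K, V>> loader) {
        this.loader = loader;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "reference-data-refresh");
            t.setDaemon(true); // do not keep the JVM alive because of the refresher
            return t;
        });
    }

    /** Loads once synchronously, then refreshes with a fixed delay between runs. */
    void start(long periodMillis) {
        refresh();
        scheduler.scheduleWithFixedDelay(
                this::refresh, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Copies the loaded data into a new read-only snapshot. */
    void refresh() {
        try {
            Map<K, V> fresh = loader.get();
            snapshot.set(Collections.unmodifiableMap(new HashMap<>(fresh)));
        } catch (RuntimeException e) {
            // Keep serving the previous snapshot. An exception escaping a scheduled
            // task would cancel all further runs, so it is swallowed here.
        }
    }

    /** Local lookup on the hot path: no I/O, no locking. */
    V get(K key) {
        return snapshot.get().get(key);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With this shape, the per-record path only touches local memory, and a failed refresh leaves
the previous snapshot in place, so the data can be stale by at least one refresh period.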

What is the best way to get this done?

Regards,
Hayden Marchant

