flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sergio Esteves (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-6054) Add new state backend that dynamically stores data in memory and external storage
Date Tue, 14 Mar 2017 23:24:42 GMT
Sergio Esteves created FLINK-6054:

             Summary: Add new state backend that dynamically stores data in memory and external
                 Key: FLINK-6054
                 URL: https://issues.apache.org/jira/browse/FLINK-6054
             Project: Flink
          Issue Type: New Feature
          Components: State Backends, Checkpointing
            Reporter: Sergio Esteves
            Priority: Minor

This feature would be useful for memory-intensive applications that need to maintain state
for long periods of time; e.g., event-time streaming application with long-lived windows that
tolerate large amounts of lateness.

This feature would allow to scale the state and, in the example above, tolerate a very large
(possibly unbounded) amount of lateness, which can be useful in a set of scenarios, like the
one of Photon in the Google Advertising System (white paper: "Photon: Fault-tolerant and Scalable
Joining of Continuous Data Streams").

In a nutshell, the idea would be to have a quota for the maximum memory that a state cell
(different keys and namespaces) can occupy. When that quota gets fully occupied, new state
data would be written out to disk. Then, when state needs to be retrieved, data is read entirely
from memory - persisted data is loaded into memory in the background at the same time that
data pertaining to the quota is being fetched (this reduces I/O overhead).

Different policies, defining when to offload/load data from/to memory, can be implemented
to govern the overall memory utilization. We already have a preliminary implementation with
promising results in terms of memory savings (in the context of streaming applications with
windows that tolerate lateness).

More details are to be given soon through a design document.

This message was sent by Atlassian JIRA

View raw message