flink-issues mailing list archives

From "Chen Qin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-4266) Remote Storage Statebackend
Date Sat, 06 Aug 2016 04:14:20 GMT

     [ https://issues.apache.org/jira/browse/FLINK-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Chen Qin updated FLINK-4266:
    Summary: Remote Storage Statebackend  (was: Cassandra StateBackend)

> Remote Storage Statebackend
> ---------------------------
>                 Key: FLINK-4266
>                 URL: https://issues.apache.org/jira/browse/FLINK-4266
>             Project: Flink
>          Issue Type: New Feature
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.0.3, 1.2.0
>            Reporter: Chen Qin
>            Priority: Minor
> The current FileSystem state backend limits the total state size to the available disk space.
> A long-running task that holds window contents for a long period of time needs to split its state out to durable remote storage that is replicated across data centers.
> We are looking into an implementation that uses the checkpoint timestamp as a version and does a range query to get the current state. We also want to keep "hot states" from hitting the remote DB on every update between adjacent checkpoints, by tracking update logs, merging them, and doing batch updates only at checkpoint time. Lastly, we are looking for an eviction policy that can identify "hot keys" in k/v state and lazily load "cold keys" from Cassandra.
> For now, we don't have a good story for data retirement. We might have to keep the data forever, until someone manually runs a command to clean up the state data of each job. Some of these features might be related to the incremental checkpointing feature; we hope to align with that effort as well.
> Comments are welcome. I will try to put together a draft design doc after gathering some feedback.
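
The three ideas in the description above (checkpoint timestamps as versions with a range read, merging updates between checkpoints into one batch write, and a hot-key cache with lazy loading of cold keys) can be illustrated with a minimal Java sketch. Nothing here comes from the issue or from Flink's API: the class names `VersionedRemoteStore` and `BufferedRemoteState` are hypothetical, an in-memory map stands in for Cassandra, updates are merged by last-write-wins, and LRU is assumed as the eviction policy.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a remote store: key -> (checkpoint timestamp -> value). */
class VersionedRemoteStore {
    private final Map<String, TreeMap<Long, String>> rows = new HashMap<>();
    int writeCount = 0; // counts remote writes, to make the effect of batching visible

    void put(String key, long checkpointTs, String value) {
        rows.computeIfAbsent(key, k -> new TreeMap<>()).put(checkpointTs, value);
        writeCount++;
    }

    /** Range read: newest version whose timestamp is <= checkpointTs, or null if none exists. */
    String readAsOf(String key, long checkpointTs) {
        TreeMap<Long, String> versions = rows.get(key);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> entry = versions.floorEntry(checkpointTs);
        return entry == null ? null : entry.getValue();
    }
}

/** Hypothetical k/v state that buffers updates locally and flushes them only at checkpoints. */
class BufferedRemoteState {
    private final VersionedRemoteStore store;
    // Updates since the last checkpoint, merged by last-write-wins (an assumption).
    private final Map<String, String> pending = new HashMap<>();
    // LRU cache of "hot" keys; evicted ("cold") keys are reloaded lazily from the store.
    private final LinkedHashMap<String, String> hot;
    private long lastCheckpointTs = Long.MIN_VALUE;

    BufferedRemoteState(VersionedRemoteStore store, final int hotCapacity) {
        this.store = store;
        this.hot = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > hotCapacity;
            }
        };
    }

    /** Records an update locally; nothing is sent to the remote store yet. */
    void update(String key, String value) {
        pending.put(key, value);
    }

    /** Reads the current value: pending updates first, then hot cache, then the remote store. */
    String get(String key) {
        if (pending.containsKey(key)) {
            return pending.get(key);
        }
        if (hot.containsKey(key)) {
            return hot.get(key);
        }
        String value = store.readAsOf(key, lastCheckpointTs); // lazy load of a cold key
        if (value != null) {
            hot.put(key, value);
        }
        return value;
    }

    /** Flushes the merged updates as one batch, versioned by the checkpoint timestamp. */
    void checkpoint(long checkpointTs) {
        for (Map.Entry<String, String> e : pending.entrySet()) {
            store.put(e.getKey(), checkpointTs, e.getValue());
            hot.put(e.getKey(), e.getValue());
        }
        pending.clear();
        lastCheckpointTs = checkpointTs;
    }
}
```

In this sketch, two updates to the same key between checkpoints produce a single remote write, and `readAsOf` with an older checkpoint timestamp returns the value as of that checkpoint. Old versions are never deleted, which mirrors the open data-retirement question raised in the description.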

