flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6364) Implement incremental checkpointing in RocksDBStateBackend
Date Fri, 05 May 2017 09:34:04 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997994#comment-15997994
] 

ASF GitHub Bot commented on FLINK-6364:
---------------------------------------

Github user StefanRRichter commented on the issue:

    https://github.com/apache/flink/pull/3801
  
    I am sorry, but before merging I noticed that some tests (e.g. `RocksDBStateBackendTest.testCancelRunningSnapshot`)
fail sporadically (only on Travis). I tracked the problem down, and I think the cause is that
`cancel()` does not eagerly close the open streams to interrupt blocking IO calls.
    
    I suggest the following fix:
    
    `RocksDBIncrementalSnapshotOperation` should have its own `CloseableRegistry`. This
tracks all the open streams inside the checkpointing and is registered with the backend's registry
for as long as the task runs. Then, in `cancel()`, as a first step we can close and unregister
that inner `CloseableRegistry`. This also prevents a race in which the current stream gets closed
asynchronously by `cancel()` while the checkpointing has actually already opened the next stream
(the registry closes and blocks new streams on registration once it is closed).
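
The suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: `SketchRegistry` is a simplified, hypothetical stand-in for `CloseableRegistry`, and `SketchSnapshotOperation` with its `open`/`cancel` methods is a hypothetical stand-in for `RocksDBIncrementalSnapshotOperation`.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical, simplified stand-in for a closeable registry (not Flink's actual class). */
final class SketchRegistry implements Closeable {
    private final Set<Closeable> open = new LinkedHashSet<>();
    private boolean closed;

    /** Tracks a resource; once the registry is closed, the resource is closed and rejected. */
    void register(Closeable resource) throws IOException {
        synchronized (this) {
            if (!closed) {
                open.add(resource);
                return;
            }
        }
        resource.close();
        throw new IOException("Registry is closed; resource was closed on registration.");
    }

    /** Stops tracking a resource; returns true if it was registered. */
    synchronized boolean unregister(Closeable resource) {
        return open.remove(resource);
    }

    /** Closes every tracked resource and blocks all later registrations. */
    @Override
    public void close() throws IOException {
        Closeable[] toClose;
        synchronized (this) {
            if (closed) {
                return;
            }
            closed = true;
            toClose = open.toArray(new Closeable[0]);
            open.clear();
        }
        IOException first = null;
        for (Closeable c : toClose) {
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}

/** Hypothetical stand-in for the snapshot operation that owns an inner registry. */
final class SketchSnapshotOperation {
    private final SketchRegistry backendRegistry;
    private final SketchRegistry streams = new SketchRegistry();

    SketchSnapshotOperation(SketchRegistry backendRegistry) throws IOException {
        this.backendRegistry = backendRegistry;
        // The inner registry itself is tracked by the backend's registry.
        backendRegistry.register(streams);
    }

    /** Every stream opened during the checkpoint goes through the inner registry. */
    <T extends Closeable> T open(T stream) throws IOException {
        streams.register(stream);
        return stream;
    }

    /** First step of cancellation: close the inner registry, then unregister it. */
    void cancel() throws IOException {
        try {
            streams.close();
        } finally {
            backendRegistry.unregister(streams);
        }
    }
}
```

In this sketch, a stream that the checkpointing thread tries to open after `cancel()` is closed immediately and rejected with an `IOException`, which is the race-avoidance behavior described above.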


> Implement incremental checkpointing in RocksDBStateBackend
> ----------------------------------------------------------
>
>                 Key: FLINK-6364
>                 URL: https://issues.apache.org/jira/browse/FLINK-6364
>             Project: Flink
>          Issue Type: Sub-task
>          Components: State Backends, Checkpointing
>            Reporter: Xiaogang Shi
>            Assignee: Xiaogang Shi
>
> {{RocksDBStateBackend}} is well suited for incremental checkpointing because RocksDB
is based on LSM trees, which record updates in new sst files, and all sst files are immutable.
By only materializing those new sst files, we can significantly improve the performance of
checkpointing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
