flink-issues mailing list archives

From "Sihua Zhou (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-8753) Introduce Incremental savepoint
Date Tue, 27 Feb 2018 02:24:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377908#comment-16377908 ]

Sihua Zhou commented on FLINK-8753:

[~StephanEwen] Thanks for your reply. Indeed, what I am trying to achieve is just a faster
savepoint that does not need to iterate over all records one by one (along with some condition
checks that make it slow for huge data). And yes, what you described is very close to what I
wanted. The reason I didn't use the word `checkpoint` is that a checkpoint is not guaranteed to
support rescaling (this can be found in the [flink-doc|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html#difference-to-savepoints]
and in the comment in this PR [5490|https://github.com/apache/flink/pull/5490]), and rescaling
is always the purpose for which we trigger a savepoint. An interesting thing I found is that, in
the implementation, checkpoints also support rescaling; I checked that both in the code and in
practice ... I wonder whether the "archive checkpoint" is guaranteed to support rescaling?

About the implementation, I think maybe this issue's title is incorrect ... I just want to
implement a savepoint that goes through the incremental checkpoint path but treats the `baseSstFile`
as empty (which looks like just submitting the local RocksDB snapshot to DFS).
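
The idea above can be illustrated with a minimal sketch. It is not Flink's actual code: the class name `IncrementalSnapshotSketch`, the method `filesToUpload`, and the file names are hypothetical, and only the notion of a base SST file set comes from the comment. The assumption is that an incremental checkpoint uploads only the SST files missing from the base set, so treating the base as empty means every local file is uploaded and the result is self-contained.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not part of Flink.
class IncrementalSnapshotSketch {

    // Returns the SST files that still have to be uploaded to DFS:
    // everything present locally that is not already in the base set.
    static Set<String> filesToUpload(Set<String> localSstFiles, Set<String> baseSstFiles) {
        Set<String> toUpload = new TreeSet<>(localSstFiles);
        toUpload.removeAll(baseSstFiles);
        return toUpload;
    }

    public static void main(String[] args) {
        Set<String> local = new TreeSet<>(Arrays.asList("000012.sst", "000015.sst", "000019.sst"));
        Set<String> base = new TreeSet<>(Arrays.asList("000012.sst", "000015.sst"));

        // Incremental checkpoint: only the file missing from the base is uploaded.
        System.out.println(filesToUpload(local, base)); // prints [000019.sst]

        // Proposed savepoint: the base is treated as empty, so all local files are uploaded.
        System.out.println(filesToUpload(local, Collections.<String>emptySet()));
        // prints [000012.sst, 000015.sst, 000019.sst]
    }
}
```

With an empty base the snapshot does not reference files from any earlier checkpoint, which is the property one would want from a savepoint, while records are still not iterated one by one.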

> Introduce Incremental savepoint
> -------------------------------
>                 Key: FLINK-8753
>                 URL: https://issues.apache.org/jira/browse/FLINK-8753
>             Project: Flink
>          Issue Type: New Feature
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Sihua Zhou
>            Assignee: Sihua Zhou
>            Priority: Major
> Right now, a savepoint goes through the full checkpoint path, so taking a savepoint can be
slow. In our production, for some long-running jobs it often takes more than 10 minutes to complete
a savepoint, which is unacceptable for a real-time job, so we currently have to fall back to using
the externalized checkpoint instead. But the externalized checkpoint is only taken once per
checkpoint interval, so there is a time gap since the last one. So I propose to introduce the
incremental savepoint, which goes through the incremental checkpoint path.
> Any advice would be appreciated!

This message was sent by Atlassian JIRA
