flink-dev mailing list archives

From "Xiaogang Shi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-5086) Clean dead snapshot files produced by the tasks failing to acknowledge checkpoints
Date Thu, 17 Nov 2016 10:01:58 GMT
Xiaogang Shi created FLINK-5086:
-----------------------------------

             Summary: Clean dead snapshot files produced by the tasks failing to acknowledge checkpoints
                 Key: FLINK-5086
                 URL: https://issues.apache.org/jira/browse/FLINK-5086
             Project: Flink
          Issue Type: Bug
          Components: State Backends, Checkpointing
            Reporter: Xiaogang Shi


A task may fail while performing a checkpoint. By that point, the task may already have copied
some data to external storage. Because the failed task never sends the state handle to {{CheckpointCoordinator}},
the coordinator does not know about the copied data and will never delete it.

I think we need a way to clean up such dead snapshot data; otherwise, usage of external storage
can grow without bound.

One possible method is to clean these dead files when the task recovers. When a task recovers,
{{CheckpointCoordinator}} tells the task which checkpoints are retained. The task can then
scan the external storage and delete every snapshot that does not belong to a retained checkpoint.
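
The following is a minimal sketch of that recovery-time scan, not Flink code. It assumes a
hypothetical layout in which each checkpoint of a task is written to its own directory named
"chk-<id>" under a per-task snapshot root, and that the retained checkpoint IDs are handed to
the task as a set. The class name {{DeadSnapshotCleaner}} and the method {{cleanDeadSnapshots}}
are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

/** Illustrative only: deletes snapshot directories that belong to no retained checkpoint. */
public class DeadSnapshotCleaner {

    // Assumption: one directory per checkpoint, named "chk-<checkpointId>".
    private static final String PREFIX = "chk-";

    /**
     * Scans snapshotRoot and deletes every "chk-<id>" directory whose id is not retained.
     * Returns the ids of the deleted snapshots in ascending order.
     */
    public static List<Long> cleanDeadSnapshots(Path snapshotRoot, Set<Long> retainedIds)
            throws IOException {
        // Collect candidates first so that nothing is deleted while the directory is listed.
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(snapshotRoot, PREFIX + "*")) {
            for (Path entry : entries) {
                candidates.add(entry);
            }
        }

        List<Long> deleted = new ArrayList<>();
        for (Path candidate : candidates) {
            Long id = parseCheckpointId(candidate.getFileName().toString());
            // Skip entries with an unparsable name and snapshots that are still retained.
            if (id == null || retainedIds.contains(id)) {
                continue;
            }
            deleteRecursively(candidate);
            deleted.add(id);
        }
        Collections.sort(deleted);
        return deleted;
    }

    // Returns the numeric id encoded in "chk-<id>", or null if the suffix is not a number.
    private static Long parseCheckpointId(String name) {
        try {
            return Long.parseLong(name.substring(PREFIX.length()));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Deletes a directory tree; reverse path order removes children before their parents.
    private static void deleteRecursively(Path path) throws IOException {
        try (Stream<Path> walk = Files.walk(path)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

A recovering task would call {{DeadSnapshotCleaner.cleanDeadSnapshots(root, retainedIds)}} once,
before it starts taking new checkpoints. The sketch assumes that no checkpoint is being written
under the same root while the scan runs; a real implementation would also have to handle that case.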





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
