flink-user mailing list archives

From "sihua zhou" <summerle...@163.com>
Subject Re:Cleaning of state snapshot in state backend(HDFS)
Date Thu, 21 Jun 2018 07:28:18 GMT
Hi Garvit,


> Now, let's say, we clear the state. Would the state data be removed from HDFS too?


If you clear the state in your job, the state data is not removed from HDFS immediately.
However, checkpoints that complete after the clear will no longer contain that state.


> How does Flink manage to clear the state data from state backend on clearing the keyed
state?


1. You can use {{state.checkpoints.num-retained}} to set the number of completed checkpoints
retained on HDFS.
2. If you set {{env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION)}},
the checkpoints on HDFS will be removed once your job is finished (or canceled). If you set
{{env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)}},
the checkpoints will be retained.


Please refer to https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/state/checkpoints.html
for more information.




Additionally, I'd like to give a brief overview of checkpoints on HDFS. In a nutshell, whatever
you do with the state in your running job only affects the content of the local state backend.
When checkpointing, Flink takes a snapshot of the local state backend and sends it to the
checkpoint target directory (in your case, HDFS). The checkpoints on HDFS are like periodic
snapshots of your job's state backend: they can be created or deleted, but never changed.
Maybe Stefan (cc) could give you more expert information, and please correct me if I'm wrong.
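
To make the snapshot idea concrete, here is a small toy model in plain Java. It is only a sketch, not Flink code: {{CheckpointStoreSketch}} and all of its methods are hypothetical names made up for illustration, the in-memory map stands in for the local state backend, and the queue stands in for the checkpoint directory on HDFS. It assumes that only the newest N completed checkpoints are kept, in the spirit of {{state.checkpoints.num-retained}}.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these names are Flink classes or Flink APIs.
class CheckpointStoreSketch {
    // Stands in for the local state backend that the running job reads and writes.
    private final Map<String, Long> localState = new HashMap<>();
    // Stands in for the completed checkpoints in the target directory, oldest first.
    private final Deque<Map<String, Long>> completed = new ArrayDeque<>();
    // Hypothetical analogue of the "number of retained checkpoints" setting.
    private final int numRetained;

    CheckpointStoreSketch(int numRetained) {
        if (numRetained < 1) {
            throw new IllegalArgumentException("numRetained must be >= 1");
        }
        this.numRetained = numRetained;
    }

    // Updating or clearing state touches only the local copy.
    void update(String key, long value) { localState.put(key, value); }
    void clear(String key) { localState.remove(key); }

    // A checkpoint is a read-only copy of the local state at this moment.
    void checkpoint() {
        completed.addLast(Collections.unmodifiableMap(new HashMap<>(localState)));
        // Old checkpoints are deleted as a whole, never edited in place.
        while (completed.size() > numRetained) {
            completed.removeFirst();
        }
    }

    int retainedCount() { return completed.size(); }
    Map<String, Long> oldest() { return completed.peekFirst(); }
    Map<String, Long> latest() { return completed.peekLast(); }

    public static void main(String[] args) {
        CheckpointStoreSketch job = new CheckpointStoreSketch(2);
        job.update("user-1", 42L);
        job.checkpoint();                 // checkpoint 1 contains user-1
        job.clear("user-1");              // clears local state only
        System.out.println(job.latest().containsKey("user-1")); // true
        job.checkpoint();                 // checkpoint 2 no longer contains user-1
        System.out.println(job.latest().containsKey("user-1")); // false
        job.checkpoint();                 // checkpoint 3 pushes checkpoint 1 out
        System.out.println(job.oldest().containsKey("user-1")); // false
    }
}
```

In this model, the cleared key disappears from the retained checkpoints only once the older snapshot that still held it has been dropped, which mirrors why the data is not removed from HDFS immediately.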


Best, Sihua
On 06/21/2018 14:40, Garvit Sharma <garvits45@gmail.com> wrote:
Hi,


Consider a managed keyed state backed by HDFS with checkpointing enabled. Now, as the state
grows the state data will be saved on HDFS.


Now, let's say, we clear the state. Would the state data be removed from HDFS too?


How does Flink manage to clear the state data from state backend on clearing the keyed state?


--


Garvit Sharma
github.com/garvitlnmiit/

No Body is a Scholar by birth, its only hard work and strong determination that makes him
master.