flink-user mailing list archives

From Garvit Sharma <garvit...@gmail.com>
Subject Re: Cleaning of state snapshot in state backend(HDFS)
Date Thu, 21 Jun 2018 07:41:57 GMT
So, would it delete all the files in HDFS associated with the cleared state?

On Thu, Jun 21, 2018 at 12:58 PM sihua zhou <summerleafs@163.com> wrote:

> Hi Garvit,
>
> > Now, let's say, we clear the state. Would the state data be removed from
> HDFS too?
>
> The state data would not be removed from HDFS immediately when you clear
> the state in your job. But after you clear the state in your job, later
> completed checkpoints won't contain that state any more.
>
> > How does Flink manage to clear the state data from state backend on
> clearing the keyed state?
>
> 1. You can use {{state.checkpoints.num-retained}} to set the number of
> completed checkpoints retained on HDFS.
> 2. If you set {{
> env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION)}}
> then the checkpoints on HDFS will be removed once your job is finished
> (or canceled). And if you set {{
> env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)}}
> then the checkpoints will be retained.
>
> Please refer to
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/state/checkpoints.html
> to find more information.
>
>
> Additionally, I'd like to give a brief overview of checkpoints on HDFS. In
> a nutshell, whatever you do with the state in your running job only
> affects the content of the local state backend. When checkpointing, Flink
> takes a snapshot of the local state backend and sends it to the checkpoint
> target directory (in your case, HDFS). The checkpoints on HDFS are like
> periodic snapshots of your job's state backend: they can be created or
> deleted but never changed. Maybe Stefan (cc) could give you more expert
> information, and please correct me if I'm wrong.
>
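
To make the behaviour described above concrete, here is a minimal toy model in plain Java. It is not Flink code and does not use any Flink API: the class `CheckpointModel` and all of its methods are hypothetical names invented for illustration, and the `numRetained` constructor argument only mimics the role of `state.checkpoints.num-retained`. It assumes the semantics described in the quoted reply: clearing state touches only the local copy, each checkpoint is an immutable snapshot, and old snapshots disappear only when they fall out of the retention window.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy model only: illustrates the snapshot semantics, NOT Flink's implementation. */
class CheckpointModel {
    // Working state of the running job (stands in for the local state backend).
    private final Map<String, Long> localState = new HashMap<>();
    // Completed checkpoints, oldest first (stands in for the files on HDFS).
    private final Deque<Map<String, Long>> retained = new ArrayDeque<>();
    // Hypothetical counterpart of state.checkpoints.num-retained.
    private final int numRetained;

    CheckpointModel(int numRetained) {
        if (numRetained < 1) {
            throw new IllegalArgumentException("numRetained must be >= 1");
        }
        this.numRetained = numRetained;
    }

    void update(String key, long value) {
        localState.put(key, value);
    }

    // Clearing state changes only the local copy; existing snapshots are untouched.
    void clear(String key) {
        localState.remove(key);
    }

    // Take an immutable copy of the local state, then drop snapshots beyond the limit.
    void checkpoint() {
        retained.addLast(Collections.unmodifiableMap(new HashMap<>(localState)));
        while (retained.size() > numRetained) {
            retained.removeFirst();
        }
    }

    boolean anyRetainedCheckpointContains(String key) {
        for (Map<String, Long> snapshot : retained) {
            if (snapshot.containsKey(key)) {
                return true;
            }
        }
        return false;
    }

    int retainedCount() {
        return retained.size();
    }

    public static void main(String[] args) {
        CheckpointModel job = new CheckpointModel(1);
        job.update("user-42", 7L);
        job.checkpoint();
        job.clear("user-42");
        // The older snapshot still holds the cleared key.
        System.out.println(job.anyRetainedCheckpointContains("user-42")); // true
        job.checkpoint();
        // The newer snapshot replaced the old one, so the key is gone.
        System.out.println(job.anyRetainedCheckpointContains("user-42")); // false
    }
}
```

In this model the cleared key stays in storage until a later checkpoint completes and the older snapshot is dropped, which matches the "not removed immediately" answer above. With a larger `numRetained`, the key would survive correspondingly more checkpoints.
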
> Best, Sihua
> On 06/21/2018 14:40, Garvit Sharma <garvits45@gmail.com> wrote:
>
> Hi,
>
> Consider a managed keyed state backed by HDFS with checkpointing enabled.
> Now, as the state grows the state data will be saved on HDFS.
>
> Now, let's say, we clear the state. Would the state data be removed from
> HDFS too?
>
> How does Flink manage to clear the state data from state backend on
> clearing the keyed state?
>
> --
>
> Garvit Sharma
> github.com/garvitlnmiit/
>
> No Body is a Scholar by birth, its only hard work and strong determination
> that makes him master.
>
>

-- 

Garvit Sharma
github.com/garvitlnmiit/

No Body is a Scholar by birth, its only hard work and strong determination
that makes him master.
