flink-user mailing list archives

From Garvit Sharma <garvit...@gmail.com>
Subject Re: Cleaning of state snapshot in state backend(HDFS)
Date Thu, 21 Jun 2018 08:09:09 GMT
Thank you for the clarification.

On Thu, Jun 21, 2018 at 1:36 PM sihua zhou <summerleafs@163.com> wrote:

> Yes, you can clear the state for a key (the currently active key). If you
> clear it, it is also removed from the state backend, and future
> checkpoints won't contain the key anymore unless you add it again.
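The following is a toy model in plain Java that shows the "currently active key" semantics. It is a sketch under stated assumptions, not Flink's API. `ToyKeyedValueState` and `ClearDemo` are hypothetical names, and a `HashMap` stands in for the local state backend. In a real Flink job you would call `clear()` on a `ValueState` inside a keyed function, and the runtime sets the current key for each record.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, NOT Flink's API: a keyed value state where every
// read/write/clear applies only to the "currently active key".
class ToyKeyedValueState<K, V> {
    // Stands in for the local state backend (one entry per key).
    private final Map<K, V> backend = new HashMap<>();
    private K currentKey;

    // In Flink, the runtime sets the current key before each record is processed.
    void setCurrentKey(K key) {
        this.currentKey = key;
    }

    V value() {
        return backend.get(currentKey);
    }

    void update(V value) {
        backend.put(currentKey, value);
    }

    // Removes only the current key's entry; other keys are untouched.
    void clear() {
        backend.remove(currentKey);
    }

    boolean backendContains(K key) {
        return backend.containsKey(key);
    }
}

class ClearDemo {
    public static void main(String[] args) {
        ToyKeyedValueState<String, Long> state = new ToyKeyedValueState<>();
        state.setCurrentKey("a");
        state.update(1L);
        state.setCurrentKey("b");
        state.update(2L);

        state.setCurrentKey("a");
        state.clear(); // clears key "a" only

        System.out.println(state.backendContains("a")); // false
        System.out.println(state.backendContains("b")); // true
    }
}
```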
>
> Best, Sihua
>
>
> On 06/21/2018 16:04, Garvit Sharma <garvits45@gmail.com> wrote:
>
> Now, after clearing the state for a key, I don't want that redundant data to
> remain in the state backend. This is my concern.
>
> Please let me know if there are any gaps.
>
> Thanks,
>
> On Thu, Jun 21, 2018 at 1:31 PM Garvit Sharma <garvits45@gmail.com> wrote:
>
>> I am maintaining state data for a key in a ValueState. As per [0], I can
>> call clear() on the state for that key.
>>
>> [0]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/state.html
>>
>> Please let me know.
>>
>> Thanks,
>>
>>
>> On Thu, Jun 21, 2018 at 1:19 PM sihua zhou <summerleafs@163.com> wrote:
>>
>>> Hi Garvit,
>>>
>>> Let's say you clear the state at timestamp t1. Checkpoints completed
>>> before t1 will still contain the data you cleared, but future checkpoints
>>> won't contain it. However, I'm not sure what you mean by clearing the
>>> state: currently you can only clear the key-value pair of a single key,
>>> not the whole state at once.
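The t1 timeline can be shown with a second toy model. The names `ToyCheckpointing` and `TimelineDemo` are hypothetical and are not Flink internals. The model assumes, as described in this thread, that each completed checkpoint is an immutable copy of the local state taken at that moment. Clearing a key afterwards therefore affects only later copies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, NOT Flink internals: the job mutates a local map,
// and each checkpoint stores an immutable copy of it.
class ToyCheckpointing {
    private final Map<String, String> localState = new HashMap<>();
    private final List<Map<String, String>> completedCheckpoints = new ArrayList<>();

    void put(String key, String value) {
        localState.put(key, value);
    }

    // Like clearing one key's state: only the local state changes.
    void clear(String key) {
        localState.remove(key);
    }

    // Takes an unmodifiable snapshot copy; returns its index as a checkpoint ID.
    int checkpoint() {
        completedCheckpoints.add(Collections.unmodifiableMap(new HashMap<>(localState)));
        return completedCheckpoints.size() - 1;
    }

    Map<String, String> checkpointContents(int id) {
        return completedCheckpoints.get(id);
    }
}

class TimelineDemo {
    public static void main(String[] args) {
        ToyCheckpointing job = new ToyCheckpointing();
        job.put("user-42", "session-data");
        int before = job.checkpoint(); // completed before t1
        job.clear("user-42");          // t1: the key is cleared
        int after = job.checkpoint();  // completed after t1

        System.out.println(job.checkpointContents(before).containsKey("user-42")); // true
        System.out.println(job.checkpointContents(after).containsKey("user-42"));  // false
    }
}
```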
>>>
>>> Best, Sihua
>>>
>>> On 06/21/2018 15:41, Garvit Sharma <garvits45@gmail.com> wrote:
>>>
>>> So, would it delete all the files in HDFS associated with the cleared
>>> state?
>>>
>>> On Thu, Jun 21, 2018 at 12:58 PM sihua zhou <summerleafs@163.com> wrote:
>>>
>>>> Hi Garvit,
>>>>
>>>> > Now, let's say, we clear the state. Would the state data be removed
>>>> from HDFS too?
>>>>
>>>> If you clear the state in your job, the state data will not be removed
>>>> from HDFS immediately. But after you clear it, checkpoints completed
>>>> later won't contain that state anymore.
>>>>
>>>> > How does Flink manage to clear the state data from state backend on
>>>> clearing the keyed state?
>>>>
>>>> 1. You can use {{state.checkpoints.num-retained}} to set the number of
>>>> completed checkpoints retained on HDFS.
>>>> 2. If you set
>>>> {{env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION)}},
>>>> the checkpoints on HDFS will be removed once your job finishes (or is
>>>> cancelled). If you set
>>>> {{env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)}},
>>>> the checkpoints will be retained.
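The two options above can be illustrated with a small retention model. This is a hypothetical sketch, not Flink's implementation. `ToyCheckpointStore` and `ToyCleanupMode` are invented names, and the two enum constants only mirror `ExternalizedCheckpointCleanup`. The `numRetained` parameter plays the role of `state.checkpoints.num-retained`. In this model, an old checkpoint is deleted only when newer completed checkpoints push it out of the retained window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical names; the constants only mirror Flink's ExternalizedCheckpointCleanup.
enum ToyCleanupMode { DELETE_ON_CANCELLATION, RETAIN_ON_CANCELLATION }

// Toy model, NOT Flink's implementation: keeps the latest `numRetained`
// completed checkpoint IDs and applies a cleanup mode on cancellation.
class ToyCheckpointStore {
    private final int numRetained;
    private final ToyCleanupMode mode;
    private final Deque<Long> retained = new ArrayDeque<>(); // oldest first

    ToyCheckpointStore(int numRetained, ToyCleanupMode mode) {
        if (numRetained < 1) {
            throw new IllegalArgumentException("numRetained must be >= 1");
        }
        this.numRetained = numRetained;
        this.mode = mode;
    }

    void onCheckpointCompleted(long checkpointId) {
        retained.addLast(checkpointId);
        while (retained.size() > numRetained) {
            // The oldest checkpoint is dropped; its files would be deleted here.
            retained.removeFirst();
        }
    }

    void onJobCancelled() {
        if (mode == ToyCleanupMode.DELETE_ON_CANCELLATION) {
            retained.clear(); // all retained checkpoints would be deleted
        }
        // RETAIN_ON_CANCELLATION: checkpoints stay, e.g. to resume from later.
    }

    Deque<Long> retainedIds() {
        return new ArrayDeque<>(retained);
    }
}

class RetentionDemo {
    public static void main(String[] args) {
        ToyCheckpointStore store =
                new ToyCheckpointStore(2, ToyCleanupMode.RETAIN_ON_CANCELLATION);
        for (long id = 1; id <= 4; id++) {
            store.onCheckpointCompleted(id);
        }
        System.out.println(store.retainedIds()); // [3, 4]
        store.onJobCancelled();
        System.out.println(store.retainedIds()); // [3, 4]
    }
}
```

A `Deque` is used because, in this model, checkpoints are always dropped oldest-first.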
>>>>
>>>> Please refer to
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/state/checkpoints.html
>>>> to find more information.
>>>>
>>>>
>>>> Additionally, I'd like to give some brief background on checkpoints on
>>>> HDFS. In a nutshell, whatever you do with the state in your running job
>>>> only affects the content of the local state backend. When checkpointing,
>>>> Flink takes a snapshot of the local state backend and sends it to the
>>>> checkpoint target directory (in your case, HDFS). The checkpoints on HDFS
>>>> are like periodic snapshots of your job's state backend: they can be
>>>> created or deleted, but never changed. Maybe Stefan (cc) can give you
>>>> more expert information, and please correct me if I'm wrong.
>>>>
>>>> Best, Sihua
>>>> On 06/21/2018 14:40, Garvit Sharma <garvits45@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Consider a managed keyed state backed by HDFS with checkpointing
>>>> enabled. Now, as the state grows, the state data will be saved on HDFS.
>>>>
>>>> Now, let's say, we clear the state. Would the state data be removed
>>>> from HDFS too?
>>>>
>>>> How does Flink manage to clear the state data from state backend on
>>>> clearing the keyed state?
>>>>
>>>> --
>>>>
>>>> Garvit Sharma
>>>> github.com/garvitlnmiit/
>>>>
>>>> No Body is a Scholar by birth, its only hard work and strong
>>>> determination that makes him master.
>>>>
>>>>
>>>
>>
>
>
