apex-dev mailing list archives

From Amol Kekre <a...@datatorrent.com>
Subject Re: What is the purpose of the checkpoint control tuple
Date Fri, 13 Nov 2015 17:03:30 GMT
There is an additional impact of using a checkpoint tuple, as opposed to each
StramChild simply checkpointing at pre-known windows: knowledge of the
checkpoint flow, as per Chetan's #1. Stram will know that the checkpoint
tuple has passed through all upstream operators. In non-blocking
checkpoints (the default) this may not be as critical, but for blocking
checkpoints it may be important. Plus, the logic to
re-construct/re-partition becomes a lot simpler with this knowledge.
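
To make the "checkpoint flow" idea concrete, here is a toy sketch of a checkpoint tuple travelling through a linear pipeline. It is not Apex code: `ToyOperator`, `propagate`, and the operator names are hypothetical, and the real engine works with streams and containers rather than a loop. It only illustrates that each stage checkpoints after its upstream stage, and that every stage ends up checkpointing at the same window id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not Apex code): a checkpoint control tuple flowing through a linear pipeline. */
public class CheckpointTupleFlow {

    /** Hypothetical stand-in for an operator; it records the windows it checkpointed at. */
    public static final class ToyOperator {
        public final String name;
        public final List<Long> checkpointedWindows = new ArrayList<>();

        public ToyOperator(String name) {
            this.name = name;
        }

        /** The operator checkpoints only when the control tuple reaches it. */
        public void onCheckpointTuple(long windowId) {
            checkpointedWindows.add(windowId);
        }
    }

    /**
     * Passes one checkpoint tuple through the operators in stream order
     * (upstream first) and returns the names in the order they checkpointed.
     */
    public static List<String> propagate(List<ToyOperator> pipeline, long windowId) {
        List<String> order = new ArrayList<>();
        for (ToyOperator op : pipeline) {
            op.onCheckpointTuple(windowId);
            order.add(op.name);
        }
        return order;
    }

    public static void main(String[] args) {
        List<ToyOperator> pipeline = Arrays.asList(
            new ToyOperator("input"), new ToyOperator("parse"), new ToyOperator("output"));

        // Only the input end inserts the tuple; the rest react as it arrives.
        System.out.println(propagate(pipeline, 42L)); // prints [input, parse, output]

        // Every stage checkpointed at the same window, giving an aligned state.
        for (ToyOperator op : pipeline) {
            System.out.println(op.name + " -> " + op.checkpointedWindows);
        }
    }
}
```

Once the tuple has been seen at the last stage, a master observing it could conclude that all upstream stages have already checkpointed at that window, which is the knowledge referred to above.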

Getting my memory back after Chetan's email :) The thought that triggered the
move to a checkpoint tuple was the ease of aligning checkpoints, i.e. getting
a clear application-wide state, as Chetan stated. Technically, hard-coding
these numbers in each StramChild (per operator) may work, but the checkpoint
tuple made it easy, and Stram could then leverage it as knowledge. Another
path down memory lane: I was pushing for heartbeat control tuple(s) wherever
we can. These are tuples that flow through the dataflow and report back some
content from which application condition/dataflow aspects can be derived.
These are needed for a non-blocking master to function. This was a very
critical part of operability that we used in past attempts at distributed
data-in-motion architectures. The checkpoint control tuple served that purpose
from the checkpoint-trigger point of view; WindowId control tuples served it
from the dataflow point of view.

Thks,
Amol


On Thu, Nov 12, 2015 at 9:07 PM, Chetan Narsude (cnarsude) <
cnarsude@cisco.com> wrote:

> Pramod, the previous design was to checkpoint at random window ids. The
> issue with that was that repartitioning/recovery could be impossible in
> certain cases if all the partitions did not checkpoint at the same window.
> This is the new design with the control tuple although
> checkpoint_window_count was added later to let the operators delay their
> checkpoint to a later window than the time when they would normally
> checkpoint with the control tuple. We did not want them to be able to do
> the checkpoint earlier than scheduled one as that decision would be
> centrally controlled via application. Useful where the operator attributes
> are allowed to be configured independent of the application attributes.
> It's also documented with the OperatorContext.CHECKPOINT_WINDOW_COUNT
>
>     /**
>      * Attribute of the operator that hints at the optimal checkpoint boundary.
>      * By default checkpointing happens after every predetermined streaming windows.
>      * Application developer can override this behavior by defining the following
>      * attribute. When this attribute is defined, checkpointing will be done after
>      * completion of later of regular checkpointing window and the window whose
>      * serial number is divisible by the attribute value. Typically user would
>      * define this value to be the same as that of APPLICATION_WINDOW_COUNT so
>      * checkpointing will be done at application window boundary.
>      */
>     Attribute<Integer> CHECKPOINT_WINDOW_COUNT = new Attribute<Integer>(1);
>
>
>
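
The rule in the javadoc quoted above can be sketched as a small function. This is one reading of the comment, not the engine's code: `effectiveCheckpointWindow` and its parameters are hypothetical names, and "serial number" is assumed to be a simple non-negative counter of streaming windows.

```java
/** Sketch of one reading of the CHECKPOINT_WINDOW_COUNT javadoc; not Apex code. */
public class CheckpointWindowRule {

    /**
     * Returns the first window at or after the regular checkpoint window whose
     * serial number is divisible by the attribute value. The checkpoint can be
     * delayed to a later window but never moved earlier.
     *
     * @param regularWindow        serial number of the window where the regular checkpoint is due (assumed counter)
     * @param checkpointWindowCount value of CHECKPOINT_WINDOW_COUNT, at least 1
     */
    public static long effectiveCheckpointWindow(long regularWindow, int checkpointWindowCount) {
        if (regularWindow < 0 || checkpointWindowCount < 1) {
            throw new IllegalArgumentException("window must be >= 0 and count must be >= 1");
        }
        long remainder = regularWindow % checkpointWindowCount;
        return remainder == 0
            ? regularWindow
            : regularWindow + (checkpointWindowCount - remainder);
    }

    public static void main(String[] args) {
        System.out.println(effectiveCheckpointWindow(60, 1)); // prints 60: default value, no delay
        System.out.println(effectiveCheckpointWindow(60, 8)); // prints 64: delayed to the next multiple of 8
        System.out.println(effectiveCheckpointWindow(64, 8)); // prints 64: already on a boundary
    }
}
```

With the default value of 1 every window qualifies, so the operator checkpoints at the regular window; a larger value only ever pushes the checkpoint later.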
> Besides this design based on the requirement:
> 1. Checkpointing tuple staggers the checkpoints amongst multiple stages.
> It does not trigger checkpoint operation unless upstream operator is done
> checkpointing. This often results in better resource utilization with
> different resources in different configurations.
> 2. Checkpoint tuple helps with resetting the state of the stateful stream
> codecs.
>
> Tim,
>
> The reason for double checkpoint appears to be a bug where the
> lastCheckpointWindowId is not set after checkpoint in the endWindow. The
> condition in 'CHECKPOINT:' case was added to avoid double checkpoints. Can
> you confirm?
>
> --
> Chetan
>
>
>
>
> On 11/12/15, 6:07 PM, "Amol Kekre" <amol@datatorrent.com> wrote:
>
> >I am trying to recollect too. I do remember Chetan, Thomas, and I going
> >deep on this choice. One issue was the efficiency of current setup. Only
> >the inputAdapters had to insert control tuple, all other operators were as
> >is. I will try to recollect other details, or maybe Chetan or Thomas can
> >comment.
> >
> >Thks,
> >Amol
> >
> >
> >On Thu, Nov 12, 2015 at 5:53 PM, Pramod Immaneni <pramod@datatorrent.com>
> >wrote:
> >
> >> From what I am seeing so far (when implementing APEX-246) it is a leftover
> >> from an earlier implementation but I am not completely sure yet.
> >>
> >> On Thu, Nov 12, 2015 at 5:43 PM, Timothy Farkas <tim@datatorrent.com>
> >> wrote:
> >>
> >> > After stumbling on https://malhar.atlassian.net/browse/APEX-263 I am
> >> > wondering what the purpose of the CHECKPOINT control tuple is? Why is it
> >> > not sufficient to have each operator checkpoint after its checkpoint
> >> > window has passed?
> >> >
> >> > Thanks,
> >> > Tim
> >> >
> >>
>
>
