apex-dev mailing list archives

From Pramod Immaneni <pra...@datatorrent.com>
Subject Re: What is the purpose of the checkpoint control tuple
Date Fri, 13 Nov 2015 17:57:28 GMT
If checkpoints happen at a multiple of windows, and end window tuples are
already flowing and triggering end windows on the operators, is any
additional knowledge gained from a checkpoint tuple? I can see one
advantage: STRAM can force an ad hoc checkpoint throughout the system on a
given window if it decides to.

Chetan, can you give me an example of where an operator checkpoint at a
multiple greater than the application checkpoint would be used? I would
think something like operators setting their own checkpoint interval as an
absolute value, unrelated to any other checkpointing mechanism, would be
more useful.
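
As a minimal sketch of the rule described in the CHECKPOINT_WINDOW_COUNT
Javadoc quoted below: once a CHECKPOINT control tuple has been seen, the
operator defers the checkpoint to the first end window whose serial number
is divisible by the attribute value. The class and method names here
(CheckpointSketch, onCheckpointTuple, onEndWindow) are hypothetical and are
not Apex APIs, and the lastCheckpointWindowId guard is only an assumption
about how a double checkpoint on the same window could be avoided. This is
not the actual Apex implementation.

```java
// Illustrative sketch only; hypothetical names, not the Apex implementation.
final class CheckpointSketch {
    // Plays the role of the CHECKPOINT_WINDOW_COUNT operator attribute.
    private final int checkpointWindowCount;
    // Set when a CHECKPOINT control tuple arrives, cleared once we checkpoint.
    private boolean checkpointPending = false;
    // Last window we checkpointed at; used to avoid a double checkpoint.
    private long lastCheckpointWindowId = -1;

    CheckpointSketch(int checkpointWindowCount) {
        if (checkpointWindowCount < 1) {
            throw new IllegalArgumentException("checkpointWindowCount must be >= 1");
        }
        this.checkpointWindowCount = checkpointWindowCount;
    }

    // Called when the CHECKPOINT control tuple reaches this operator.
    void onCheckpointTuple() {
        checkpointPending = true;
    }

    // Called at every end window; returns true if a checkpoint is taken now.
    boolean onEndWindow(long windowSerial) {
        boolean onBoundary = windowSerial % checkpointWindowCount == 0;
        if (!checkpointPending || !onBoundary || windowSerial == lastCheckpointWindowId) {
            return false; // nothing requested, not on a boundary, or already done
        }
        lastCheckpointWindowId = windowSerial;
        checkpointPending = false;
        return true;
    }

    public static void main(String[] args) {
        CheckpointSketch op = new CheckpointSketch(4);
        op.onCheckpointTuple(); // control tuple arrives during window 5
        for (long w = 5; w <= 8; w++) {
            System.out.println("window " + w + " checkpoint=" + op.onEndWindow(w));
        }
    }
}
```

With a count of 4 and the control tuple arriving in window 5, the sketch
skips windows 5 through 7 and checkpoints at window 8, which is the "later
of the two" behavior the Javadoc describes. It never checkpoints earlier
than the control tuple, matching the point that the earliest checkpoint is
decided centrally.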

On Fri, Nov 13, 2015 at 9:03 AM, Amol Kekre <amol@datatorrent.com> wrote:

> There is an additional impact of using checkpoint tuple as opposed to each
> StramChild simply checkpointing at pre-known windows. This is the knowledge
> of checkpoint flow as per Chetan's #1. Stram will know that the checkpoint
> tuple has passed through all upstream operators. In non-blocking
> checkpoints (default) this may not be as critical, but for blocking
> checkpoints it may be important. Plus the logic to
> re-construct/re-partition does become a lot simpler with this knowledge.
>
> Getting my memory back, after Chetan's email :) the trigger thought to move
> to checkpoint tuple was the ease of aligning checkpoints aka get a clear
> application-wide state as Chetan stated. Technically hard coding these
> numbers in each StramChild (per operator) may work, but checkpoint tuple
> made it easy and Stram could then leverage this as knowledge. Another path
> down the memory - I was pushing for heartbeat control tuple(s) wherever we
> can. These are tuples that flow through dataflow and report back some
> content from which application condition/dataflow aspects can be derived.
> These are needed for a non-blocking master to function. A very critical
> part for operability we used in past attempts at distributed data-in-motion
> architectures. Control tuple solved that purpose from checkpointing
> triggers point of view. WindowId control tuples solved that via dataflow
> point of view.
>
> Thks,
> Amol
>
>
> On Thu, Nov 12, 2015 at 9:07 PM, Chetan Narsude (cnarsude) <
> cnarsude@cisco.com> wrote:
>
> > Pramod, the previous design was to checkpoint at random window ids. The
> > issue with that was that repartitioning/recovery could be impossible in
> > certain cases if all the partitions did not checkpoint at the same
> > window. This is the new design with the control tuple although
> > checkpoint_window_count was added later to let the operators delay their
> > checkpoint to a later window than the time when they would normally
> > checkpoint with the control tuple. We did not want them to be able to do
> > the checkpoint earlier than scheduled one as that decision would be
> > centrally controlled via application. Useful where the operator
> > attributes are allowed to be configured independent of the application
> > attributes.
> > It's also documented with the OperatorContext.CHECKPOINT_WINDOW_COUNT
> >
> > /**
> >      * Attribute of the operator that hints at the optimal checkpoint boundary.
> >      * By default checkpointing happens after every predetermined streaming
> >      * windows. Application developer can override this behavior by defining
> >      * the following attribute. When this attribute is defined, checkpointing
> >      * will be done after completion of later of regular checkpointing window
> >      * and the window whose serial number is divisible by the attribute value.
> >      * Typically user would define this value to be the same as that of
> >      * APPLICATION_WINDOW_COUNT so checkpointing will be done at application
> >      * window boundary.
> >      */
> >     Attribute<Integer> CHECKPOINT_WINDOW_COUNT = new Attribute<Integer>(1);
> >
> >
> >
> > Besides this design based on the requirement:
> > 1. Checkpointing tuple staggers the checkpoints amongst multiple stages.
> > It does not trigger checkpoint operation unless upstream operator is done
> > checkpointing. This often results in better resource utilization with
> > different resources in different configurations.
> > 2. Checkpoint tuple helps with resetting the state of the stateful stream
> > codecs.
> >
> > Tim,
> >
> > The reason for double checkpoint appears to be a bug where the
> > lastCheckpointWindowId is not set after checkpoint in the endWindow. The
> > condition in 'CHECKPOINT:' case was added to avoid double checkpoints.
> > Can you confirm?
> >
> > --
> > Chetan
> >
> >
> >
> >
> > On 11/12/15, 6:07 PM, "Amol Kekre" <amol@datatorrent.com> wrote:
> >
> > >I am trying to recollect too. I do remember Chetan, Thomas, and I going
> > >deep on this choice. One issue was the efficiency of current setup. Only
> > >the inputAdapters had to insert control tuple, all other operators were
> > >as is. I will try to recollect other details. Or maybe Chetan or Thomas
> > >can comment.
> > >
> > >Thks,
> > >Amol
> > >
> > >
> > >On Thu, Nov 12, 2015 at 5:53 PM, Pramod Immaneni <pramod@datatorrent.com>
> > >wrote:
> > >
> > >> From what I am seeing so far (when implementing APEX-246) it is a
> > >> leftover from an earlier implementation but I am not completely sure
> > >> yet.
> > >>
> > >> On Thu, Nov 12, 2015 at 5:43 PM, Timothy Farkas <tim@datatorrent.com>
> > >> wrote:
> > >>
> > >> > After stumbling on https://malhar.atlassian.net/browse/APEX-263 I am
> > >> > wondering what the purpose of the CHECKPOINT control tuple is? Why is
> > >> > it not sufficient to have each operator checkpoint after its
> > >> > checkpoint window has passed?
> > >> >
> > >> > Thanks,
> > >> > Tim
> > >> >
> > >>
> >
> >
>
