apex-dev mailing list archives

From Amol Kekre <a...@datatorrent.com>
Subject Re: What is the purpose of the checkpoint control tuple
Date Sat, 14 Nov 2015 00:45:46 GMT
Pramod,
Doing an ad-hoc checkpoint may be a possibility.

Amol


On Fri, Nov 13, 2015 at 9:57 AM, Pramod Immaneni <pramod@datatorrent.com> wrote:

> If the checkpoint is a multiple of windows, and end window tuples are already
> flowing and triggering end windows on the operators, is there additional
> knowledge being gained by a checkpoint tuple? I can see one advantage: you
> can force a checkpoint throughout the system ad hoc on a window if STRAM
> decides to.
>
> Chetan, can you give me an example of where an operator checkpointing at a
> multiple greater than the application checkpoint would be used? I would
> think something like an operator wanting to set its own checkpoint interval
> as an absolute value, unrelated to any other checkpointing mechanism, would
> be more useful.
>
> On Fri, Nov 13, 2015 at 9:03 AM, Amol Kekre <amol@datatorrent.com> wrote:
>
> > There is an additional benefit of using a checkpoint tuple as opposed to
> > each StramChild simply checkpointing at pre-known windows: knowledge of
> > the checkpoint flow, as per Chetan's #1. Stram will know that the
> > checkpoint tuple has passed through all upstream operators. With
> > non-blocking checkpoints (the default) this may not be as critical, but
> > with blocking checkpoints it may be important. Plus, the logic to
> > reconstruct/repartition becomes a lot simpler with this knowledge.
> >
> > Getting my memory back after Chetan's email :) the thought that triggered
> > the move to a checkpoint tuple was the ease of aligning checkpoints, i.e.
> > getting a clear application-wide state, as Chetan stated. Technically,
> > hard-coding these numbers in each StramChild (per operator) may work, but
> > the checkpoint tuple made it easy, and Stram could then leverage it as
> > knowledge. Another trip down memory lane: I was pushing for heartbeat
> > control tuple(s) wherever we could. These are tuples that flow through the
> > dataflow and report back content from which application condition and
> > dataflow aspects can be derived. They are needed for a non-blocking master
> > to function, and they were a very critical part of operability in our past
> > attempts at distributed data-in-motion architectures. The checkpoint
> > control tuple served that purpose from the checkpoint-trigger point of
> > view; windowId control tuples served it from the dataflow point of view.
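To illustrate why aligned checkpoints give a clear application-wide state, here is a minimal Java sketch that intersects the checkpoint windows recorded by each partition and returns the latest window they all share, i.e. a consistent point to recover or repartition from. The class, method, and window numbers are hypothetical and not part of Apex; the sketch only shows that aligned checkpoints always leave a common window, while independently chosen ones may leave none.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper, not an Apex API: latest window id checkpointed by every partition. */
final class CommonCheckpoint {
  private CommonCheckpoint() {}

  static OptionalLong latestCommon(List<Set<Long>> checkpointsPerPartition) {
    if (checkpointsPerPartition.isEmpty()) {
      return OptionalLong.empty();
    }
    // Intersect the checkpoint windows of all partitions.
    TreeSet<Long> common = new TreeSet<>(checkpointsPerPartition.get(0));
    for (Set<Long> partition : checkpointsPerPartition.subList(1, checkpointsPerPartition.size())) {
      common.retainAll(partition);
    }
    // The largest shared window is the most recent consistent recovery point.
    return common.isEmpty() ? OptionalLong.empty() : OptionalLong.of(common.last());
  }

  public static void main(String[] args) {
    // Aligned: every partition checkpoints at the same windows.
    List<Set<Long>> aligned = List.of(Set.of(60L, 120L, 180L), Set.of(60L, 120L, 180L));
    // Unaligned: each partition picked its own windows.
    List<Set<Long>> unaligned = List.of(Set.of(57L, 133L), Set.of(61L, 140L));
    System.out.println(latestCommon(aligned));   // OptionalLong[180]
    System.out.println(latestCommon(unaligned)); // OptionalLong.empty
  }
}
```

With unaligned checkpoints the intersection is empty, so no partition-wide state exists to restore from, which matches the recovery problem described for the earlier design.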
> >
> > Thks,
> > Amol
> >
> >
> > On Thu, Nov 12, 2015 at 9:07 PM, Chetan Narsude (cnarsude) <cnarsude@cisco.com> wrote:
> >
> > > Pramod, the previous design was to checkpoint at random window ids. The
> > > issue with that was that repartitioning/recovery could be impossible in
> > > certain cases if all the partitions did not checkpoint at the same
> > > window. The control tuple is the new design, although
> > > CHECKPOINT_WINDOW_COUNT was added later to let operators delay their
> > > checkpoint to a window later than the one at which they would normally
> > > checkpoint with the control tuple. We did not want them to be able to
> > > checkpoint earlier than scheduled, as that decision is centrally
> > > controlled via the application. This is useful where operator attributes
> > > can be configured independently of application attributes. It's also
> > > documented with OperatorContext.CHECKPOINT_WINDOW_COUNT:
> > >
> > >     /**
> > >      * Attribute of the operator that hints at the optimal checkpoint boundary.
> > >      * By default checkpointing happens after every predetermined streaming windows. Application developer can override
> > >      * this behavior by defining the following attribute. When this attribute is defined, checkpointing will be done after
> > >      * completion of later of regular checkpointing window and the window whose serial number is divisible by the attribute
> > >      * value. Typically user would define this value to be the same as that of APPLICATION_WINDOW_COUNT so checkpointing
> > >      * will be done at application window boundary.
> > >      */
> > >     Attribute<Integer> CHECKPOINT_WINDOW_COUNT = new Attribute<Integer>(1);
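A minimal sketch of the rule in the javadoc above, assuming "later of" means: take the window serial number at which the regular checkpoint is due and, if it is not a multiple of CHECKPOINT_WINDOW_COUNT, defer to the next multiple. Only the attribute name comes from the message; the class name and the window numbers (e.g. a regular checkpoint at window 60) are hypothetical. It also shows why the attribute can delay a checkpoint but never move it earlier.

```java
/**
 * Hypothetical sketch, not Apex code: checkpoint after the later of the regular
 * checkpoint window and the first window whose serial number is divisible by
 * CHECKPOINT_WINDOW_COUNT. Assumes non-negative window serial numbers.
 */
final class CheckpointWindowRule {
  private CheckpointWindowRule() {}

  static long effectiveCheckpointWindow(long regularCheckpointWindow, int checkpointWindowCount) {
    if (checkpointWindowCount <= 0) {
      throw new IllegalArgumentException("checkpointWindowCount must be positive");
    }
    long remainder = regularCheckpointWindow % checkpointWindowCount;
    // Already on a boundary: checkpoint as scheduled; otherwise defer to the next multiple.
    return remainder == 0
        ? regularCheckpointWindow
        : regularCheckpointWindow + (checkpointWindowCount - remainder);
  }

  public static void main(String[] args) {
    System.out.println(effectiveCheckpointWindow(60, 1));  // 60: a count of 1 never delays
    System.out.println(effectiveCheckpointWindow(60, 25)); // 75: deferred to the next multiple of 25
    System.out.println(effectiveCheckpointWindow(60, 30)); // 60: already a multiple of 30
  }
}
```

Setting the count equal to APPLICATION_WINDOW_COUNT, as the javadoc suggests, makes every checkpoint land on an application window boundary.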
> > >
> > >
> > >
> > > Besides this, the design was based on these requirements:
> > > 1. The checkpoint tuple staggers the checkpoints amongst multiple stages.
> > > It does not trigger the checkpoint operation unless the upstream operator
> > > is done checkpointing. This often results in better resource utilization
> > > with different resources in different configurations.
> > > 2. The checkpoint tuple helps with resetting the state of the stateful
> > > stream codecs.
> > >
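Point 1 can be modeled with a simplified, single-threaded Java sketch: only the input stage injects the CHECKPOINT control tuple, and each stage checkpoints its own state before forwarding the tuple, so a downstream stage never starts its checkpoint before its upstream stage has finished. Stage, StaggeredCheckpointDemo, and the stage names are hypothetical; real Apex operators run in separate threads or containers, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical linear pipeline; names are illustrative, not Apex classes. */
final class StaggeredCheckpointDemo {
  private StaggeredCheckpointDemo() {}

  /** A stage checkpoints when the CHECKPOINT control tuple reaches it, then forwards it. */
  static final class Stage {
    final String name;
    Stage downstream;

    Stage(String name) {
      this.name = name;
    }

    void onCheckpointTuple(long windowId, List<String> log) {
      log.add(name + " checkpointed window " + windowId); // checkpoint own state first
      if (downstream != null) {
        downstream.onCheckpointTuple(windowId, log);       // only then pass the tuple on
      }
    }
  }

  /** Builds a chain of stages and injects one CHECKPOINT tuple at the first (input) stage. */
  static List<String> run(long windowId, String... names) {
    List<Stage> stages = new ArrayList<>();
    for (String n : names) {
      stages.add(new Stage(n));
    }
    for (int i = 0; i + 1 < stages.size(); i++) {
      stages.get(i).downstream = stages.get(i + 1);
    }
    List<String> log = new ArrayList<>();
    stages.get(0).onCheckpointTuple(windowId, log);
    return log;
  }

  public static void main(String[] args) {
    // Prints the stages in upstream-to-downstream order.
    run(120, "input", "parser", "writer").forEach(System.out::println);
  }
}
```

The ordering in the log is the staggering effect: each stage's checkpoint begins only after its upstream neighbor's has completed.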
> > > Tim,
> > >
> > > The reason for the double checkpoint appears to be a bug where
> > > lastCheckpointWindowId is not set after the checkpoint in endWindow. The
> > > condition in the 'CHECKPOINT:' case was added to avoid double
> > > checkpoints. Can you confirm?
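The suspected bug can be reproduced in a small hypothetical model. Only lastCheckpointWindowId, endWindow, and the CHECKPOINT case come from the message; the class, the checkpointDue flag, and the assumption that endWindow runs before the CHECKPOINT tuple of the same window are illustrative, not the actual Apex node code. Without the assignment in endWindow, the guard in the control-tuple handler does not see the earlier checkpoint and lets a second one through.

```java
/**
 * Hypothetical model, not Apex code: endWindow() checkpoints but may fail to record
 * lastCheckpointWindowId, so the CHECKPOINT handler's guard lets the same window
 * be checkpointed twice.
 */
final class CheckpointGuardDemo {
  private final boolean recordInEndWindow; // true models the fix, false models the bug
  private long lastCheckpointWindowId = -1;
  int checkpointCount = 0;

  CheckpointGuardDemo(boolean recordInEndWindow) {
    this.recordInEndWindow = recordInEndWindow;
  }

  private void checkpoint(long windowId) {
    checkpointCount++; // stands in for serializing operator state for windowId
  }

  /** End-of-window processing when this window is due for a checkpoint. */
  void endWindow(long windowId, boolean checkpointDue) {
    if (checkpointDue) {
      checkpoint(windowId);
      if (recordInEndWindow) {
        lastCheckpointWindowId = windowId; // the assignment the message says is missing
      }
    }
  }

  /** Handler for the CHECKPOINT control tuple carrying the same window id. */
  void onCheckpointTuple(long windowId) {
    if (lastCheckpointWindowId < windowId) { // guard against checkpointing a window twice
      checkpoint(windowId);
      lastCheckpointWindowId = windowId;
    }
  }

  static int checkpointsFor(boolean recordInEndWindow) {
    CheckpointGuardDemo node = new CheckpointGuardDemo(recordInEndWindow);
    node.endWindow(120, true);
    node.onCheckpointTuple(120);
    return node.checkpointCount;
  }

  public static void main(String[] args) {
    System.out.println(checkpointsFor(false)); // 2: window 120 is checkpointed twice
    System.out.println(checkpointsFor(true));  // 1: the guard sees the earlier checkpoint
  }
}
```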
> > >
> > > --
> > > Chetan
> > >
> > >
> > >
> > >
> > > On 11/12/15, 6:07 PM, "Amol Kekre" <amol@datatorrent.com> wrote:
> > >
> > > >I am trying to recollect too. I do remember Chetan, Thomas, and me going
> > > >deep on this choice. One issue was the efficiency of the current setup:
> > > >only the inputAdapters had to insert the control tuple; all other
> > > >operators were left as is. I will try to recollect other details, or
> > > >maybe Chetan or Thomas can comment.
> > > >
> > > >Thks,
> > > >Amol
> > > >
> > > >
> > > >On Thu, Nov 12, 2015 at 5:53 PM, Pramod Immaneni <pramod@datatorrent.com> wrote:
> > > >
> > > >> From what I am seeing so far (when implementing APEX-246), it is a
> > > >> leftover from an earlier implementation, but I am not completely sure
> > > >> yet.
> > > >>
> > > >> On Thu, Nov 12, 2015 at 5:43 PM, Timothy Farkas <tim@datatorrent.com> wrote:
> > > >>
> > > >> > After stumbling on https://malhar.atlassian.net/browse/APEX-263, I am
> > > >> > wondering: what is the purpose of the CHECKPOINT control tuple? Why is
> > > >> > it not sufficient to have each operator checkpoint after its
> > > >> > checkpoint window has passed?
> > > >> >
> > > >> > Thanks,
> > > >> > Tim
> > > >> >
> > > >>
> > >
> > >
> >
>
