apex-dev mailing list archives

From Amol Kekre <a...@datatorrent.com>
Subject Re: [Proposal] Named Checkpoints
Date Thu, 04 Aug 2016 18:54:45 GMT
Hmm! Actually, it may be a good debugging tool too: keep the named
checkpoints around. The core feature is retaining checkpoints, which could be
done with just an option to not delete them, but naming them
makes it more operational. Send a command from the CLI -> get a checkpoint ->
know it is the one you need, because the file name contains the string you
sent with the command -> debug. This is different from querying state, as it
gives you the entire app checkpoint to debug with.
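
To make the retention idea concrete, here is a minimal, language-neutral sketch in Python. It is not Apex code: the `Checkpoint` class, the `purge` and `find_by_name` helpers, and the `<window id>-<name>` file naming are all assumptions made up for illustration. It only shows a purge that skips named checkpoints, plus a lookup by the string sent with the command.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Checkpoint:
    window_id: int
    name: Optional[str] = None  # set only for user-named checkpoints

    @property
    def file_name(self) -> str:
        # Hypothetical naming scheme: embed the user's string in the file name.
        if self.name is None:
            return str(self.window_id)
        return f"{self.window_id}-{self.name}"


def purge(checkpoints: List[Checkpoint], committed_window_id: int) -> List[Checkpoint]:
    """Drop unnamed checkpoints older than the committed window; keep named ones."""
    return [c for c in checkpoints
            if c.name is not None or c.window_id >= committed_window_id]


def find_by_name(checkpoints: List[Checkpoint], name: str) -> Optional[Checkpoint]:
    """Return the first checkpoint carrying the given name, or None."""
    for c in checkpoints:
        if c.name == name:
            return c
    return None


store = [Checkpoint(10), Checkpoint(20, "before-bad-batch"),
         Checkpoint(30), Checkpoint(40)]
store = purge(store, committed_window_id=40)
print([c.file_name for c in store])  # ['20-before-bad-batch', '40']
```

The named checkpoint at window 20 survives the purge even though it is older than the committed window, so it can still be fetched by name for debugging.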

Thks
Amol


On Thu, Aug 4, 2016 at 11:41 AM, Venkatesh Kottapalli <
venkatesh@datatorrent.com> wrote:

> + 1 for the idea.
>
> It might be helpful to developers as well when dealing with a variety of
> data in large volumes, if it lets them run from the checkpointed state
> rather than rerunning the application altogether in case of issues.
>
> I have seen cases where the application runs for more than 10 hours and
> some partitions fail because of the variety of data that it is dealing
> with. In such cases, the application has to be restarted, and a feature of
> this kind would help developers.
>
>  The ease of enabling/disabling this feature to run the app will also be
> important.
>
> -Venkatesh.
>
>
> > On Aug 4, 2016, at 10:29 AM, Amol Kekre <amol@datatorrent.com> wrote:
> >
> > We had a user who wanted roll-back and restart for audit purposes. At that
> > time we did not have timed-window. Named checkpoints would have helped a
> > little bit.
> >
> > Problem statement: Auditors ask for a rerun of yesterday's computations for
> > verification. Assume that these computations depend on previous state
> > (i.e. data from the day before yesterday).
> >
> > Solution
> > 1. Have named checkpoints at 12 in the night (an input adapter triggers
> > it) every day
> > 2. The app spools raw logs into hdfs along with window ids and event
> > times
> > 3. The re-run is a separate app that starts off on a named checkpoint (12
> > night yesterday)
> >
> > Technically the solution will not be as simple, and the "new audit app"
> > will need a lot of other checks (dedups, drop events not in yesterday's
> > window, wait for late arrivals, ...), but named checkpoints help.
> >
> > I do agree with Pramod that replay within the same running app is not
> > viable within a data-in-motion architecture. But it helps somewhat in a new
> > audit app. Named checkpoints help data-in-motion architectures handle batch
> > apps better. In the above case, the spooling in #2, done with event time
> > stamp+state, suffices. The state part comes from the named checkpoint.
> >
> > Thks,
> > Amol
> >
> >
> >
> >
> > On Thu, Aug 4, 2016 at 10:12 AM, Sanjay Pujare <sanjay@datatorrent.com>
> > wrote:
> >
> >> I agree. A specific use-case will be useful to support this feature. Also
> >> the ability to replay from the named checkpoint will be limited because of
> >> various factors, isn’t it?
> >>
> >> On 8/4/16, 9:00 AM, "Pramod Immaneni" <pramod@datatorrent.com> wrote:
> >>
> >>    There is a problem here, keeping old checkpoints and recovering from them
> >>    means preserving the old input data along with the state. This is more than
> >>    the mechanism of actually creating named checkpoints, it means having the
> >>    ability for operators to move forward (a.k.a committed and dropping
> >>    committed states and buffer data) while still having the ability to replay
> >>    from that point from the input source and providing a way for operators (at
> >>    first look input operators) to distinguish that. Why would someone need
> >>    this with idempotent processing? Is there a specific use case you are
> >>    looking at? Suppose we go do this, for the mechanism, I would be in favor
> >>    of reusing existing tuple.
> >>
> >>    On Thu, Aug 4, 2016 at 8:44 AM, Vlad Rozov <v.rozov@datatorrent.com>
> >> wrote:
> >>
> >>> +1 for the feature. At first look I am more in favor of reusing existing
> >>> control tuple.
> >>>
> >>> Thank you,
> >>>
> >>> Vlad
> >>>
> >>>
> >>> On 8/4/16 08:17, Sandesh Hegde wrote:
> >>>
> >>>> @Chinmay
> >>>> We can enhance the existing checkpoint tuple but that one is more
> >>>> frequently used than this feature, so why burden Checkpoint tuple with
> >>>> an extra field?
> >>>>
> >>>> @Aniruddha
> >>>> It is better to leave the scheduling to the users, they can use any tool
> >>>> that they are already familiar with.
> >>>>
> >>>> On Thu, Aug 4, 2016 at 7:40 AM Aniruddha Thombare <
> >>>> aniruddha@datatorrent.com>
> >>>> wrote:
> >>>>
> >>>>> +1 On the idea, it would be awesome to have.
> >>>>>
> >>>>> Question: Can we further develop this brilliant idea into:-
> >>>>> Scheduled checkpoints (To save as dynamically named checkpoint)?
> >>>>> This would be on the lines of logrotate / general backup strategies.
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> A
> >>>>>
> >>>>> _____________________________________
> >>>>> Sent with difficulty, I mean handheld ;)
> >>>>> On 4 Aug 2016 8:03 pm, "Munagala Ramanath" <ram@datatorrent.com> wrote:
> >>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> Ram
> >>>>>>
> >>>>>> On Thu, Aug 4, 2016 at 12:10 AM, Sandesh Hegde <sandesh@datatorrent.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hello Team,
> >>>>>>>
> >>>>>>> This thread is to discuss the Named Checkpoint feature for Apex.
> >>>>>>> (https://issues.apache.org/jira/browse/APEXCORE-498)
> >>>>>>>
> >>>>>>> Named checkpoints allow following workflow,
> >>>>>>>
> >>>>>>> 1. Users can trigger a checkpoint and give it a name
> >>>>>>> 2. Relaunch the application from the named checkpoint.
> >>>>>>> 3. These checkpoints survive the "purge of old checkpoints".
> >>>>>>>
> >>>>>>> Current idea is to add a new control tuple, NamedCheckPointTuple, which
> >>>>>>> contains the user specified name, it traverses the DAG and along the way
> >>>>>>> necessary actions are taken.
> >>>>>>>
> >>>>>>> Please let me know your thoughts on this.
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>>
> >>>
> >>
> >>
> >>
> >>
>
>
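
For the audit re-run described in the quoted 10:29 AM message, two of the extra checks it lists (dedups, and dropping events not in yesterday's window) can be sketched as below. This is only an illustration in Python, not Apex code: the `(event_id, event_time, payload)` record layout, the existence of a unique event id, and the `select_for_rerun` name are all assumptions. The thread only says raw logs are spooled with window ids and event times. Waiting for late arrivals is not handled here.

```python
from datetime import datetime, timedelta


def select_for_rerun(events, day_start):
    """Keep each event id once, and only if its event time falls in
    [day_start, day_start + 1 day). Input order is preserved."""
    day_end = day_start + timedelta(days=1)
    seen = set()
    selected = []
    for event_id, event_time, payload in events:
        if not (day_start <= event_time < day_end):
            continue  # outside the audited day
        if event_id in seen:
            continue  # duplicate of an event already replayed
        seen.add(event_id)
        selected.append((event_id, event_time, payload))
    return selected


spooled = [
    ("a", datetime(2016, 8, 3, 0, 5), 1),
    ("a", datetime(2016, 8, 3, 0, 5), 1),    # duplicate
    ("b", datetime(2016, 8, 2, 23, 59), 2),  # day before
    ("c", datetime(2016, 8, 3, 23, 59), 3),
    ("d", datetime(2016, 8, 4, 0, 0), 4),    # next day
]
kept = select_for_rerun(spooled, datetime(2016, 8, 3))
print([e[0] for e in kept])  # ['a', 'c']
```

The re-run app would start from the state in the named checkpoint and feed it only the events this filter keeps.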
