apex-dev mailing list archives

From Chandni Singh <chan...@datatorrent.com>
Subject Re: Why is Async checkpointing made default?
Date Thu, 26 Nov 2015 19:03:58 GMT
That is simple to answer. The trigger to external systems can be done by
the Storage Agent instead of by the operator.
As I mentioned, the benefit is that it will be done asynchronously, just
after the copy to HDFS is done.
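The proposal above moves the external trigger out of the operator and into the storage agent. It can be modelled with a minimal Python sketch. This is not the Apex API: AsyncStorageAgentSketch, TriggeringStorageAgent, on_copy_completed and trigger are hypothetical names, and two temporary directories stand in for local disk and HDFS. The sketch shows the ordering. save() blocks only for the local write, and the trigger fires on the copy thread once the durable copy exists.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


class AsyncStorageAgentSketch:
    """Hypothetical model, not Apex's AsyncFSStorageAgent: save() writes the
    checkpoint to local disk on the caller's thread, then copies it to the
    durable store on a background thread."""

    def __init__(self, local_dir, durable_dir):
        self.local_dir = local_dir
        self.durable_dir = durable_dir
        self._copier = ThreadPoolExecutor(max_workers=1)

    def save(self, state, operator_id, window_id):
        name = f"{operator_id}-{window_id}"
        with open(os.path.join(self.local_dir, name), "wb") as f:
            f.write(state)  # the only part that blocks the operator thread
        return self._copier.submit(self._copy, name, operator_id, window_id)

    def _copy(self, name, operator_id, window_id):
        durable_path = os.path.join(self.durable_dir, name)
        shutil.copyfile(os.path.join(self.local_dir, name), durable_path)
        self.on_copy_completed(operator_id, window_id, durable_path)

    def on_copy_completed(self, operator_id, window_id, durable_path):
        """Extension point; runs on the copy thread once the data is durable."""

    def close(self):
        self._copier.shutdown(wait=True)


class TriggeringStorageAgent(AsyncStorageAgentSketch):
    """Sends the external trigger from the agent instead of the operator."""

    def __init__(self, local_dir, durable_dir, trigger):
        super().__init__(local_dir, durable_dir)
        self.trigger = trigger  # hypothetical hook into the external system

    def on_copy_completed(self, operator_id, window_id, durable_path):
        self.trigger(operator_id, window_id, durable_path)


# Usage: temporary directories stand in for local disk and HDFS.
triggers = []
agent = TriggeringStorageAgent(
    tempfile.mkdtemp(),
    tempfile.mkdtemp(),
    lambda op, win, path: triggers.append((op, win, os.path.exists(path))),
)
agent.save(b"operator state", operator_id=1, window_id=42).result()
agent.close()
print(triggers)
```

The demo calls result() only to make the example deterministic. A real agent would return to the operator immediately and let the copy thread fire the trigger.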

Chetan can change the code, since it seems he either recently did this or is
working on it.

Pardon me for repeating this time and again, but backward compatibility
was broken when async checkpointing was introduced. Certain behavior
existed for 3 years and then was broken. This needs to be fixed.
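The behavior change being debated concerns when checkpointed() fires relative to the durable copy. The toy, single-threaded timeline below is a hedged illustration, not the engine's actual scheduler. The mode names and the assumption that the background copy takes copy_windows windows are made up for the example. "sync" approximates saving straight to HDFS. "async_after_copy" is the current async behavior. "async_immediate" is the proposal, quoted further down the thread, to call checkpointed() right after the local save.

```python
def checkpoint_timeline(mode, windows=4, checkpoint_window=1, copy_windows=2):
    """Toy event order for one checkpoint.

    mode:
      "sync"             save straight to the durable store, then checkpointed()
      "async_after_copy" save locally; checkpointed() only after the copy lands
      "async_immediate"  save locally; checkpointed() at once; copy in background
    copy_windows: hypothetical number of windows the background copy takes.
    """
    if mode not in ("sync", "async_after_copy", "async_immediate"):
        raise ValueError(f"unknown mode: {mode}")
    log = []
    copy_done_at = None  # window after which the background copy finishes
    for w in range(windows):
        log.append(f"endWindow({w})")
        if w == checkpoint_window:
            if mode == "sync":
                log += [f"save durable({w})", f"checkpointed({w})"]
            else:
                log.append(f"save local({w})")
                copy_done_at = w + copy_windows
                if mode == "async_immediate":
                    log.append(f"checkpointed({w})")
        if w == copy_done_at:
            log.append(f"copy durable done({checkpoint_window})")
            if mode == "async_after_copy":
                log.append(f"checkpointed({checkpoint_window})")
    return log


for mode in ("sync", "async_after_copy", "async_immediate"):
    print(mode, checkpoint_timeline(mode))
```

In "async_immediate" the callback keeps its old timing but runs before the copy lands. That is the trade-off Chetan objects to further down the thread. "async_after_copy" guarantees durability at the cost of a delayed callback.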

Chandni

On Thu, Nov 26, 2015 at 12:08 AM, Gaurav Gupta <gaurav@datatorrent.com>
wrote:

> Tim,
>
> The trigger to external systems is sent by the operator in the checkpointed()
> callback and not by the Storage Agent. I am not sure how the suggested
> solution will solve Chetan’s use case.
>
> Thanks
> - Gaurav
>
> > On Nov 25, 2015, at 11:58 PM, Timothy Farkas <tim@datatorrent.com> wrote:
> >
> > Gaurav,
> >
> > Chandni's method would address your point. Or you can copy the state
> > wherever you want (even asynchronously) from the checkpointed callback.
> >
> > On Wed, Nov 25, 2015 at 11:47 PM, Chandni Singh <chandni@datatorrent.com>
> > wrote:
> >
> >> Another approach for Chetan's use case can be to extend AsyncFSStorageAgent
> >> and perform the function after the copy to HDFS is completed.
> >> The benefit is that the function will be performed asynchronously (with the
> >> copy to HDFS) and will not block the operator's thread.
> >>
> >> Chandni
> >>
> >> On Wed, Nov 25, 2015 at 11:33 PM, Timothy Farkas <tim@datatorrent.com>
> >> wrote:
> >>
> >>> Chetan, your use case is not valid. If checkpointed() is called after the
> >>> operator state is stored to local disk, a similar storage function could
> >>> be performed. It is not necessary to wait for that same state to be
> >>> asynchronously moved to HDFS. Please provide an example of a valid use
> >>> case :)
> >>>
> >>> +1 for fixing this bug/regression as Thomas suggested.
> >>>
> >>> On Wed, Nov 25, 2015 at 5:35 PM, Chandni Singh <chandni@datatorrent.com>
> >>> wrote:
> >>>
> >>>> Semver was broken with async checkpointing. The behavior was changed, as
> >>>> pointed out before in the discussion. Also, making it difficult for the
> >>>> operator developer doesn't give us anything.
> >>>>
> >>>> +1 for fixing it in the way Thomas suggested.
> >>>>
> >>>> On Wed, Nov 25, 2015 at 4:07 PM, Chetan Narsude (cnarsude) <
> >>>> cnarsude@cisco.com> wrote:
> >>>>
> >>>>> Yes, a few, but I cannot share the details; they are protected under NDA.
> >>>>> Ping me in private and I can probably give you more generic details on
> >>>>> similar cooked-up examples.
> >>>>>
> >>>>> The part that follows “e.g.” below is an example that is probably
> >>>>> sufficient to infer the use case logically, I think. I shared it to
> >>>>> exemplify how changing the semantics will break semver.
> >>>>>
> >>>>> —
> >>>>> Chetan
> >>>>>
> >>>>> On 11/25/15, 3:51 PM, "Thomas Weise" <thomas@datatorrent.com> wrote:
> >>>>>
> >>>>>> Do you have a specific example?
> >>>>>>
> >>>>>> I see this happening in committed(), but not in checkpointed(), where
> >>>>>> the checkpoint remains intermediate, whether it was copied to HDFS or
> >>>>>> not.
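The distinction drawn here between committed() and checkpointed() can be sketched as follows. The sketch assumes that committed(w) means recovery will never go back to a checkpoint at or before window w. The class and the publish callback are hypothetical. The point is only that checkpointed() records the window, while the external announcement waits for committed().

```python
class CommitAwareNotifier:
    """Hypothetical operator-side helper: records checkpoints reported through
    checkpointed() and announces them externally only from committed()."""

    def __init__(self, publish):
        self.publish = publish  # hypothetical external-notification callback
        self._pending = []      # checkpointed window ids, oldest first

    def checkpointed(self, window_id):
        # Still intermediate: remember it, but do not announce it yet.
        self._pending.append(window_id)

    def committed(self, window_id):
        # Everything at or before window_id is now final; announce in order.
        ready = [w for w in self._pending if w <= window_id]
        self._pending = [w for w in self._pending if w > window_id]
        for w in ready:
            self.publish(w)


published = []
notifier = CommitAwareNotifier(published.append)
for w in (10, 20, 30):
    notifier.checkpointed(w)
notifier.committed(20)
print(published)
```

Window 30 stays pending until a later committed() call covers it.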
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Nov 25, 2015 at 3:42 PM, Chetan Narsude (cnarsude) <
> >>>>>> cnarsude@cisco.com> wrote:
> >>>>>>
> >>>>>>>>
> >>>>>>>> Until we have this, how about we restore the previous behavior
> >>>>>>>> temporarily? Calling checkpointed() immediately does not seem to pose
> >>>>>>>> any practical issue, but it ensures that the code that was written
> >>>>>>>> under this assumption is not broken.
> >>>>>>>
> >>>>>>> We can’t do it. It would be incorrect. It breaks all the other code that
> >>>>>>> (unassumingly) correctly complied with the semantics, e.g. an operator
> >>>>>>> which informs interested parties that the checkpointed data is available
> >>>>>>> for immediate consumption from storage.
> >>>>>>>
> >>>>>>> --
> >>>>>>> Chetan
