apex-dev mailing list archives

From Amol Kekre <a...@datatorrent.com>
Subject Re: proposal to change names of processing modes
Date Wed, 03 Feb 2016 03:08:57 GMT
I agree with Vlad too.

Thks
Amol


On Tue, Feb 2, 2016 at 3:33 PM, Munagala Ramanath <ram@datatorrent.com>
wrote:

> I agree with Vlad: these names are so deeply embedded in the community that
> changing them is likely
> to create more problems than it solves.
>
> Ram
>
> On Tue, Feb 2, 2016 at 3:29 PM, Vlad Rozov <v.rozov@datatorrent.com>
> wrote:
>
> > I vote to keep original names and educate/explain their meaning to non
> > technical audience as delivery guarantee is not specific to Apex, but has
> > common meaning for all streaming platforms.
> >
> > Vlad
> >
> >
> > On 2/2/16 15:17, Timothy Farkas wrote:
> >
> >> Could we provide Processing and Output Centric Aliases for the
> >> ProcessingModes?
> >>
> >> ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
> >> ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE
> >>
> >> ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
> >> ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
> >> ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE
> >>
> >> Tim
> >>
> >> On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni <pramod@datatorrent.com>
> >> wrote:
> >>
> >>> Well output guarantees are managed by the operators themselves so the user
> >>> will typically not see that as part of the engine features, they only see
> >>> processing guarantees and while they are technically correct as far as
> >>> individual operators are concerned the names give a different idea.
> >>>
> >>> Thanks
> >>>
> >>> On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas <tim@datatorrent.com>
> >>> wrote:
> >>>
> >>>> I think I understand the ambiguity you are trying to clear up Pramod.
> >>>> Perhaps it can be disambiguated by distinguishing between Processing
> >>>> Guarantees and Output Guarantees, when explaining to people. Processing
> >>>> Guarantees apply to the way tuples are transmitted between operators.
> >>>> Output Guarantees apply to the way output operators write tuples to a
> >>>> Data Sink.
> >>>>
> >>>> This way we can describe each term intuitively in each context:
> >>>>
> >>>> At Most Once: A tuple can be dropped or transmitted (written) only once.
> >>>> At Least Once: A tuple can be transmitted (written) one or more times.
> >>>> Exactly Once: A tuple is transmitted (written) only once.
> >>>>
> >>>> Then we could provide a table with the strongest Output Guarantee that is
> >>>> possible for each Processing Guarantee.
> >>>>
> >>>> Processing     | Strongest Output Guarantee
> >>>> ---------------|---------------------------
> >>>> At Most Once   | At Most Once
> >>>> At Least Once  | Exactly Once
> >>>> Exactly Once   | Exactly Once
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Thanks,
> >>>> Tim
> >>>>
> >>>> On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <sandesh@datatorrent.com>
> >>>> wrote:
> >>>>
> >>>>> I agree with Tim. Instead of new terminologies, better explanation for
> >>>>> the existing ones are more useful.
> >>>>>
> >>>>> On Tue, Feb 2, 2016 at 2:23 PM Pramod Immaneni <pramod@datatorrent.com>
> >>>>> wrote:
> >>>>>
> >>>>>> The idea is to disambiguate without using at least once since exactly
> >>>>>> once output can still be achieved with those. Any other names are fine,
> >>>>>> those were just suggestions.
> >>>>>>
> >>>>>> On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <tim@datatorrent.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> The new names don't make as much sense to me as the original names. The
> >>>>>>> concepts require some thought to understand, and it won't necessarily be
> >>>>>>> made easier with a name change. I think a better way to attack
> >>>>>>> misunderstandings is to clearly explain what a window, operator, input
> >>>>>>> operator, output operator, tuple, checkpoint, and DAG is with really clean
> >>>>>>> and simple illustrations of the concepts. Then we can explain more involved
> >>>>>>> concepts like At Least Once, At Most Once, and Exactly Once with well
> >>>>>>> thought illustrations. Without a clear explanation of the basic vocabulary,
> >>>>>>> and without pictures, it is difficult to get even technical people to
> >>>>>>> understand these concepts.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Tim
> >>>>>>>
> >>>>>>> On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <pramod@datatorrent.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Today we support three different processing modes for operators, "at least
> >>>>>>>> once", "at most once" and "exactly once" which determine tuple processing
> >>>>>>>> and recovery behavior when there is operator recovery from failure. The
> >>>>>>>> default being at least once where the tuples are replayed from the
> >>>>>>>> recovered checkpoint.
> >>>>>>>>
> >>>>>>>> At least once works well for most applications. Typically applications
> >>>>>>>> persist the final output of processing through the DAG into various outputs
> >>>>>>>> like key value stores, databases or even HDFS files. In many of these cases
> >>>>>>>> various strategies can be employed to save the data "exactly once" in the
> >>>>>>>> output, such as transactions, rewinding, meta data storage, idempotent
> >>>>>>>> operations etc. Furthermore the exactly once processing mode, which is a
> >>>>>>>> checkpoint performed every window is rarely used. All this leads to
> >>>>>>>> confusion especially to somebody new and also makes it difficult to explain
> >>>>>>>> these names to less technical audience in meetups and public forums.
> >>>>>>>> What I am proposing is only a name change which will make this more
> >>>>>>>> intuitive to understand. Something simple like "repeat" for "at least
> >>>>>>>> once", "latest" for "at most once" and "repeat latest" for "exactly once"
> >>>>>>>> can do the trick.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>>
> >
>
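
For readers following the thread, here is a minimal sketch of how the
aliases Tim suggests and his table of strongest output guarantees might
look in Java. It is only an illustration under stated assumptions: the
ProcessingMode enum below is a local stand-in, not Apex's own type, and
the strongestOutputGuarantee() helper and the ProcessingModeDemo class are
hypothetical names that do not appear in the thread. The alias assignments
are copied as written in the proposal, including EXACTLY_ONCE_OUTPUT
pointing at AT_LEAST_ONCE, which matches the table row saying that at
least once processing can yield exactly once output.

```java
// Small driver that prints the table from the thread.
class ProcessingModeDemo {
    public static void main(String[] args) {
        for (ProcessingMode mode : ProcessingMode.values()) {
            System.out.println(mode + " -> " + mode.strongestOutputGuarantee());
        }
    }
}

// Stand-in for illustration only; this is NOT Apex's own ProcessingMode type.
enum ProcessingMode {
    AT_MOST_ONCE,
    AT_LEAST_ONCE,
    EXACTLY_ONCE;

    // Output-centric aliases, as written in the proposal.
    static final ProcessingMode AT_MOST_ONCE_OUTPUT = AT_MOST_ONCE;
    static final ProcessingMode EXACTLY_ONCE_OUTPUT = AT_LEAST_ONCE;

    // Processing-centric aliases, as written in the proposal.
    static final ProcessingMode AT_MOST_ONCE_PROCESSING = AT_MOST_ONCE;
    static final ProcessingMode AT_LEAST_ONCE_PROCESSING = AT_LEAST_ONCE;
    static final ProcessingMode EXACTLY_ONCE_PROCESSING = EXACTLY_ONCE;

    // Hypothetical helper encoding the "Strongest Output Guarantee" table.
    String strongestOutputGuarantee() {
        switch (this) {
            case AT_MOST_ONCE:
                return "At Most Once";
            case AT_LEAST_ONCE:
                return "Exactly Once";
            case EXACTLY_ONCE:
                return "Exactly Once";
            default:
                throw new AssertionError("unknown mode: " + this);
        }
    }
}
```

Because the aliases are plain static fields that refer to the existing
constants, code written against the original names would keep working,
which is in line with the preference expressed in the thread for keeping
the established names.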
