apex-dev mailing list archives

From Chanchal Singh <chanchal.apex...@gmail.com>
Subject Re: proposal to change names of processing modes
Date Wed, 03 Feb 2016 03:22:37 GMT
I agree with Vlad. It would be good to have a good explanation, with
examples, of the existing names, as that will not create confusion either for
those who already know them or for those who are beginners.

On Wed, Feb 3, 2016 at 8:38 AM, Amol Kekre <amol@datatorrent.com> wrote:

> I agree with Vlad too.
>
> Thanks
> Amol
>
>
> On Tue, Feb 2, 2016 at 3:33 PM, Munagala Ramanath <ram@datatorrent.com>
> wrote:
>
> > I agree with Vlad: these names are so deeply embedded in the community
> > that changing them is likely to create more problems than it solves.
> >
> > Ram
> >
> > On Tue, Feb 2, 2016 at 3:29 PM, Vlad Rozov <v.rozov@datatorrent.com>
> > wrote:
> >
> > > I vote to keep the original names and educate/explain their meaning to
> > > non-technical audiences, as delivery guarantee is not specific to Apex
> > > but has a common meaning for all streaming platforms.
> > >
> > > Vlad
> > >
> > >
> > > On 2/2/16 15:17, Timothy Farkas wrote:
> > >
> > >> Could we provide Processing and Output Centric Aliases for the
> > >> ProcessingModes?
> > >>
> > >> ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
> > >> ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE
> > >>
> > >> ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
> > >> ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
> > >> ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE
> > >>
> > >> Tim
> > >>
> > >> On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni <pramod@datatorrent.com>
> > >> wrote:
> > >>
> > >>> Well, output guarantees are managed by the operators themselves, so the
> > >>> user will typically not see them as part of the engine features; they
> > >>> only see processing guarantees. While those are technically correct as
> > >>> far as individual operators are concerned, the names give a different
> > >>> idea.
> > >>>
> > >>> Thanks
> > >>>
> > >>> On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas <tim@datatorrent.com>
> > >>> wrote:
> > >>>
> > >>>> I think I understand the ambiguity you are trying to clear up, Pramod.
> > >>>> Perhaps it can be disambiguated by distinguishing between Processing
> > >>>> Guarantees and Output Guarantees when explaining to people. Processing
> > >>>> Guarantees apply to the way tuples are transmitted between operators.
> > >>>> Output Guarantees apply to the way output operators write tuples to a
> > >>>> Data Sink.
> > >>>>
> > >>>> This way we can describe each term intuitively in each context:
> > >>>>
> > >>>> At Most Once: A tuple can be dropped or transmitted (written) only once.
> > >>>> At Least Once: A tuple can be transmitted (written) one or more times.
> > >>>> Exactly Once: A tuple is transmitted (written) only once.
> > >>>>
> > >>>> Then we could provide a table with the strongest Output Guarantee that
> > >>>> is possible for each Processing Guarantee.
> > >>>>
> > >>>> Processing Guarantee | Strongest Output Guarantee
> > >>>> ---------------------+---------------------------
> > >>>> At Most Once         | At Most Once
> > >>>> At Least Once        | Exactly Once
> > >>>> Exactly Once         | Exactly Once
> > >>>>
> > >>>> Thoughts?
> > >>>>
> > >>>> Thanks,
> > >>>> Tim
> > >>>>
> > >>>> On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <sandesh@datatorrent.com>
> > >>>> wrote:
> > >>>>
> > >>>>> I agree with Tim. Instead of new terms, better explanations of the
> > >>>>> existing ones would be more useful.
> > >>>>>
> > >>>>> On Tue, Feb 2, 2016 at 2:23 PM, Pramod Immaneni <pramod@datatorrent.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> The idea is to disambiguate without using at least once, since exactly
> > >>>>>> once output can still be achieved with those. Any other names are fine;
> > >>>>>> those were just suggestions.
> > >>>>>>
> > >>>>>> On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <tim@datatorrent.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> The new names don't make as much sense to me as the original names.
> > >>>>>>> The concepts require some thought to understand, and they won't
> > >>>>>>> necessarily be made easier by a name change. I think a better way to
> > >>>>>>> attack misunderstandings is to clearly explain what a window, operator,
> > >>>>>>> input operator, output operator, tuple, checkpoint, and DAG are, with
> > >>>>>>> really clean and simple illustrations of the concepts. Then we can
> > >>>>>>> explain more involved concepts like At Least Once, At Most Once, and
> > >>>>>>> Exactly Once with well-thought-out illustrations. Without a clear
> > >>>>>>> explanation of the basic vocabulary, and without pictures, it is
> > >>>>>>> difficult to get even technical people to understand these concepts.
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Tim
> > >>>>>>>
> > >>>>>>> On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <pramod@datatorrent.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Today we support three different processing modes for operators, "at
> > >>>>>>>> least once", "at most once" and "exactly once", which determine tuple
> > >>>>>>>> processing and recovery behavior when an operator recovers from
> > >>>>>>>> failure. The default is at least once, where the tuples are replayed
> > >>>>>>>> from the recovered checkpoint.
> > >>>>>>>>
> > >>>>>>>> At least once works well for most applications. Typically applications
> > >>>>>>>> persist the final output of processing through the DAG into various
> > >>>>>>>> outputs like key-value stores, databases or even HDFS files. In many of
> > >>>>>>>> these cases various strategies can be employed to save the data
> > >>>>>>>> "exactly once" in the output, such as transactions, rewinding, metadata
> > >>>>>>>> storage, idempotent operations etc. Furthermore, the exactly once
> > >>>>>>>> processing mode, which is a checkpoint performed every window, is
> > >>>>>>>> rarely used. All this leads to confusion, especially for somebody new,
> > >>>>>>>> and also makes it difficult to explain these names to a less technical
> > >>>>>>>> audience in meetups and public forums.
> > >>>>
> > >>>>>>>> What I am proposing is only a name change which will make this more
> > >>>>>>>> intuitive to understand. Something simple like "repeat" for "at least
> > >>>>>>>> once", "latest" for "at most once" and "repeat latest" for "exactly
> > >>>>>>>> once" can do the trick.
> > >>>>>>>>
> > >>>>>>>> Thanks
> > >>>>>>>>
> > >>>>>>>>
> > >
> >
>
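
The distinction drawn in the thread between processing guarantees and output
guarantees can be illustrated with a small simulation. The sketch below is not
Apex code: the class DeliverySketch, the enum Mode, and the two sink methods
are hypothetical names made up for illustration. It assumes a stream of three
tuples, a checkpoint taken after the first one, and a failure after the third,
so that at-least-once processing replays the last two. A sink that appends
blindly then sees duplicates, while a sink keyed by tuple id (one of the
idempotent strategies mentioned above) ends up with each tuple exactly once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: none of these types belong to the Apex API. */
class DeliverySketch {

    /** Hypothetical mirror of the three modes discussed in the thread. */
    enum Mode {
        AT_MOST_ONCE, AT_LEAST_ONCE, EXACTLY_ONCE;

        /** Strongest output guarantee, following the table in the thread. */
        Mode strongestOutput() {
            return this == AT_MOST_ONCE ? AT_MOST_ONCE : EXACTLY_ONCE;
        }
    }

    /** Each delivery is {tupleId, payload}. t2 and t3 are replayed after recovery. */
    static List<String[]> atLeastOnceDeliveries() {
        List<String[]> deliveries = new ArrayList<>();
        deliveries.add(new String[] {"t1", "a"});
        deliveries.add(new String[] {"t2", "b"});
        deliveries.add(new String[] {"t3", "c"});
        // Assumed failure here: replay everything after the checkpoint at t1.
        deliveries.add(new String[] {"t2", "b"});
        deliveries.add(new String[] {"t3", "c"});
        return deliveries;
    }

    /** Appends every delivery, so replayed tuples are written twice. */
    static List<String> appendSink(List<String[]> deliveries) {
        List<String> out = new ArrayList<>();
        for (String[] d : deliveries) {
            out.add(d[1]);
        }
        return out;
    }

    /** Writes keyed by tuple id; a replayed id overwrites its earlier write. */
    static List<String> idempotentSink(List<String[]> deliveries) {
        Map<String, String> store = new LinkedHashMap<>();
        for (String[] d : deliveries) {
            store.put(d[0], d[1]);
        }
        return new ArrayList<>(store.values());
    }

    public static void main(String[] args) {
        List<String[]> deliveries = atLeastOnceDeliveries();
        System.out.println("append sink:     " + appendSink(deliveries));
        System.out.println("idempotent sink: " + idempotentSink(deliveries));
        for (Mode m : Mode.values()) {
            System.out.println(m + " -> strongest output: " + m.strongestOutput());
        }
    }
}
```

With the assumed replay, the append sink holds five writes (b and c twice)
while the idempotent sink holds three, which is why At Least Once processing
is paired with Exactly Once output in the table.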
