apex-dev mailing list archives

From Shubham Pathak <shub...@datatorrent.com>
Subject Re: proposal to change names of processing modes
Date Wed, 03 Feb 2016 06:26:24 GMT
+1 for adding detailed explanation about the concepts in tutorials.


On Wed, Feb 3, 2016 at 11:30 AM, Chinmay Kolhatkar <chinmay@datatorrent.com>
wrote:

> +1 for Vlad's suggestion. Searching for keywords like "at least once", "at
> most once" and "exactly once" shows that these terms are widely used
> wherever semantics are defined for tuple processing.
> Adding example applications for each of them would help explain the
> terms in the Apex context.
>
> On Wed, Feb 3, 2016 at 8:52 AM, Chanchal Singh <chanchal.apexrtx@gmail.com>
> wrote:
>
> > I do agree with Vlad. It would be good to have a clear explanation with
> > examples for the existing names, as that will not create confusion for
> > those who already know them or for those who are beginners.
> >
> > On Wed, Feb 3, 2016 at 8:38 AM, Amol Kekre <amol@datatorrent.com> wrote:
> >
> > > I agree with Vlad too.
> > >
> > > Thks
> > > Amol
> > >
> > >
> > > On Tue, Feb 2, 2016 at 3:33 PM, Munagala Ramanath <ram@datatorrent.com>
> > > wrote:
> > >
> > > > I agree with Vlad: these names are so deeply embedded in the community
> > > > that changing them is likely to create more problems than it solves.
> > > >
> > > > Ram
> > > >
> > > > On Tue, Feb 2, 2016 at 3:29 PM, Vlad Rozov <v.rozov@datatorrent.com>
> > > > wrote:
> > > >
> > > > > I vote to keep the original names and educate/explain their meaning to
> > > > > non-technical audiences, as a delivery guarantee is not specific to Apex
> > > > > but has a common meaning for all streaming platforms.
> > > > >
> > > > > Vlad
> > > > >
> > > > >
> > > > > On 2/2/16 15:17, Timothy Farkas wrote:
> > > > >
> > > > >> Could we provide Processing and Output Centric Aliases for the
> > > > >> ProcessingModes?
> > > > >>
> > > > >> ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
> > > > >> ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE
> > > > >>
> > > > > >> ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
> > > > > >> ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
> > > > > >> ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE
> > > > >>
> > > > >> Tim
> > > > >>
> > > > > >> On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni <pramod@datatorrent.com>
> > > > > >> wrote:
> > > > >>
> > > > > >>> Well, output guarantees are managed by the operators themselves, so the
> > > > > >>> user will typically not see them as part of the engine features; they only
> > > > > >>> see processing guarantees. While the names are technically correct as far
> > > > > >>> as individual operators are concerned, they give a different idea.
> > > > >>>
> > > > >>> Thanks
> > > > >>>
> > > > > >>> On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas <tim@datatorrent.com>
> > > > > >>> wrote:
> > > > >>>
> > > > > >>>> I think I understand the ambiguity you are trying to clear up, Pramod.
> > > > > >>>> Perhaps it can be disambiguated by distinguishing between Processing
> > > > > >>>> Guarantees and Output Guarantees when explaining to people. Processing
> > > > > >>>> Guarantees apply to the way tuples are transmitted between operators.
> > > > > >>>> Output Guarantees apply to the way output operators write tuples to a
> > > > > >>>> Data Sink.
> > > > >>>>
> > > > > >>>> This way we can describe each term intuitively in each context:
> > > > > >>>>
> > > > > >>>> At Most Once: A tuple can be dropped or transmitted (written) only once.
> > > > > >>>> At Least Once: A tuple can be transmitted (written) one or more times.
> > > > > >>>> Exactly Once: A tuple is transmitted (written) only once.
> > > > > >>>>
> > > > > >>>> Then we could provide a table with the strongest Output Guarantee that
> > > > > >>>> is possible for each Processing Guarantee.
> > > > >>>>
> > > > > >>>> Processing      | Strongest Output Guarantee
> > > > > >>>> ----------------+---------------------------
> > > > > >>>> At Most Once    | At Most Once
> > > > > >>>> At Least Once   | Exactly Once
> > > > > >>>> Exactly Once    | Exactly Once
> > > > >>>>
> > > > >>>> Thoughts?
> > > > >>>>
> > > > >>>> Thanks,
> > > > >>>> Tim
> > > > >>>>
> > > > > >>>> On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <sandesh@datatorrent.com>
> > > > > >>>> wrote:
> > > > >>>>
> > > > > >>>>> I agree with Tim. Instead of new terminology, a better explanation of
> > > > > >>>>> the existing ones is more useful.
> > > > >>>>>
> > > > > >>>>> On Tue, Feb 2, 2016 at 2:23 PM, Pramod Immaneni <pramod@datatorrent.com>
> > > > > >>>>> wrote:
> > > > >>>>>
> > > > > >>>>>> The idea is to disambiguate without using "at least once", since exactly
> > > > > >>>>>> once output can still be achieved with that mode. Any other names are
> > > > > >>>>>> fine; those were just suggestions.
> > > > >>>>>>
> > > > > >>>>>> On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <tim@datatorrent.com>
> > > > > >>>>>> wrote:
> > > > >>>>>>
> > > > > >>>>>>> The new names don't make as much sense to me as the original names. The
> > > > > >>>>>>> concepts require some thought to understand, and it won't necessarily be
> > > > > >>>>>>> made easier with a name change. I think a better way to attack
> > > > > >>>>>>> misunderstandings is to clearly explain what a window, operator, input
> > > > > >>>>>>> operator, output operator, tuple, checkpoint, and DAG are with really
> > > > > >>>>>>> clean and simple illustrations of the concepts. Then we can explain more
> > > > > >>>>>>> involved concepts like At Least Once, At Most Once, and Exactly Once with
> > > > > >>>>>>> well thought-out illustrations. Without a clear explanation of the basic
> > > > > >>>>>>> vocabulary, and without pictures, it is difficult to get even technical
> > > > > >>>>>>> people to understand these concepts.
> > > > >>>>>>>
> > > > >>>>>>> Thanks,
> > > > >>>>>>> Tim
> > > > >>>>>>>
> > > > > >>>>>>> On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <pramod@datatorrent.com>
> > > > > >>>>>>> wrote:
> > > > >>>>>>>
> > > > > >>>>>>>> Today we support three different processing modes for operators, "at
> > > > > >>>>>>>> least once", "at most once" and "exactly once", which determine tuple
> > > > > >>>>>>>> processing and recovery behavior when an operator recovers from failure.
> > > > > >>>>>>>> The default is at least once, where the tuples are replayed from the
> > > > > >>>>>>>> recovered checkpoint.
> > > > > >>>>>>>>
> > > > > >>>>>>>> At least once works well for most applications. Typically applications
> > > > > >>>>>>>> persist the final output of processing through the DAG into various
> > > > > >>>>>>>> outputs like key value stores, databases or even HDFS files. In many of
> > > > > >>>>>>>> these cases various strategies can be employed to save the data "exactly
> > > > > >>>>>>>> once" in the output, such as transactions, rewinding, metadata storage,
> > > > > >>>>>>>> idempotent operations etc. Furthermore, the exactly once processing mode,
> > > > > >>>>>>>> which is a checkpoint performed every window, is rarely used. All this
> > > > > >>>>>>>> leads to confusion, especially for somebody new, and also makes it
> > > > > >>>>>>>> difficult to explain these names to a less technical audience in meetups
> > > > > >>>>>>>> and public forums.
> > > > > >>>>>>>>
> > > > > >>>>>>>> What I am proposing is only a name change which will make this more
> > > > > >>>>>>>> intuitive to understand. Something simple like "repeat" for "at least
> > > > > >>>>>>>> once", "latest" for "at most once" and "repeat latest" for "exactly once"
> > > > > >>>>>>>> can do the trick.
> > > > >>>>>>>>
> > > > >>>>>>>> Thanks
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >
> > > >
> > >
> >
>
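
Editorial illustration (not part of the original messages): Tim's table of the
strongest Output Guarantee per Processing Guarantee can be written down as a
small lookup. This is only a sketch. The names Guarantee, GuaranteeTable and
strongestOutput are hypothetical and are not part of the Apex API; only the
three mode names and the mapping itself come from the thread.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical enum for illustration; this is NOT Apex's ProcessingMode type.
enum Guarantee { AT_MOST_ONCE, AT_LEAST_ONCE, EXACTLY_ONCE }

class GuaranteeTable {
    // Strongest output guarantee achievable for each processing guarantee,
    // copied row by row from the table quoted in the thread.
    private static final Map<Guarantee, Guarantee> STRONGEST_OUTPUT =
            new EnumMap<>(Guarantee.class);

    static {
        STRONGEST_OUTPUT.put(Guarantee.AT_MOST_ONCE, Guarantee.AT_MOST_ONCE);
        STRONGEST_OUTPUT.put(Guarantee.AT_LEAST_ONCE, Guarantee.EXACTLY_ONCE);
        STRONGEST_OUTPUT.put(Guarantee.EXACTLY_ONCE, Guarantee.EXACTLY_ONCE);
    }

    // Look up the strongest output guarantee for a processing guarantee.
    static Guarantee strongestOutput(Guarantee processing) {
        return STRONGEST_OUTPUT.get(processing);
    }

    public static void main(String[] args) {
        // Print one line per row of the table.
        for (Guarantee g : Guarantee.values()) {
            System.out.println(g + " -> " + strongestOutput(g));
        }
    }
}
```

The AT_LEAST_ONCE row is the one the thread keeps returning to: replayed tuples
do not rule out exactly once output, provided the output operator handles the
duplicates itself.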

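Editorial illustration (not part of the original messages): the claim that
"exactly once" output can be reached under at least once processing by using
idempotent operations can be sketched with a toy simulation. Everything here is
an assumption made for illustration: the class IdempotentSinkSketch, its
methods, and the integer tuple ids are hypothetical, and no Apex API is used.
Tuples delivered before a failure are delivered again when processing replays
from the checkpoint; an append-only sink then stores duplicates, while a sink
keyed by tuple id stores each tuple once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IdempotentSinkSketch {
    // Simulate at least once delivery of tuples 0..count-1: tuples before
    // failAt are delivered, then processing restarts from the checkpoint
    // and everything from there on is delivered (again).
    static List<Integer> deliverWithReplay(int count, int checkpoint, int failAt) {
        List<Integer> delivered = new ArrayList<>();
        for (int id = 0; id < failAt; id++) {
            delivered.add(id);              // delivered before the failure
        }
        for (int id = checkpoint; id < count; id++) {
            delivered.add(id);              // replayed after recovery
        }
        return delivered;
    }

    // Non-idempotent sink: every delivery appends a record, so replayed
    // tuples show up twice.
    static List<String> appendSink(List<Integer> delivered) {
        List<String> records = new ArrayList<>();
        for (int id : delivered) {
            records.add("tuple-" + id);
        }
        return records;
    }

    // Idempotent sink: writes are keyed by tuple id, so writing the same
    // tuple again overwrites the same entry instead of adding a new one.
    static Map<Integer, String> keyedSink(List<Integer> delivered) {
        Map<Integer, String> records = new LinkedHashMap<>();
        for (int id : delivered) {
            records.put(id, "tuple-" + id);
        }
        return records;
    }

    public static void main(String[] args) {
        List<Integer> delivered = deliverWithReplay(5, 2, 4);
        System.out.println("delivered:   " + delivered);
        System.out.println("append sink: " + appendSink(delivered).size() + " records");
        System.out.println("keyed sink:  " + keyedSink(delivered).size() + " records");
    }
}
```

With 5 tuples, a checkpoint at tuple 2 and a failure after tuple 3, tuples 2
and 3 are delivered twice. The append-only sink ends up with 7 records and the
keyed sink with 5, one per tuple.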