apex-dev mailing list archives

From Sandesh Hegde <sand...@datatorrent.com>
Subject Re: proposal to change names of processing modes
Date Tue, 02 Feb 2016 22:25:00 GMT
I agree with Tim. Instead of new terminology, better explanations of the
existing terms would be more useful.

On Tue, Feb 2, 2016 at 2:23 PM Pramod Immaneni <pramod@datatorrent.com>
wrote:

> The idea is to disambiguate without using "at least once", since exactly-once
> output can still be achieved with it. Any other names are fine; those
> were just suggestions.
>
> On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <tim@datatorrent.com>
> wrote:
>
> > The new names don't make as much sense to me as the original names. The
> > concepts require some thought to understand, and a name change won't
> > necessarily make them easier. I think a better way to attack
> > misunderstandings is to clearly explain what a window, operator, input
> > operator, output operator, tuple, checkpoint, and DAG are, with really
> > clean and simple illustrations of the concepts. Then we can explain more
> > involved concepts like At Least Once, At Most Once, and Exactly Once with
> > well-thought-out illustrations. Without a clear explanation of the basic
> > vocabulary, and without pictures, it is difficult to get even technical
> > people to understand these concepts.
> >
> > Thanks,
> > Tim
> >
> > On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <pramod@datatorrent.com>
> > wrote:
> >
> > > Today we support three different processing modes for operators, "at
> > > least once", "at most once" and "exactly once", which determine tuple
> > > processing and recovery behavior when an operator recovers from failure.
> > > The default is at least once, where tuples are replayed from the
> > > recovered checkpoint.
> > >
> > > At least once works well for most applications. Typically, applications
> > > persist the final output of processing through the DAG into various
> > > outputs like key-value stores, databases or even HDFS files. In many of
> > > these cases various strategies can be employed to save the data "exactly
> > > once" in the output, such as transactions, rewinding, metadata storage,
> > > idempotent operations etc. Furthermore, the exactly once processing mode,
> > > which is a checkpoint performed every window, is rarely used. All this
> > > leads to confusion, especially for somebody new, and also makes it
> > > difficult to explain these names to a less technical audience in meetups
> > > and public forums.
> > >
> > > What I am proposing is only a name change, which will make this more
> > > intuitive to understand. Something simple like "repeat" for "at least
> > > once", "latest" for "at most once" and "repeat latest" for "exactly
> > > once" can do the trick.
> > >
> > > Thanks
> > >
> >
>
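
The point in the quoted proposal, that at-least-once replay can still
produce exactly-once output when the write is idempotent, may be easier to
see in a small simulation. The sketch below is only an illustration under
stated assumptions: it is plain Python, not the Apex API, and every name in
it (AppendSink, IdempotentSink, run_at_least_once, checkpoint_every,
fail_after_window) is hypothetical. It assumes a single failure, a
checkpoint taken every fixed number of windows, and a replay that restarts
from the first window after the last checkpoint.

```python
# Illustrative simulation only; none of these names come from Apex.

class AppendSink:
    """Naive output: every write is appended, so replayed windows duplicate."""

    def __init__(self):
        self.rows = []

    def write(self, window_id, tuples):
        self.rows.extend(tuples)


class IdempotentSink:
    """Output keyed by window id: rewriting a window replaces its old result."""

    def __init__(self):
        self.by_window = {}

    def write(self, window_id, tuples):
        self.by_window[window_id] = list(tuples)

    @property
    def rows(self):
        return [t for w in sorted(self.by_window) for t in self.by_window[w]]


def run_at_least_once(windows, sink, checkpoint_every, fail_after_window):
    """Process windows in order, fail once, then replay from the checkpoint."""
    last_checkpoint = 0  # index of the first window not yet checkpointed
    failed = False
    i = 0
    while i < len(windows):
        sink.write(i, windows[i])
        if not failed and i == fail_after_window:
            # Simulated failure: recover and replay from the last checkpoint.
            failed = True
            i = last_checkpoint
            continue
        i += 1
        if i % checkpoint_every == 0:
            last_checkpoint = i


windows = [["a", "b"], ["c"], ["d", "e"], ["f"]]

naive = AppendSink()
run_at_least_once(windows, naive, checkpoint_every=2, fail_after_window=3)

exact = IdempotentSink()
run_at_least_once(windows, exact, checkpoint_every=2, fail_after_window=3)

print(naive.rows)  # windows 2 and 3 appear twice
print(exact.rows)  # each tuple appears once
```

Both runs replay windows 2 and 3 after the failure. Only the sink differs:
the append-only sink records the replayed tuples a second time, while the
sink keyed by window id overwrites them, so the stored output contains each
tuple once even though processing was at least once.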
