apex-dev mailing list archives

From Pramod Immaneni <pra...@datatorrent.com>
Subject Re: proposal to change names of processing modes
Date Tue, 02 Feb 2016 22:22:36 GMT
The idea is to disambiguate without using "at least once", since exactly-once
output can still be achieved with those modes. Any other names are fine; those
were just suggestions.

On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <tim@datatorrent.com> wrote:

> The new names don't make as much sense to me as the original names. The
> concepts require some thought to understand, and it won't necessarily be
> made easier with a name change. I think a better way to attack
> misunderstandings is to clearly explain what a window, operator, input
> operator, output operator, tuple, checkpoint, and DAG are, with really clean
> and simple illustrations of the concepts. Then we can explain more involved
> concepts like At Least Once, At Most Once, and Exactly Once with
> well-thought-out illustrations. Without a clear explanation of the basic vocabulary,
> and without pictures, it is difficult to get even technical people to
> understand these concepts.
>
> Thanks,
> Tim
>
> On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <pramod@datatorrent.com>
> wrote:
>
> > Today we support three different processing modes for operators: "at least
> > once", "at most once", and "exactly once". These determine tuple processing
> > and recovery behavior when an operator recovers from failure. The
> > default is at least once, where the tuples are replayed from the
> > recovered checkpoint.
> >
> > At least once works well for most applications. Typically applications
> > persist the final output of processing through the DAG into various outputs
> > like key-value stores, databases, or even HDFS files. In many of these cases
> > various strategies can be employed to save the data "exactly once" in the
> > output, such as transactions, rewinding, metadata storage, idempotent
> > operations, etc. Furthermore, the exactly once processing mode, which is a
> > checkpoint performed every window, is rarely used. All this leads to
> > confusion, especially for somebody new, and also makes it difficult to explain
> > these names to a less technical audience in meetups and public forums.
> >
> > What I am proposing is only a name change which will make this more
> > intuitive to understand. Something simple like "repeat" for "at least
> > once", "latest" for "at most once" and "repeat latest" for "exactly once"
> > can do the trick.
> >
> > Thanks
> >
>
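
The point made in the thread, that "exactly once" output can be reached on top of at-least-once replay, can be illustrated with a toy sketch. This is not Apex code: the class `IdempotentStore`, the function `run`, and the window layout are all hypothetical names chosen for illustration. It assumes the sink can store a "last committed window" marker together with its data (the "metadata storage" strategy mentioned above) and uses that marker to ignore replayed windows.

```python
from typing import Dict, List, Tuple

Window = List[Tuple[str, int]]


class IdempotentStore:
    """Hypothetical key-value sink that remembers the last window it applied."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        # Metadata kept alongside the data; -1 means nothing committed yet.
        self.last_committed_window = -1

    def write_window(self, window_id: int, tuples: Window) -> bool:
        # A window at or below the marker was already applied, so a replay
        # of it is skipped instead of being counted twice.
        if window_id <= self.last_committed_window:
            return False
        for key, value in tuples:
            self.data[key] = self.data.get(key, 0) + value
        # Assumption: a real store would commit the data and this marker
        # atomically (for example, in one transaction).
        self.last_committed_window = window_id
        return True


def run(windows: List[Window], store: IdempotentStore, start: int = 0) -> None:
    """Feed windows to the store in order, beginning at index `start`."""
    for window_id in range(start, len(windows)):
        store.write_window(window_id, windows[window_id])


windows = [[("a", 1), ("b", 2)], [("a", 3)], [("b", 4)]]
store = IdempotentStore()
run(windows, store)           # normal pass over windows 0..2
run(windows, store, start=1)  # simulated recovery: replay from window 1
print(store.data)
```

The replayed windows leave the totals unchanged, so the processing is at least once while the stored output reflects each window once. A sink without such a marker would double-count windows 1 and 2 after the replay.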
