apex-dev mailing list archives

From Timothy Farkas <...@datatorrent.com>
Subject Re: proposal to change names of processing modes
Date Tue, 02 Feb 2016 23:17:50 GMT
Could we provide Processing and Output Centric Aliases for the
ProcessingModes?

ProcessingMode.AT_MOST_ONCE_OUTPUT = ProcessingMode.AT_MOST_ONCE
ProcessingMode.EXACTLY_ONCE_OUTPUT = ProcessingMode.AT_LEAST_ONCE

ProcessingMode.AT_MOST_ONCE_PROCESSING = ProcessingMode.AT_MOST_ONCE
ProcessingMode.AT_LEAST_ONCE_PROCESSING = ProcessingMode.AT_LEAST_ONCE
ProcessingMode.EXACTLY_ONCE_PROCESSING = ProcessingMode.EXACTLY_ONCE
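The aliases proposed above could be expressed in Java roughly as follows. This is a minimal sketch only: the enum is a hypothetical, simplified stand-in for the engine's ProcessingMode, not its actual source. It assumes each alias is a plain static final field pointing at an existing constant, so no new modes are introduced.

```java
class AliasDemo {
  public static void main(String[] args) {
    ProcessingMode mode = ProcessingMode.EXACTLY_ONCE_OUTPUT;
    // An alias is the very same enum instance, so identity checks still work.
    System.out.println(mode == ProcessingMode.AT_LEAST_ONCE); // prints "true"
    System.out.println(mode.name());                          // prints "AT_LEAST_ONCE"
  }
}

// Hypothetical, simplified stand-in for the engine's ProcessingMode enum.
enum ProcessingMode {
  AT_MOST_ONCE,
  AT_LEAST_ONCE,
  EXACTLY_ONCE;

  // Output-centric aliases: the strongest output guarantee each mode allows.
  public static final ProcessingMode AT_MOST_ONCE_OUTPUT = AT_MOST_ONCE;
  public static final ProcessingMode EXACTLY_ONCE_OUTPUT = AT_LEAST_ONCE;

  // Processing-centric aliases: the same constants, with explicit names.
  public static final ProcessingMode AT_MOST_ONCE_PROCESSING = AT_MOST_ONCE;
  public static final ProcessingMode AT_LEAST_ONCE_PROCESSING = AT_LEAST_ONCE;
  public static final ProcessingMode EXACTLY_ONCE_PROCESSING = EXACTLY_ONCE;
}
```

Because each alias is the same enum instance, existing switch statements and configuration keep working. The trade-off is that name() and toString() still report the original constant, so EXACTLY_ONCE_OUTPUT would show up as AT_LEAST_ONCE in logs.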

Tim

On Tue, Feb 2, 2016 at 3:00 PM, Pramod Immaneni <pramod@datatorrent.com>
wrote:

> Well, output guarantees are managed by the operators themselves, so users
> will typically not see them as part of the engine features. They only see
> the processing guarantees, and while those names are technically correct
> as far as individual operators are concerned, they give a different
> impression.
>
> Thanks
>
> On Tue, Feb 2, 2016 at 2:53 PM, Timothy Farkas <tim@datatorrent.com>
> wrote:
>
> > I think I understand the ambiguity you are trying to clear up, Pramod.
> > Perhaps it can be disambiguated by distinguishing between Processing
> > Guarantees and Output Guarantees when explaining to people. Processing
> > Guarantees apply to the way tuples are transmitted between operators.
> > Output Guarantees apply to the way output operators write tuples to a
> > data sink.
> >
> > This way we can describe each term intuitively in each context:
> >
> > At Most Once: A tuple can be dropped or transmitted (written) only once.
> > At Least Once: A tuple can be transmitted (written) one or more times.
> > Exactly Once: A tuple is never dropped and is transmitted (written) only once.
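The three definitions above can be read as predicates on how many times a single tuple is transmitted (or written). The Java sketch below encodes them that way; the Guarantee enum and its holds method are hypothetical names used only for illustration, not an engine API.

```java
class GuaranteeDemo {
  public static void main(String[] args) {
    // Check each definition against 0, 1 and 2 transmissions of one tuple.
    for (int n = 0; n <= 2; n++) {
      StringBuilder line = new StringBuilder("transmitted " + n + "x:");
      for (Guarantee g : Guarantee.values()) {
        line.append(' ').append(g).append('=').append(g.holds(n));
      }
      System.out.println(line);
    }
  }
}

// Hypothetical encoding of the definitions; not an engine API.
enum Guarantee {
  AT_MOST_ONCE, AT_LEAST_ONCE, EXACTLY_ONCE;

  /** Returns true if a tuple transmitted (written) n times satisfies this guarantee. */
  boolean holds(int n) {
    switch (this) {
      case AT_MOST_ONCE:  return n <= 1; // dropped (0) or transmitted once
      case AT_LEAST_ONCE: return n >= 1; // never dropped, duplicates allowed
      case EXACTLY_ONCE:  return n == 1; // never dropped, never duplicated
      default: throw new AssertionError(this);
    }
  }
}
```

Exactly once holds precisely when both of the other two hold (n <= 1 and n >= 1 together mean n == 1), which is one way to see why removing duplicates on top of at-least-once delivery can yield exactly-once output.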
> >
> > Then we could provide a table with the strongest Output Guarantee that is
> > possible for each Processing Guarantee.
> >
> > Processing Guarantee | Strongest Output Guarantee
> > ---------------------+---------------------------
> > At Most Once         | At Most Once
> > At Least Once        | Exactly Once
> > Exactly Once         | Exactly Once
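A sketch of the table above as a lookup in Java. The enum and method names are hypothetical; the comments give the usual reasoning behind each row: a dropped tuple never reaches the output operator, while a replayed one does and can be detected and skipped.

```java
import java.util.EnumMap;
import java.util.Map;

class StrongestOutputDemo {
  public static void main(String[] args) {
    for (Guarantee processing : Guarantee.values()) {
      System.out.println(processing + " -> " + Guarantee.strongestOutput(processing));
    }
  }
}

// Hypothetical enum used only to encode the table; not an engine API.
enum Guarantee {
  AT_MOST_ONCE, AT_LEAST_ONCE, EXACTLY_ONCE;

  private static final Map<Guarantee, Guarantee> STRONGEST_OUTPUT =
      new EnumMap<>(Guarantee.class);

  static {
    // A tuple dropped during recovery never reaches the output operator,
    // so the output cannot do better than at most once.
    STRONGEST_OUTPUT.put(AT_MOST_ONCE, AT_MOST_ONCE);
    // Replayed tuples do reach it again, so an output operator that
    // remembers what it already wrote can skip the duplicates.
    STRONGEST_OUTPUT.put(AT_LEAST_ONCE, EXACTLY_ONCE);
    STRONGEST_OUTPUT.put(EXACTLY_ONCE, EXACTLY_ONCE);
  }

  /** Strongest output guarantee reachable on top of the given processing guarantee. */
  static Guarantee strongestOutput(Guarantee processing) {
    return STRONGEST_OUTPUT.get(processing);
  }
}
```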
> >
> > Thoughts?
> >
> > Thanks,
> > Tim
> >
> > On Tue, Feb 2, 2016 at 2:25 PM, Sandesh Hegde <sandesh@datatorrent.com>
> > wrote:
> >
> > > I agree with Tim. Instead of new terminology, better explanations of the
> > > existing terms would be more useful.
> > >
> > > On Tue, Feb 2, 2016 at 2:23 PM Pramod Immaneni <pramod@datatorrent.com>
> > > wrote:
> > >
> > > > The idea is to disambiguate by not using the term "at least once",
> > > > since exactly-once output can still be achieved in that mode. Any
> > > > other names are fine; those were just suggestions.
> > > >
> > > > On Tue, Feb 2, 2016 at 2:10 PM, Timothy Farkas <tim@datatorrent.com>
> > > > wrote:
> > > >
> > > > > The new names don't make as much sense to me as the original names.
> > > > > The concepts require some thought to understand, and that won't
> > > > > necessarily be made easier by a name change. I think a better way to
> > > > > attack misunderstandings is to clearly explain what a window,
> > > > > operator, input operator, output operator, tuple, checkpoint, and DAG
> > > > > are, with really clean and simple illustrations of the concepts. Then
> > > > > we can explain more involved concepts like At Least Once, At Most
> > > > > Once, and Exactly Once with well-thought-out illustrations. Without a
> > > > > clear explanation of the basic vocabulary, and without pictures, it is
> > > > > difficult to get even technical people to understand these concepts.
> > > > >
> > > > > Thanks,
> > > > > Tim
> > > > >
> > > > > On Tue, Feb 2, 2016 at 9:13 AM, Pramod Immaneni <pramod@datatorrent.com>
> > > > > wrote:
> > > > >
> > > > > > Today we support three different processing modes for operators:
> > > > > > "at least once", "at most once" and "exactly once". They determine
> > > > > > tuple processing and recovery behavior when an operator recovers
> > > > > > from a failure. The default is at least once, where tuples are
> > > > > > replayed from the recovered checkpoint.
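To make the recovery behavior concrete, here is a small Java simulation of a pass-through operator under a deliberately simplified model, which is an assumption and not the engine's exact algorithm: the operator last checkpointed at the end of some window, fails partway through a later window, and then either replays every window after the checkpoint (at least once) or skips ahead to the window after the failed one (at most once). RecoverySim, Recovery and their parameters are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class RecoveryDemo {
  public static void main(String[] args) {
    // 4 windows of 3 tuples, checkpoint after window 1,
    // failure after 2 tuples of window 3 were already sent downstream.
    for (Recovery mode : Recovery.values()) {
      System.out.println(mode + ": " + RecoverySim.deliveries(4, 3, 1, 3, 2, mode));
    }
  }
}

enum Recovery { AT_LEAST_ONCE, AT_MOST_ONCE }

// Hypothetical, simplified model of a pass-through operator; not engine code.
final class RecoverySim {
  /** Counts how often each tuple, keyed "window.index", reaches downstream. */
  static Map<String, Integer> deliveries(int windows, int perWindow, int checkpoint,
      int failedWindow, int sentBeforeFailure, Recovery mode) {
    Map<String, Integer> counts = new TreeMap<>();
    for (int w = 1; w <= windows; w++) {
      for (int i = 0; i < perWindow; i++) {
        counts.put(w + "." + i, 0); // start at 0 so dropped tuples stay visible
      }
    }
    // Before the failure: earlier windows fully sent, plus a prefix of failedWindow.
    for (int w = 1; w < failedWindow; w++) {
      send(counts, w, perWindow);
    }
    send(counts, failedWindow, sentBeforeFailure);

    // After restoring from the checkpoint, processing resumes at:
    int resumeAt = (mode == Recovery.AT_LEAST_ONCE)
        ? checkpoint + 1    // replay every window after the checkpoint
        : failedWindow + 1; // skip ahead; the rest of failedWindow is lost
    for (int w = resumeAt; w <= windows; w++) {
      send(counts, w, perWindow);
    }
    return counts;
  }

  /** Sends the first `howMany` tuples of a window downstream. */
  private static void send(Map<String, Integer> counts, int window, int howMany) {
    for (int i = 0; i < howMany; i++) {
      counts.merge(window + "." + i, 1, Integer::sum);
    }
  }
}
```

In this model, at least once delivers the tuples of window 2 and the first two tuples of window 3 twice and drops nothing, while at most once delivers nothing twice but drops tuple 3.2. Operator state is not modeled; in the at-most-once case the state changes from the windows after the checkpoint would also be lost.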
> > > > > >
> > > > > > At least once works well for most applications. Typically,
> > > > > > applications persist the final output of processing through the
> > > > > > DAG into various outputs like key-value stores, databases or even
> > > > > > HDFS files. In many of these cases, various strategies can be
> > > > > > employed to save the data "exactly once" in the output, such as
> > > > > > transactions, rewinding, metadata storage, idempotent operations,
> > > > > > etc. Furthermore, the exactly once processing mode, which performs
> > > > > > a checkpoint every window, is rarely used. All this leads to
> > > > > > confusion, especially for somebody new, and also makes it difficult
> > > > > > to explain these names to a less technical audience at meetups and
> > > > > > in public forums.
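As an illustration of the "metadata storage" strategy mentioned above, here is a minimal Java sketch of an output writer that achieves exactly-once output on top of at-least-once processing. It assumes that window IDs increase monotonically, that a replayed window contains the same tuples as the original, and that the sink can commit a batch together with its window ID in one atomic step. InMemoryStore and WindowedWriter are hypothetical classes, not engine or library APIs.

```java
import java.util.ArrayList;
import java.util.List;

class ExactlyOnceOutputDemo {
  public static void main(String[] args) {
    InMemoryStore store = new InMemoryStore();
    WindowedWriter writer = new WindowedWriter(store);
    writer.writeWindow(1, List.of("a", "b"));
    writer.writeWindow(2, List.of("c"));

    // Simulated failure: the operator is restored from a checkpoint taken
    // after window 1, so window 2 is replayed before window 3 arrives.
    writer = new WindowedWriter(store);
    writer.writeWindow(2, List.of("c")); // skipped, already committed
    writer.writeWindow(3, List.of("d"));

    System.out.println(store.rows); // prints "[a, b, c, d]"
  }
}

// Hypothetical external sink; commit() stands in for one transaction that
// stores the batch together with the window ID it belongs to.
class InMemoryStore {
  final List<String> rows = new ArrayList<>();
  long committedWindow = -1;

  void commit(long windowId, List<String> batch) {
    rows.addAll(batch);
    committedWindow = windowId;
  }
}

// Hypothetical output operator logic. It reads the committed window ID from
// the sink rather than from its own checkpointed state, which may be stale.
class WindowedWriter {
  private final InMemoryStore store;

  WindowedWriter(InMemoryStore store) {
    this.store = store;
  }

  void writeWindow(long windowId, List<String> tuples) {
    if (windowId <= store.committedWindow) {
      return; // replayed window: its output is already persisted
    }
    store.commit(windowId, tuples);
  }
}
```

The key design choice is keeping the last window ID in the same store and the same transaction as the data; if the two were written separately, a failure between the writes could still produce a duplicate or a gap.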
> > > > > >
> > > > > > What I am proposing is only a name change, which would make this
> > > > > > more intuitive to understand. Something simple like "repeat" for
> > > > > > "at least once", "latest" for "at most once" and "repeat latest"
> > > > > > for "exactly once" can do the trick.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > >
> > > >
> > >
> >
>
