apex-dev mailing list archives

From Chinmay Kolhatkar <chin...@datatorrent.com>
Subject Re: Writing batches to database using Transactionable Store Output operator
Date Tue, 29 Dec 2015 05:45:18 GMT
Hi Chandni,

I totally agree with you that the transactions should be idempotent. And
that needs to be taken care of if the batch size is configurable.

Though, I have a question related to the second part where batch size is
controlled by controlling the app window size.
I agree with you that aggregation window is a unit of aggregation provided
by platform. But, if I understand correctly, that is time based.
If I want to aggregate based on number of tuples, would this be suitable?

To give you an example, let's say I have a store on which the transaction
size should never exceed 1000 operations.
And in a streaming application it is difficult to predict the input rate,
hence it's not possible to predict how many tuples will become part of a
single application window. In such a case, how can the application window
size be used to configure the transaction batch size?
Wouldn't it make more sense to have the control via exact number of tuples?
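
A minimal sketch of the count-based control, assuming hypothetical names
(`CountCappedBatcher`, `maxBatchSize`, and the `committed` list standing in
for store transactions are illustrative only, not the Malhar API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Malhar API: caps each store transaction at
// maxBatchSize tuples and flushes any remainder at the end of the window.
class CountCappedBatcher<T> {
    private final int maxBatchSize;
    private final List<T> batch = new ArrayList<>();
    // Each inner list stands in for one committed store transaction.
    final List<List<T>> committed = new ArrayList<>();

    CountCappedBatcher(int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.maxBatchSize = maxBatchSize;
    }

    // Adds the tuple first, then flushes once the cap is reached,
    // so no transaction ever holds more than maxBatchSize tuples.
    void processTuple(T tuple) {
        batch.add(tuple);
        if (batch.size() >= maxBatchSize) {
            flush();
        }
    }

    // Flushes whatever is left when the application window ends.
    void endWindow() {
        flush();
    }

    private void flush() {
        if (batch.isEmpty()) {
            return;
        }
        committed.add(new ArrayList<>(batch));
        batch.clear();
    }
}
```

With a cap of 1000 and 2500 tuples arriving in one window, this produces
transactions of 1000, 1000 and 500 tuples. It does not address exactly-once
on replay, which is the open concern with multiple batches per window.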

Thanks,
Chinmay.


~ Chinmay.

On Tue, Dec 29, 2015 at 12:13 AM, Chandni Singh <chandni@datatorrent.com>
wrote:

> Hey Chinmay/Priyanka,
>
> We need to write tuples exactly once in the store. Please address the
> failure scenarios on how to achieve exactly once and idempotency. I
> mentioned in my previous mail why multiple batches in a window is a problem
> with exactly once.
>
> Control via app window would mean tuning the functionality by controlling
> the platform params. I think it's best if one gets the option to separate
> the concerns of platform and that of app logic.
>
> Application window is a unit of aggregation. Every operator in a DAG can
> have a different application window, which is the support the platform
> provides for application logic.
>
> Chandni
>
>
>
> On Mon, Dec 28, 2015 at 10:35 AM, Chinmay Kolhatkar
> <chinmay@datatorrent.com> wrote:
>
> > Hi,
> >
> > Just a thought on how it can possibly be done.
> >
> > The pseudo code might look like this:
> >
> > processTuple(tuple)
> > {
> >   // add the tuple to the batch first so it is never dropped
> >   batch.add(tuple)
> >   if (batch.size() >= configuredBatchSize) {
> >     // process the batch as a transaction
> >     // empty the data structure of batch.
> >   }
> > }
> >
> > endWindow()
> > {
> >   // process any remaining tuples in the batch as a transaction.
> >   // empty the data structure of batch.
> > }
> >
> > This way, the user gets more direct control over what a transaction means.
> >
> > As Chandni rightly said, one can reduce the application window size for
> > the operator, and that would reduce the batch size. But that's not
> > something which looks intuitive from the user's perspective.
> > Control via app window would mean tuning the functionality by controlling
> > the platform params. I think it's best if one gets the option to separate
> > the concerns of platform and that of app logic.
> >
> > If one wants to control the batch size, he/she should be able to do that
> > by just setting the property of batch size (a number), and not by changing
> > app window size (an indirect time unit).
> >
> > ~ Chinmay
> > On 28 Dec 2015 22:53, "Chandni Singh" <chandni@datatorrent.com> wrote:
> >
> > > But you will not allow multiple batches in the same window?
> > > Can you please elaborate on failure scenarios and how it affects
> > > idempotency.
> > >
> > > Chandni
> > >
> > > On Mon, Dec 28, 2015 at 2:32 AM, Priyanka Gugale
> > > <priyanka@datatorrent.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Sorry if I was not clear, but I am trying to propose the MAX_SIZE per
> > > > window which the operator could process. The size could be less than
> > > > the MAX_SIZE, no restriction about that.
> > > >
> > > > -Priyanka
> > > >
> > > > On Mon, Dec 28, 2015 at 3:22 PM, Chandni Singh
> > > > <chandni@datatorrent.com> wrote:
> > > >
> > > > > How do you propose to keep the number of tuples processed in an
> > > > > application window below the batch size?
> > > > >
> > > > > I don't see a way to enforce that the batch size is never less than
> > > > > the number of tuples processed in an application window.
> > > > >
> > > > > On Mon, Dec 28, 2015 at 1:25 AM, Priyanka Gugale <priyag@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi Chandni,
> > > > > >
> > > > > > How about restricting the tuples which can be processed per window?
> > > > > > If someone wants to process small and frequent batches, he can set
> > > > > > the batch size to some small value and also reduce the window size.
> > > > > > This would build some back pressure of course. But that could be
> > > > > > acceptable if one really wants to restrict batch size.
> > > > > > The thought was triggered while working on the Cassandra output
> > > > > > operator. Cassandra creates problems in processing batches of size
> > > > > > greater than some value (don't recall exact number right now). Other
> > > > > > databases may want to restrict the batch size for similar or other
> > > > > > reasons.
> > > > > >
> > > > > > -Priyanka
> > > > > >
> > > > > > On Mon, Dec 28, 2015 at 2:46 PM, Chandni Singh
> > > > > > <chandni@datatorrent.com> wrote:
> > > > > >
> > > > > > > Priyanka,
> > > > > > >
> > > > > > > AbstractBatchTransactionableStore assumes all tuples in one
> > > > > > > application window as a batch because it needs to store the tuples
> > > > > > > in the store exactly-once.
> > > > > > >
> > > > > > > If there is more than one batch in an application window, then to
> > > > > > > store the tuples exactly once the window Id needs to be written
> > > > > > > with every tuple as well, which is not that efficient. Therefore we
> > > > > > > take advantage of the transaction support by saving just the window
> > > > > > > id once (not with every tuple), but this necessitates all the
> > > > > > > tuples to be considered as a batch.
> > > > > > >
> > > > > > > Every operator in a DAG can have its own application window size.
> > > > > > > So to reduce the size per batch, the application window attribute
> > > > > > > needs to be modified.
> > > > > > >
> > > > > > > Chandni
> > > > > > >
> > > > > > > On Mon, Dec 28, 2015 at 1:01 AM, Chinmay Kolhatkar <
> > > > > > > chinmay@datatorrent.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1 for this.
> > > > > > > >
> > > > > > > > ~ Chinmay.
> > > > > > > >
> > > > > > > > On Mon, Dec 28, 2015 at 2:27 PM, Priyanka Gugale
> > > > > > > > <priyag@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > In Malhar we have an operator
> > > > > > > > > AbstractBatchTransactionableStoreOutputOperator which creates
> > > > > > > > > batches based on tuples received in a window. At the end of the
> > > > > > > > > window these batches are sent to database for processing.
> > > > > > > > > There is no way to configure MAX_SIZE on these batches. Based on
> > > > > > > > > input rate the batch sizes can grow very high, and we might want
> > > > > > > > > to restrict batch size.
> > > > > > > > >
> > > > > > > > > Any operator can extend and do batch management on their own, but
> > > > > > > > > I see it as generic requirement and IMO we should change base
> > > > > > > > > class i.e. AbstractBatchTransactionableStoreOutputOperator class
> > > > > > > > > to accept MAX_SIZE for batch from outside.
> > > > > > > > >
> > > > > > > > > Any opinion on this?
> > > > > > > > >
> > > > > > > > > -Priyanka
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
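
For reference, the one-transaction-per-window approach Chandni describes
could be sketched roughly as below. This is an in-memory stand-in under
stated assumptions: `WindowIdStore`, `commitWindow` and `committedWindowId`
are hypothetical names, not the Malhar API, and a real store would write the
rows and the window id inside a single database transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a transactional store; not the Malhar API.
class WindowIdStore {
    final List<String> rows = new ArrayList<>();
    // Id of the last window whose batch was committed; -1 means none yet.
    long committedWindowId = -1;

    // Treats the whole window as one transaction: the rows and the window id
    // are recorded together. A window at or below the committed id is a
    // replay after recovery and is skipped, which keeps the write idempotent.
    boolean commitWindow(long windowId, List<String> batch) {
        if (windowId <= committedWindowId) {
            return false; // already stored, ignore the replayed window
        }
        rows.addAll(batch);
        committedWindowId = windowId;
        return true;
    }
}
```

If a window were split across several transactions, a failure between them
would leave part of the window stored while the saved window id still points
at the previous window, so a replay would write that part again unless the
window id were kept with every tuple.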
