apex-dev mailing list archives

From Priyanka Gugale <priya...@datatorrent.com>
Subject Re: Writing batches to database using Transactionable Store Output operator
Date Mon, 28 Dec 2015 10:32:39 GMT
Hi,

Sorry if I was not clear. I am proposing a MAX_SIZE per window, i.e. an
upper bound on what the operator could process in one window. The actual
size could be less than MAX_SIZE; there is no restriction on that.

-Priyanka

On Mon, Dec 28, 2015 at 3:22 PM, Chandni Singh <chandni@datatorrent.com>
wrote:

> How do you propose to restrict the number of tuples processed in an
> application window to fewer than the batch size?
>
> I don't see a way to enforce that the batch size is never less than the
> number of tuples processed in an application window.
>
> On Mon, Dec 28, 2015 at 1:25 AM, Priyanka Gugale <priyag@apache.org>
> wrote:
>
> > Hi Chandni,
> >
> > How about restricting the tuples which can be processed per window? If
> > someone wants to process small and frequent batches, he can set the batch
> > size to some small value and also reduce the window size. This would build
> > some back pressure of course. But that could be acceptable if one really
> > wants to restrict batch size.
> > The thought was triggered while working on the Cassandra output operator.
> > Cassandra has problems processing batches of size greater than some
> > value (don't recall exact number right now). Other databases may want to
> > restrict the batch size for similar or other reasons.
> >
> > -Priyanka
> >
> > On Mon, Dec 28, 2015 at 2:46 PM, Chandni Singh <chandni@datatorrent.com>
> > wrote:
> >
> > > Priyanka,
> > >
> > > AbstractBatchTransactionableStore treats all tuples in one application
> > > window as a batch because it needs to store the tuples in the store
> > > exactly once.
> > >
> > > If there is more than one batch in an application window, then to store
> > > the tuples exactly once the window id needs to be written with every
> > > tuple as well, which is not that efficient. Therefore we take advantage
> > > of the transaction support by saving just the window id once (not with
> > > every tuple), but this requires all the tuples in the window to be
> > > treated as one batch.
> > >
> > > Every operator in a DAG can have its own application window size. So to
> > > reduce the size per batch, the application window attribute needs to be
> > > modified.
> > >
> > > Chandni
> > >
> > > On Mon, Dec 28, 2015 at 1:01 AM, Chinmay Kolhatkar <
> > > chinmay@datatorrent.com>
> > > wrote:
> > >
> > > > +1 for this.
> > > >
> > > > ~ Chinmay.
> > > >
> > > > On Mon, Dec 28, 2015 at 2:27 PM, Priyanka Gugale <priyag@apache.org>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > In Malhar we have an operator,
> > > > > AbstractBatchTransactionableStoreOutputOperator, which creates
> > > > > batches based on tuples received in a window. At the end of the
> > > > > window these batches are sent to the database for processing.
> > > > > There is no way to configure MAX_SIZE on these batches. Based on
> > > > > input rate the batch sizes can grow very large, and we might want to
> > > > > restrict batch size.
> > > > >
> > > > > Any operator can extend it and do batch management on its own, but I
> > > > > see it as a generic requirement and IMO we should change the base
> > > > > class, i.e. AbstractBatchTransactionableStoreOutputOperator, to accept
> > > > > MAX_SIZE for batch from outside.
> > > > >
> > > > > Any opinion on this?
> > > > >
> > > > > -Priyanka
> > > > >
> > > >
> > >
> >
>
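
The exactly-once scheme Chandni describes above (one batch per application window, with the window id saved once in the same transaction) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Malhar implementation: InMemoryStore, WindowBatchWriter and all of their methods are hypothetical names, and the in-memory commitBatch stands in for a real database transaction.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a transactional store; not a Malhar class. */
class InMemoryStore {
    final List<String> rows = new ArrayList<>();
    long committedWindowId = -1; // saved once per transaction, not with every tuple

    /** Assumption: in a real store the rows and the window id commit atomically. */
    void commitBatch(long windowId, List<String> batch) {
        rows.addAll(batch);
        committedWindowId = windowId;
    }
}

/** Hypothetical operator skeleton: all tuples of one application window form one batch. */
class WindowBatchWriter {
    private final InMemoryStore store;
    private final List<String> batch = new ArrayList<>();
    private long currentWindowId;
    private boolean alreadyStored;

    WindowBatchWriter(InMemoryStore store) {
        this.store = store;
    }

    void beginWindow(long windowId) {
        currentWindowId = windowId;
        // A window at or below the committed id was stored before a failure,
        // so its tuples are skipped when the window is replayed.
        alreadyStored = windowId <= store.committedWindowId;
        batch.clear();
    }

    void process(String tuple) {
        if (!alreadyStored) {
            batch.add(tuple);
        }
    }

    void endWindow() {
        if (!alreadyStored) {
            store.commitBatch(currentWindowId, new ArrayList<>(batch));
        }
        batch.clear();
    }
}
```

Because only the last committed window id is kept, a replayed window is either skipped whole or written whole. Committing part of a window would leave no way to tell which of its tuples are already in the store, which is why a smaller batch here means a smaller application window.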
