apex-dev mailing list archives

From Chandni Singh <chan...@datatorrent.com>
Subject Re: Writing batches to database using Transactionable Store Output operator
Date Mon, 28 Dec 2015 09:16:47 GMT
Priyanka,

AbstractBatchTransactionableStoreOutputOperator treats all the tuples received
in one application window as a single batch because it needs to write them to
the store exactly once.

If there were more than one batch per application window, then to store the
tuples exactly once the window id would have to be written with every tuple as
well, which is not very efficient. Instead, we take advantage of the store's
transaction support and save just the window id once (not with every tuple),
but this requires all the tuples of the window to be treated as one batch.
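
To make that concrete, here is a minimal, standalone sketch of the pattern.
The TxnStore interface and the method names below are illustrative and only
approximate the Malhar store contract; they are not the actual API:

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Standalone sketch only: TxnStore and these method names are illustrative,
// not the real Malhar TransactionableStore API.
interface TxnStore
{
  void beginTransaction();
  void commitTransaction();
  long getCommittedWindowId();                 // last window fully written to the store
  void storeCommittedWindowId(long windowId);  // saved as part of the open transaction
}

abstract class WindowBatchWriter<T>
{
  private final TxnStore store;
  private final List<T> batch = new ArrayList<>();
  private long currentWindowId;

  WindowBatchWriter(TxnStore store)
  {
    this.store = store;
  }

  void beginWindow(long windowId)
  {
    currentWindowId = windowId;
    batch.clear();
  }

  void process(T tuple)
  {
    batch.add(tuple);                          // the whole window accumulates as one batch
  }

  void endWindow()
  {
    if (currentWindowId <= store.getCommittedWindowId()) {
      return;                                  // replayed window after a failure: already written
    }
    store.beginTransaction();
    writeBatch(batch);                         // subclass writes the tuples to the store
    store.storeCommittedWindowId(currentWindowId);
    store.commitTransaction();                 // tuples + window id become visible atomically
  }

  abstract void writeBatch(Collection<T> tuples);
}

Because the window id is committed in the same transaction as the batch, a
replayed window can be detected and skipped, which gives the exactly-once
guarantee without tagging every tuple.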

Every operator in a DAG can have its own application window size. So to reduce
the size of each batch, the application window attribute of this operator
needs to be modified.
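
For example, the attribute can be set in the application's populateDAG. The
DbOutput class below is just a stand-in for the real output operator, and the
BaseOperator package path may differ between Apex versions:

import org.apache.hadoop.conf.Configuration;

import com.datatorrent.api.Context;
import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.common.util.BaseOperator;

public class BatchSizeTuningApp implements StreamingApplication
{
  // Stand-in for the real batch output operator; any operator is configured the same way.
  public static class DbOutput extends BaseOperator
  {
  }

  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    DbOutput out = dag.addOperator("dbOutput", new DbOutput());
    // Fewer streaming windows per application window => fewer tuples per batch.
    dag.setAttribute(out, Context.OperatorContext.APPLICATION_WINDOW_COUNT, 2);
  }
}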

Chandni

On Mon, Dec 28, 2015 at 1:01 AM, Chinmay Kolhatkar <chinmay@datatorrent.com>
wrote:

> +1 for this.
>
> ~ Chinmay.
>
> On Mon, Dec 28, 2015 at 2:27 PM, Priyanka Gugale <priyag@apache.org>
> wrote:
>
> > Hi,
> >
> > In Malhar we have an operator, AbstractBatchTransactionableStoreOutputOperator,
> > which creates batches from the tuples received in a window. At the end of the
> > window these batches are sent to the database for processing.
> > There is no way to configure a MAX_SIZE for these batches. Depending on the
> > input rate the batch sizes can grow very large, and we might want to restrict
> > the batch size.
> >
> > Any operator can extend the base class and do batch management on its own,
> > but I see this as a generic requirement, and IMO we should change the base
> > class, i.e. AbstractBatchTransactionableStoreOutputOperator, to accept a
> > MAX_SIZE for the batch from outside.
> >
> > Any opinion on this?
> >
> > -Priyanka
> >
>
