apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tushar Gosavi <tus...@datatorrent.com>
Subject Re: proposal for reconciled JDBC output operator
Date Tue, 29 Mar 2016 06:52:51 GMT
+1 for the idea, love to have this feature available for operators writing
to external systems.

- Tushar.


On Tue, Mar 29, 2016 at 11:51 AM, Chinmay Kolhatkar <chinmay@apache.org>
wrote:

> Hi Ashwin,
>
> This is indeed a very required functionality.
>
> I think this is applicable for all three types of operators: Input (e.g.
> HDFSInput, JDBCInput, etc), Process (e.g. Enrichment), Output (JDBCOutput,
> HDFSOutput, etc)
>
> I tried to do something similar:
>
> https://github.com/chinmaykolhatkar/incubator-apex-malhar/blob/APEXMALHAR-1963_AsyncProcessor/library/src/main/java/com/datatorrent/lib/async/AbstractAsyncProcessor.java
>
> But this had limitation related extending another class other than this
> abstract class etc....
>
> From what you have mentioned above, I see the functionality basically
> provides asynchronous processing of tuples.
>
> Considering this is a common functionality, Can this feature instead be
> added in platform (apex-core) itself?
>
> Thanks,
> Chinmay.
>
>
>
> On Tue, Mar 29, 2016 at 10:20 AM, Mohit Jotwani <mohit@datatorrent.com>
> wrote:
>
> > Dear Ashwin,
> >
> > The approach sounds good. I am assuming that this will be done for all
> the
> > output data stores and not limited to JDBC.
> >
> > +1
> >
> > Regards,
> > Mohit
> >
> > On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra Putta <
> > ashwinchandrap@gmail.com> wrote:
> >
> > > There are many use cases in which we are writing tuples to external
> > system
> > > using JDBC etc. There are instances when the external system might be
> > slow
> > > and down for some time. In those cases, the current implementation of
> > jdbc
> > > output operators fail and restart until the external system is up
> again.
> > > Meanwhile, the DAG is slowed down by this operator. To deal with such
> > > scenarios, we should write the output in a reconciled fashion where the
> > > reconciler thread is writing at the pace of external system. We should
> > also
> > > provide an ability to spool the data to disk when the external system
> is
> > > down or the output operators queue is full.
> > >
> > > Here are the proposed features for the output operator.
> > >
> > > 1. Write to external system in a separate reconciler thread.
> > > 2. Queue the tuples in memory for reconciler thread to consume.
> > > 3. Spool the incoming tuples to hdfs using a WAL when the queue is
> full.
> > > 4. Read from WAL and write to queue as queue is being consumed.
> > > 5. When external system is able to consume as fast as incoming
> > throughput,
> > > WAL is not written. The queue will just buffer the tuples before
> writing
> > to
> > > external system.
> > >
> > > Here is the JIRA:
> https://issues.apache.org/jira/browse/APEXMALHAR-2037
> > >
> > > Please let me know if you have any feedback on the design.
> > >
> > > --
> > >
> > > Regards,
> > > Ashwin.
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message