apex-dev mailing list archives

From Yogi Devendra <devendra.vyavah...@gmail.com>
Subject Re: proposal for reconciled JDBC output operator
Date Tue, 05 Apr 2016 07:07:17 GMT
This would be a great value addition for other operators.
+1 for making this an operator plugin.

~ Yogi

On 4 April 2016 at 12:40, Shubham Pathak <shubham@datatorrent.com> wrote:

> +1 for having this as a platform feature.
>
> On Mon, Apr 4, 2016 at 12:27 PM, Bhupesh Chawda <bhupesh@datatorrent.com>
> wrote:
>
> > +1 on making this an Apex feature. It would be good to have all output
> > stores supported with this kind of functionality.
> >
> > ~Bhupesh
> >
> > On Thu, Mar 31, 2016 at 11:15 AM, Priyanka Gugale <priyanka@datatorrent.com>
> > wrote:
> >
> > > +1 for this feature for all output operators. I have faced this issue
> > > while working on CassandraOutput operator as well.
> > >
> > > -Priyanka
> > >
> > > On Wed, Mar 30, 2016 at 11:37 AM, Thomas Weise <thomas@datatorrent.com>
> > > wrote:
> > >
> > > > Ashwin, abstract class isn't suitable.
> > > >
> > > > The basic ingredient is WAL, from which the operator consumes as
> > > > appropriate. We already have the WAL implementation.
> > > >
> > > > Incoming tuples are added to the WAL, consumption is async. WAL offset
> > > > is checkpointed.
> > > >
> > > > --
> > > > sent from mobile
> > > > On Mar 29, 2016 2:03 PM, "Ashwin Chandra Putta" <ashwinchandrap@gmail.com>
> > > > wrote:
> > > >
> > > > > @Chinmay I went through your code and related comments on pull
> > > > > request. I was originally thinking of using AbstractReconciler but
> > > > > seems like AbstractAsyncProcessor will be better because 1. it has
> > > > > multithreaded output 2. it does not wait for committed window to
> > > > > process queued tuples. One question though, is it fault tolerant?
> > > > >
> > > > > @Thomas Current design creates an abstract implementation, so it
> > > > > needs to be extended to create specific output writers. Need to think
> > > > > more about how to make this pluggable.
> > > > >
> > > > > @Others I am planning to get the common functionality out to make it
> > > > > reusable for outputs to any external system.
> > > > >
> > > > > Regards,
> > > > > Ashwin.
> > > > >
> > > > > On Tue, Mar 29, 2016 at 11:29 AM, Thomas Weise <thomas@datatorrent.com>
> > > > > wrote:
> > > > >
> > > > > > Can this be solved through a pluggable component of operator,
> > > > > > similar to window data manager?
> > > > > >
> > > > > >
> > > > > > On Tue, Mar 29, 2016 at 11:18 AM, Siyuan Hua <siyuan@datatorrent.com>
> > > > > > wrote:
> > > > > >
> > > > > > > +1 on making it a feature for all output operators that depend
> > > > > > > on external system
> > > > > > >
> > > > > > > On Mon, Mar 28, 2016 at 11:55 PM, Sandeep Deshmukh <sandeep@datatorrent.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1 on making it apex feature.
> > > > > > > > On 29-Mar-2016 12:22 pm, "Tushar Gosavi" <tushar@datatorrent.com> wrote:
> > > > > > > >
> > > > > > > > > +1 for the idea, love to have this feature available for
> > > > > > > > > operators writing to external systems.
> > > > > > > > >
> > > > > > > > > - Tushar.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Mar 29, 2016 at 11:51 AM, Chinmay Kolhatkar <chinmay@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Ashwin,
> > > > > > > > > >
> > > > > > > > > > This is indeed a very required functionality.
> > > > > > > > > >
> > > > > > > > > > I think this is applicable for all three types of
> > > > > > > > > > operators: Input (e.g. HDFSInput, JDBCInput, etc), Process
> > > > > > > > > > (e.g. Enrichment), Output (JDBCOutput, HDFSOutput, etc)
> > > > > > > > > >
> > > > > > > > > > I tried to do something similar:
> > > > > > > > > >
> > > > > > > > > > https://github.com/chinmaykolhatkar/incubator-apex-malhar/blob/APEXMALHAR-1963_AsyncProcessor/library/src/main/java/com/datatorrent/lib/async/AbstractAsyncProcessor.java
> > > > > > > > > >
> > > > > > > > > > But this had limitation related extending another class
> > > > > > > > > > other than this abstract class etc....
> > > > > > > > > >
> > > > > > > > > > From what you have mentioned above, I see the functionality
> > > > > > > > > > basically provides asynchronous processing of tuples.
> > > > > > > > > >
> > > > > > > > > > Considering this is a common functionality, Can this
> > > > > > > > > > feature instead be added in platform (apex-core) itself?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Chinmay.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 29, 2016 at 10:20 AM, Mohit Jotwani <mohit@datatorrent.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Dear Ashwin,
> > > > > > > > > > >
> > > > > > > > > > > The approach sounds good. I am assuming that this will
> > > > > > > > > > > be done for all the output data stores and not limited
> > > > > > > > > > > to JDBC.
> > > > > > > > > > >
> > > > > > > > > > > +1
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Mohit
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra Putta <ashwinchandrap@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > There are many use cases in which we are writing
> > > > > > > > > > > > tuples to an external system using JDBC etc. There
> > > > > > > > > > > > are instances when the external system might be
> > > > > > > > > > > > slow or down for some time. In those cases, the
> > > > > > > > > > > > current implementation of JDBC output operators
> > > > > > > > > > > > fails and restarts until the external system is up
> > > > > > > > > > > > again. Meanwhile, the DAG is slowed down by this
> > > > > > > > > > > > operator. To deal with such scenarios, we should
> > > > > > > > > > > > write the output in a reconciled fashion where the
> > > > > > > > > > > > reconciler thread is writing at the pace of the
> > > > > > > > > > > > external system. We should also provide an ability
> > > > > > > > > > > > to spool the data to disk when the external system
> > > > > > > > > > > > is down or the output operator's queue is full.
> > > > > > > > > > > >
> > > > > > > > > > > > Here are the proposed features for the output
> > > > > > > > > > > > operator.
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Write to external system in a separate
> > > > > > > > > > > >    reconciler thread.
> > > > > > > > > > > > 2. Queue the tuples in memory for reconciler
> > > > > > > > > > > >    thread to consume.
> > > > > > > > > > > > 3. Spool the incoming tuples to HDFS using a WAL
> > > > > > > > > > > >    when the queue is full.
> > > > > > > > > > > > 4. Read from WAL and write to queue as queue is
> > > > > > > > > > > >    being consumed.
> > > > > > > > > > > > 5. When external system is able to consume as fast
> > > > > > > > > > > >    as incoming throughput, WAL is not written. The
> > > > > > > > > > > >    queue will just buffer the tuples before writing
> > > > > > > > > > > >    to external system.
> > > > > > > > > > > >
> > > > > > > > > > > > Here is the JIRA:
> > > > > > > > > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2037
> > > > > > > > > > > >
> > > > > > > > > > > > Please let me know if you have any feedback on the
> > > > > > > > > > > > design.
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > >
> > > > > > > > > > > > Regards,
> > > > > > > > > > > > Ashwin.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Regards,
> > > > > Ashwin.
> > > > >
> > > >
> > >
> >
>
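
Features 2 to 4 of Ashwin's original proposal (in-memory queue, spooling when the queue is full, refilling the queue from the spool as it drains) could look roughly like the Java sketch below. This is only an illustration under assumptions, not code from the thread or from Malhar: the class name SpoolingBuffer and its methods are hypothetical, and an in-memory deque stands in for the HDFS-backed WAL so that the example stays self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical sketch of a bounded queue that spills to a spool when full.
 * ASSUMPTION: the spool is an in-memory deque standing in for the HDFS WAL.
 */
class SpoolingBuffer<T> {
    // Feature 2: bounded in-memory queue consumed by the reconciler thread.
    private final BlockingQueue<T> queue;
    // Feature 3: overflow area (stand-in for the WAL on HDFS).
    private final Deque<T> spool = new ArrayDeque<>();

    SpoolingBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Called by the operator thread for every incoming tuple. */
    synchronized void put(T tuple) {
        // Once spooling has started, keep appending to the spool so that
        // tuples are still handed to the writer in arrival order.
        if (!spool.isEmpty() || !queue.offer(tuple)) {
            spool.addLast(tuple);
        }
    }

    /** Called by the reconciler thread; returns null if nothing is pending. */
    synchronized T poll() {
        T next = queue.poll();
        // Feature 4: move spooled tuples back into the queue as space frees up.
        while (!spool.isEmpty() && queue.offer(spool.peekFirst())) {
            spool.removeFirst();
        }
        return next;
    }

    synchronized int queuedCount() {
        return queue.size();
    }

    synchronized int spooledCount() {
        return spool.size();
    }
}
```

In this sketch the reconciler thread (feature 1) would simply call poll() in a loop and write each non-null tuple to the external system at whatever pace that system allows. When the writer keeps up, the spool stays empty and only the queue is used, which matches feature 5.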

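Thomas's variant from the thread (every incoming tuple is added to the WAL, consumption is asynchronous, and the WAL offset is checkpointed) can be sketched in the same spirit. Again this is a hedged illustration, not the existing WAL implementation he refers to: OffsetLog and its method names are hypothetical, and an in-memory list stands in for the durable log.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a log consumed by offset.
 * ASSUMPTION: the list stands in for a durable write-ahead log (WAL).
 */
class OffsetLog<T> {
    private final List<T> entries = new ArrayList<>();
    // Offset of the next entry the asynchronous consumer will read.
    private int nextOffset = 0;

    /** Operator thread: every incoming tuple is appended to the log. */
    synchronized void append(T tuple) {
        entries.add(tuple);
    }

    /** Consumer thread: returns the next entry, or null if caught up. */
    synchronized T consumeNext() {
        if (nextOffset >= entries.size()) {
            return null;
        }
        return entries.get(nextOffset++);
    }

    /** The value that would be saved as part of the operator checkpoint. */
    synchronized int checkpoint() {
        return nextOffset;
    }

    /** After a failure, resume consumption from the checkpointed offset. */
    synchronized void restore(int offset) {
        nextOffset = offset;
    }
}
```

One consequence of this design is that entries consumed after the last checkpoint are read again after a restore, so writes to the external system would need to tolerate replays, for example by being idempotent.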