apex-dev mailing list archives

From Bhupesh Chawda <bhup...@apache.org>
Subject Re: Partitionable,Idempotent & Pollable JDBCInput
Date Wed, 13 Jul 2016 06:17:59 GMT
Thanks Dev!

I think we should also add the following points to the design / assumptions
/ JIRA. Please suggest corrections if my understanding is incorrect.

   1. This operator will create a configured number of non-polling static
   partitions for fetching the existing data in the table, plus a single
   additional partition for polling newly added data.
   2. The *key* column, on which the polling is based, can be any column
   that has ever-increasing values and supports the greater-than and
   less-than operators in SQL. Even if the column is not a primary key
   column and does not have a UNIQUE constraint, it must be guaranteed to
   contain no duplicate values for the operator to function properly.
   3. Only newly added rows with increasing ids will be fetched by the
   polling JDBC partition.
   4. There should be no updates to the existing data while the application
   is running. (Is this a requirement?)
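The partitioning described in points 1 and 2 can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the operator's implementation: KeyRange and PartitionPlanner are hypothetical names, the key column is assumed to be a unique, increasing long, and the snapshot bounds minKey/maxKey are assumed to have been read from the table beforehand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the JDBCPollInputOperator's actual code.
// Assumes a long key column with strictly increasing, unique values (point 2).
final class KeyRange {
  final long lowerExclusive;  // rows with key > lowerExclusive ...
  final long upperInclusive;  // ... and key <= upperInclusive
  final boolean polling;      // true for the single open-ended polling partition

  KeyRange(long lowerExclusive, long upperInclusive, boolean polling) {
    this.lowerExclusive = lowerExclusive;
    this.upperInclusive = upperInclusive;
    this.polling = polling;
  }

  @Override
  public String toString() {
    return (polling ? "poll" : "static") + " (" + lowerExclusive + ", "
        + (polling ? "+inf" : String.valueOf(upperInclusive)) + "]";
  }
}

final class PartitionPlanner {
  // Splits the snapshot [minKey, maxKey] into at most numStatic contiguous key
  // ranges, then adds one polling range (maxKey, +inf) for rows added later.
  // Splitting by key value (not row count) is a simplifying assumption: gaps
  // in the key sequence make the static partitions uneven.
  // Overflow near Long.MIN_VALUE / Long.MAX_VALUE is ignored.
  static List<KeyRange> plan(long minKey, long maxKey, int numStatic) {
    if (numStatic < 1 || maxKey < minKey) {
      throw new IllegalArgumentException("need numStatic >= 1 and minKey <= maxKey");
    }
    List<KeyRange> ranges = new ArrayList<>();
    long step = (maxKey - minKey + numStatic) / numStatic; // ceil(span / numStatic)
    long lower = minKey - 1;
    while (lower < maxKey) {
      long upper = Math.min(maxKey, lower + step);
      ranges.add(new KeyRange(lower, upper, false));
      lower = upper;
    }
    ranges.add(new KeyRange(maxKey, Long.MAX_VALUE, true));
    return ranges;
  }
}
```

For example, plan(1, 100, 4) yields the static ranges (0, 25], (25, 50], (50, 75], (75, 100] and the polling range (100, +inf).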

Thanks.

~ Bhupesh

On Wed, Jul 13, 2016 at 1:56 AM, Timothy Farkas <
timothytiborfarkas@gmail.com> wrote:

> Nice work, Dev. +1 for merging.
>
> On Tue, Jul 12, 2016 at 1:21 PM, Ashwin Chandra Putta <
> ashwinchandrap@gmail.com> wrote:
>
> > Finally, we have a robust JDBC polling input operator. Since it is marked
> > @evolving, we can make improvements over time as folks start using this
> > operator.
> >
> > +1 for merging the PR.
> >
> > Regards,
> > Ashwin.
> >
> > On Tue, Jul 12, 2016 at 11:17 AM, Devendra Tagare <
> > devendrat@datatorrent.com> wrote:
> >
> > > All,
> > >
> > > We have created a JDBCPollInputOperator with the following features:
> > >
> > > 1. Polls from an external JDBC store asynchronously in the input operator.
> > > 2. Polling frequency and batch size are configurable.
> > > 3. User can specify the polling query.
> > > 4. User can specify the columns to fetch as part of the result set.
> > > 5. It is idempotent and partitionable.
> > > 6. Supports both batch and polling behavior.
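Features 1 and 2 (asynchronous polling with a configurable interval and batch size) follow a common producer/consumer pattern. The sketch below assumes a hypothetical Fetcher interface in place of the real JDBC query and plain long keys in place of result rows; AsyncPoller and its methods are illustrative names, not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a background thread polls every pollIntervalMs for up
// to batchSize new keys, and the operator thread drains the buffer without
// ever blocking on the database.
final class AsyncPoller {
  // Stand-in for a JDBC query such as "... WHERE key > ? ORDER BY key" (assumed).
  interface Fetcher {
    List<Long> fetchAfter(long lastKey, int batchSize);
  }

  private final Fetcher fetcher;
  private final int batchSize;
  private final BlockingQueue<Long> buffer;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private long lastKey; // advanced only after a key is successfully buffered

  AsyncPoller(Fetcher fetcher, long startAfterKey, int batchSize, int bufferCapacity) {
    this.fetcher = fetcher;
    this.lastKey = startAfterKey;
    this.batchSize = batchSize;
    this.buffer = new ArrayBlockingQueue<>(bufferCapacity);
  }

  void start(long pollIntervalMs) {
    scheduler.scheduleWithFixedDelay(this::pollOnce, 0, pollIntervalMs, TimeUnit.MILLISECONDS);
  }

  // One polling round; package-private so it can also be driven synchronously.
  void pollOnce() {
    if (buffer.remainingCapacity() < batchSize) {
      return; // back-pressure: skip this round until the operator catches up
    }
    for (long key : fetcher.fetchAfter(lastKey, batchSize)) {
      if (!buffer.offer(key)) {
        break; // buffer full; unbuffered keys are fetched again next round
      }
      lastKey = key;
    }
  }

  // Called from the operator thread (e.g. once per emit call); never blocks.
  List<Long> drain() {
    List<Long> out = new ArrayList<>();
    buffer.drainTo(out);
    return out;
  }

  void stop() {
    scheduler.shutdownNow();
  }
}
```

scheduleWithFixedDelay measures the interval from the end of one poll to the start of the next, so a slow query delays the next round instead of letting rounds pile up.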
> > >
> > > With the above set of features, there are some assumptions for
> > > idempotency & partitioning:
> > > 1. User needs to provide tableName, dbConnection, setEmitColumnList and
> > > the look-up key.
> > > 2. Optionally, batchSize, pollInterval, look-up key and a where clause
> > > can be given.
> > > 3. This operator uses static partitioning to arrive at range queries for
> > > exactly-once reads.
> > > 4. The assumption is that there is an ordered column with which range
> > > queries can be formed.
> > > 5. If an emitColumnList is provided, please ensure that the keyColumn is
> > > the first column in the list.
> > > 6. Range queries are formed using the JdbcMetaDataUtility. Output: a
> > > comma-separated list of the emit columns, e.g. columnA,columnB,columnC
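Assumptions 3 to 5 can be illustrated with a small query builder. RangeQueryBuilder is a hypothetical name, and the `key > ? AND key <= ?` form is one possible way to express a range query, assumed here rather than taken from the PR.

```java
import java.util.List;

// Hypothetical helper, not the operator's real query code. Builds a
// parameterized range query over the ordered key column (assumptions 3 and 4).
// Table and column names are concatenated, so they must come from trusted
// configuration, never from user input.
final class RangeQueryBuilder {
  static String build(String table, String keyColumn, List<String> emitColumns, String where) {
    // Assumption 5: the key column comes first, so the last row's first
    // column can be read back as the next lower bound.
    if (emitColumns.isEmpty() || !emitColumns.get(0).equalsIgnoreCase(keyColumn)) {
      throw new IllegalArgumentException("keyColumn must be the first emit column");
    }
    StringBuilder sql = new StringBuilder("SELECT ")
        .append(String.join(",", emitColumns)) // e.g. columnA,columnB,columnC
        .append(" FROM ").append(table)
        .append(" WHERE ").append(keyColumn).append(" > ?")
        .append(" AND ").append(keyColumn).append(" <= ?");
    if (where != null && !where.isEmpty()) {
      sql.append(" AND (").append(where).append(')'); // optional user where clause
    }
    return sql.append(" ORDER BY ").append(keyColumn).toString();
  }
}
```

The two placeholders would be bound with PreparedStatement.setLong; for the open-ended polling partition, one option is to bind Long.MAX_VALUE as the upper bound.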
> > >
> > > Per window, the first and the last key processed are saved using the
> > > FSWindowDataManager as (<lowerBound,upperBound>,operatorId,windowId).
> > > This (lowerBound,upperBound) pair is then used for recovery. The queries
> > > are constructed using the JDBCMetaDataUtility.
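The per-window recovery state can be pictured as an in-memory map keyed by operatorId and windowId. WindowBoundsLog is a hypothetical stand-in: it deliberately does not model the FSWindowDataManager API, and windows that emitted no rows are not handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the recovery state
// (<lowerBound,upperBound>, operatorId, windowId). It does not use the
// FSWindowDataManager API; the real operator persists this state durably.
final class WindowBoundsLog {
  static final class Bounds {
    final long lowerBound; // first key processed in the window (inclusive)
    final long upperBound; // last key processed in the window (inclusive)

    Bounds(long lowerBound, long upperBound) {
      this.lowerBound = lowerBound;
      this.upperBound = upperBound;
    }
  }

  private final Map<Integer, Map<Long, Bounds>> byOperator = new HashMap<>();

  // Called at the end of each window with the first and last key emitted.
  void save(int operatorId, long windowId, long firstKey, long lastKey) {
    byOperator.computeIfAbsent(operatorId, id -> new HashMap<>())
        .put(windowId, new Bounds(firstKey, lastKey));
  }

  // On recovery: the bounds to re-read for this window, or null if the window
  // was never completed (the operator then resumes normal reading).
  Bounds replay(int operatorId, long windowId) {
    Map<Long, Bounds> windows = byOperator.get(operatorId);
    return windows == null ? null : windows.get(windowId);
  }
}
```

On replay, re-issuing the range query with key >= lowerBound AND key <= upperBound for a saved window would emit the same rows as the original run, provided the rows in that range have not changed, which is what makes the reads idempotent.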
> > >
> > > JDBCMetaDataUtility
> > > A utility class used to retrieve the metadata for a given unique key of a
> > > SQL table. This class emits range queries based on the given primary
> > > index.
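A metadata lookup of the kind described for JDBCMetaDataUtility can be sketched with standard java.sql calls. The class and method names below are hypothetical and are not the utility's actual API; DatabaseMetaData.getPrimaryKeys is a standard JDBC method, but how table and column identifiers are cased varies between databases.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch of the metadata lookup; method names are illustrative
// and are not the JDBCMetaDataUtility API. Uses only standard java.sql calls.
final class KeyMetadata {
  // Returns the first column (KEY_SEQ = 1) of the table's primary key, or null.
  static String primaryKeyColumn(Connection conn, String table) throws SQLException {
    DatabaseMetaData md = conn.getMetaData();
    try (ResultSet rs = md.getPrimaryKeys(null, null, table)) {
      while (rs.next()) {
        if (rs.getShort("KEY_SEQ") == 1) {
          return rs.getString("COLUMN_NAME");
        }
      }
    }
    return null;
  }

  static String boundsQuery(String table, String keyColumn) {
    return "SELECT MIN(" + keyColumn + "), MAX(" + keyColumn + ") FROM " + table;
  }

  // Returns {min, max} of the key column, or null for an empty table; these
  // bounds could then be split into range queries for the static partitions.
  static long[] readBounds(Connection conn, String table, String keyColumn) throws SQLException {
    try (Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery(boundsQuery(table, keyColumn))) {
      if (!rs.next()) {
        return null;
      }
      long min = rs.getLong(1);
      if (rs.wasNull()) {
        return null; // MIN/MAX over zero rows is SQL NULL
      }
      return new long[] {min, rs.getLong(2)};
    }
  }
}
```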
> > >
> > > So far, this operator has been tested with MySQL.
> > > In later iterations, we intend to add IN-clause support to enable
> > > exactly-once semantics for non-ordered key column(s).
> > >
> > > Here's a link to the PR,
> > > https://github.com/apache/apex-malhar/pull/282
> > >
> > > Thanks,
> > > Dev
> > >
> >
> >
> >
> > --
> >
> > Regards,
> > Ashwin.
> >
>
