apex-dev mailing list archives

From "devendra tagare (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (APEXMALHAR-2066) Add jdbc poller input operator
Date Thu, 14 Jul 2016 18:36:20 GMT

     [ https://issues.apache.org/jira/browse/APEXMALHAR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

devendra tagare updated APEXMALHAR-2066:
----------------------------------------
    Description: 
Create a JDBC poller input operator with the following features:

1. Poll from an external JDBC store asynchronously in the input operator.
2. The polling frequency and batch size should be configurable.
3. It should be idempotent.
4. It should be partitionable.
5. It should support both batch reads and polling.
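Features 1 and 2 could be met by running the JDBC fetch on a background thread and letting the operator thread only drain a buffer. The sketch below is a minimal illustration under assumptions: the class AsyncPoller, its constructor parameters and the fetcher function (a stand-in for the real JDBC query) are hypothetical, and the key column is assumed to be numeric.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiFunction;

// Hypothetical sketch, not the Malhar operator: a background thread polls every
// pollIntervalMs for at most batchSize rows whose key is greater than the last key seen.
class AsyncPoller {
  private final long pollIntervalMs;
  private final int batchSize;
  // fetcher.apply(lastKey, batchSize) returns the keys of rows with key > lastKey in
  // ascending order; it stands in for the actual JDBC query (assumption).
  private final BiFunction<Long, Integer, List<Long>> fetcher;
  private final ConcurrentLinkedQueue<Long> buffer = new ConcurrentLinkedQueue<>();
  private volatile long lastKey; // written only by the single poller thread
  private ScheduledExecutorService scheduler;

  AsyncPoller(long pollIntervalMs, int batchSize, long startKey,
              BiFunction<Long, Integer, List<Long>> fetcher) {
    this.pollIntervalMs = pollIntervalMs;
    this.batchSize = batchSize;
    this.lastKey = startKey;
    this.fetcher = fetcher;
  }

  // One poll cycle: fetch the next batch and buffer it.
  void pollOnce() {
    for (Long key : fetcher.apply(lastKey, batchSize)) {
      buffer.add(key);
      lastKey = Math.max(lastKey, key);
    }
  }

  // Polls off the operator thread, so a slow query never blocks tuple emission.
  synchronized void start() {
    if (scheduler == null) {
      scheduler = Executors.newSingleThreadScheduledExecutor();
      scheduler.scheduleWithFixedDelay(this::pollOnce, 0, pollIntervalMs, TimeUnit.MILLISECONDS);
    }
  }

  synchronized void stop() {
    if (scheduler != null) {
      scheduler.shutdownNow();
      scheduler = null;
    }
  }

  // Called on the operator thread (for example from emitTuples()): takes whatever is buffered.
  List<Long> drain() {
    List<Long> out = new ArrayList<>();
    for (Long key = buffer.poll(); key != null; key = buffer.poll()) {
      out.add(key);
    }
    return out;
  }
}
```

scheduleWithFixedDelay waits pollIntervalMs after each poll finishes, so two polls never overlap even when a query runs longer than the interval.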


Assumptions for idempotency and partitioning:
1. The user needs to provide tableName, dbConnection, setEmitColumnList and the look-up key.
2. Optionally, batchSize, pollInterval, the look-up key and a WHERE clause can be given.
3. This operator uses static partitioning to arrive at range queries for exactly-once reads.
   It creates a configured number of non-polling static partitions that fetch the data already
   in the table, plus one additional partition that polls for newly added data.
4. It is assumed that there is an ordered column with which range queries can be formed.
   The *key* column, on which polling is based, can be any column whose values are
   ever-increasing and that supports the greater-than and less-than operators in SQL.
5. If an emitColumnList is provided, ensure that the keyColumn is the first column in
   the list.
6. Range queries are formed using the JdbcMetaDataUtility. Output: a comma-separated list of
   the emit columns, e.g. columnA,columnB,columnC.
7. Only newly added rows with increasing IDs are fetched by the
   polling JDBC partition.
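Assumptions 3, 4 and 7 could be realised by reading the current minimum and maximum key once at planning time (for example with SELECT MIN(key), MAX(key)), splitting that span into non-overlapping half-open ranges for the static partitions, and giving the polling partition everything above the maximum. Because the ranges do not overlap, each existing row is read by exactly one partition. RangePlanner and KeyRange are hypothetical names, and the key is assumed to be a numeric column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner: splits [minKey, maxKey] into at most numPartitions half-open
// ranges for the non-polling partitions, then appends one open-ended polling range.
class RangePlanner {
  static final class KeyRange {
    final long lower;       // inclusive
    final long upper;       // exclusive; Long.MAX_VALUE marks the open-ended polling range
    final boolean polling;

    KeyRange(long lower, long upper, boolean polling) {
      this.lower = lower;
      this.upper = upper;
      this.polling = polling;
    }

    @Override
    public String toString() {
      return (polling ? "poll" : "static") + "[" + lower + ", " + upper + ")";
    }
  }

  // Assumes maxKey < Long.MAX_VALUE so that maxKey + 1 does not overflow.
  static List<KeyRange> plan(long minKey, long maxKey, int numPartitions) {
    if (numPartitions < 1 || maxKey < minKey) {
      throw new IllegalArgumentException("need numPartitions >= 1 and minKey <= maxKey");
    }
    List<KeyRange> ranges = new ArrayList<>();
    long total = maxKey - minKey + 1;                        // key values in the snapshot
    long step = (total + numPartitions - 1) / numPartitions; // ceiling division
    for (long lo = minKey; lo <= maxKey; lo += step) {
      ranges.add(new KeyRange(lo, Math.min(lo + step, maxKey + 1), false));
    }
    // Rows added after the snapshot have keys > maxKey; only the polling partition reads them.
    ranges.add(new KeyRange(maxKey + 1, Long.MAX_VALUE, true));
    return ranges;
  }
}
```

For example, plan(1, 10, 3) yields the static ranges [1, 5), [5, 9) and [9, 11), plus a polling range that starts at key 11.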

For each window, the first and last keys processed are saved using the FSWindowDataManager as
(<lowerBound, upperBound>, operatorId, windowId). This (lowerBound, upperBound) pair is then
used for recovery. The queries are constructed using the JDBCMetaDataUtility.
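The recovery bookkeeping above can be pictured as a log keyed by (operatorId, windowId) that holds the first and last key processed in each window. On replay, a window found in the log is re-read with exactly those bounds (inclusive on both ends, since both keys were processed), so it emits the same rows as before the failure. WindowBoundsLog and its methods are hypothetical; it is an in-memory stand-in and does not reproduce the FSWindowDataManager API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the per-window record
// (<lowerBound, upperBound>, operatorId, windowId). It does not use the FSWindowDataManager API.
class WindowBoundsLog {
  static final class Bounds {
    final long lowerBound; // first key processed in the window
    final long upperBound; // last key processed in the window

    Bounds(long lowerBound, long upperBound) {
      this.lowerBound = lowerBound;
      this.upperBound = upperBound;
    }
  }

  private final Map<Integer, Map<Long, Bounds>> byOperator = new HashMap<>();

  // Called at the end of a window that processed at least one row.
  void save(int operatorId, long windowId, long firstKey, long lastKey) {
    byOperator.computeIfAbsent(operatorId, id -> new HashMap<>())
        .put(windowId, new Bounds(firstKey, lastKey));
  }

  // Called during replay; null means nothing was saved for that window, so the
  // operator continues with normal range reading or polling.
  Bounds lookup(int operatorId, long windowId) {
    Map<Long, Bounds> windows = byOperator.get(operatorId);
    return windows == null ? null : windows.get(windowId);
  }

  // Condition a replayed window could use to re-read exactly the same rows.
  static String replayCondition(String keyColumn, Bounds b) {
    return keyColumn + " >= " + b.lowerBound + " AND " + keyColumn + " <= " + b.upperBound;
  }
}
```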

JDBCMetaDataUtility
A utility class used to retrieve the metadata for a given unique key of a SQL table. This
class emits range queries based on the given primary index.
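A hypothetical sketch of the kind of range queries JDBCMetaDataUtility is described as emitting. RangeQueryBuilder, its constructor and the exact SQL text are assumptions, not the utility's real API. Range bounds are left as ? placeholders to be bound with PreparedStatement.setLong; table and column names are concatenated as-is, so they must come from trusted configuration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical builder for the range queries described for JDBCMetaDataUtility.
// Not the Malhar API: class name, constructor and SQL layout are assumptions.
class RangeQueryBuilder {
  private final String table;
  private final String keyColumn;
  private final List<String> emitColumns;
  private final String whereClause; // optional user filter, may be null

  RangeQueryBuilder(String table, String keyColumn, String emitColumnList, String whereClause) {
    // emitColumnList uses the comma-separated form, e.g. "columnA,columnB,columnC"
    List<String> cols = Arrays.stream(emitColumnList.split(","))
        .map(String::trim)
        .filter(c -> !c.isEmpty())
        .collect(Collectors.toList());
    // The key column must be the first emitted column.
    if (cols.isEmpty() || !cols.get(0).equalsIgnoreCase(keyColumn)) {
      throw new IllegalArgumentException("keyColumn must be the first column in emitColumnList");
    }
    this.table = table;
    this.keyColumn = keyColumn;
    this.emitColumns = cols;
    this.whereClause = whereClause;
  }

  // Finds the key span to split across the static partitions.
  String boundsQuery() {
    return "SELECT MIN(" + keyColumn + "), MAX(" + keyColumn + ") FROM " + table
        + filter(" WHERE ");
  }

  // Static partition: half-open range [?, ?).
  String rangeQuery() {
    return select() + " WHERE " + keyColumn + " >= ? AND " + keyColumn + " < ?"
        + filter(" AND ") + orderByKey();
  }

  // Polling partition: only rows whose key is greater than the last key seen.
  String pollQuery() {
    return select() + " WHERE " + keyColumn + " > ?" + filter(" AND ") + orderByKey();
  }

  private String select() {
    return "SELECT " + String.join(",", emitColumns) + " FROM " + table;
  }

  private String filter(String prefix) {
    return whereClause == null ? "" : prefix + "(" + whereClause + ")";
  }

  private String orderByKey() {
    return " ORDER BY " + keyColumn;
  }
}
```

In the polling partition, batchSize could be applied with Statement.setMaxRows(batchSize) instead of a vendor-specific LIMIT clause; since results are ordered by the key, the last row of each batch becomes the next lower bound.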



  was:
Create a JDBC poller input operator with the following features:

1. Poll from an external JDBC store asynchronously in the input operator.
2. The polling frequency and batch size should be configurable.
3. It should be idempotent.
4. It should be partitionable.
5. It should support both batch reads and polling.


Assumptions for idempotency and partitioning:
1. The user needs to provide tableName, dbConnection, setEmitColumnList and the look-up key.
2. Optionally, batchSize, pollInterval, the look-up key and a WHERE clause can be given.
3. This operator uses static partitioning to arrive at range queries for exactly-once reads.
4. It is assumed that there is an ordered column with which range queries can be formed.
5. If an emitColumnList is provided, ensure that the keyColumn is the first column in
   the list.
6. Range queries are formed using the JdbcMetaDataUtility. Output: a comma-separated list of
   the emit columns, e.g. columnA,columnB,columnC.

For each window, the first and last keys processed are saved using the FSWindowDataManager as
(<lowerBound, upperBound>, operatorId, windowId). This (lowerBound, upperBound) pair is then
used for recovery. The queries are constructed using the JDBCMetaDataUtility.

JDBCMetaDataUtility
A utility class used to retrieve the metadata for a given unique key of a SQL table. This
class emits range queries based on the given primary index.




> Add jdbc poller input operator
> ------------------------------
>
>                 Key: APEXMALHAR-2066
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2066
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: Ashwin Chandra Putta
>            Assignee: devendra tagare
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
