apex-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2066) Add jdbc poller input operator
Date Fri, 15 Jul 2016 06:05:20 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378897#comment-15378897 ]

ASF GitHub Bot commented on APEXMALHAR-2066:
--------------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/apex-malhar/pull/282


> Add jdbc poller input operator
> ------------------------------
>
>                 Key: APEXMALHAR-2066
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2066
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: Ashwin Chandra Putta
>            Assignee: devendra tagare
>
> Create a JDBC poller input operator that has the following features:
> 1. Polls from an external JDBC store asynchronously in the input operator.
> 2. Polling frequency and batch size should be configurable.
> 3. Should be idempotent.
> 4. Should be partitionable.
> 5. Should be batch + polling capable.
> Assumptions for idempotency and partitioning:
> 1. The user needs to provide tableName, dbConnection, setEmitColumnList and the look-up key.
> 2. Optionally, batchSize, pollInterval, look-up key and a where clause can be given.
> 3. This operator uses static partitioning to arrive at range queries for exactly-once reads.
> It creates a configured number of non-polling static partitions to fetch the data already in
> the table, plus one additional partition that polls for newly added data.
> 4. The assumption is that there is an ordered column from which range queries can be formed.
> The *key* column, on which polling is based, is any column that has ever-increasing values
> and supports greater-than and less-than operations in SQL.
> 5. If an emitColumnList is provided, ensure that the keyColumn is the first column in the list.
> 6. Range queries are formed using the JdbcMetaDataUtility. Output: a comma-separated list
> of the emit columns, e.g. columnA,columnB,columnC
> 7. Only newly added data with increasing ids will be fetched by the polling JDBC partition.
> Per window, the first and the last key processed are saved using the FSWindowDataManager
> as (<lowerBound,upperBound>,operatorId,windowId). This (lowerBound,upperBound) pair is
> then used for recovery. The queries are constructed using the JdbcMetaDataUtility.
> JdbcMetaDataUtility
> A utility class used to retrieve the metadata for a given unique key of a SQL table.
> This class emits range queries based on a given primary index.
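
To illustrate the static-partitioning assumption above, here is a minimal sketch of how a key range might be cut into bounded range queries for the non-polling partitions, with one open-ended query for the polling partition. It is not the JdbcMetaDataUtility API: the class name RangeQuerySketch, the method buildQueries, the numeric key type and the example table and column names are all hypothetical, and the minimum and maximum key values are assumed to be known already.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Apex Malhar.
class RangeQuerySketch {

    /**
     * Returns one bounded range query per static partition, followed by one
     * open-ended query for the polling partition.
     * Assumes a numeric, ever-increasing key column whose current
     * minimum and maximum values are already known.
     */
    static List<String> buildQueries(String table, String columns, String keyColumn,
                                     long minKey, long maxKey, int staticPartitions) {
        if (staticPartitions <= 0 || maxKey < minKey) {
            throw new IllegalArgumentException("need at least one partition and minKey <= maxKey");
        }
        List<String> queries = new ArrayList<>();
        long total = maxKey - minKey + 1;
        // Ceiling division so the partitions together cover every key.
        long step = (total + staticPartitions - 1) / staticPartitions;
        long lower = minKey;
        for (int i = 0; i < staticPartitions && lower <= maxKey; i++) {
            // Half-open range [lower, upper) so no row is read twice.
            long upper = Math.min(lower + step, maxKey + 1);
            queries.add("SELECT " + columns + " FROM " + table
                + " WHERE " + keyColumn + " >= " + lower
                + " AND " + keyColumn + " < " + upper);
            lower = upper;
        }
        // The polling partition only sees rows added after the snapshot.
        queries.add("SELECT " + columns + " FROM " + table
            + " WHERE " + keyColumn + " > " + maxKey);
        return queries;
    }

    public static void main(String[] args) {
        // Example values are made up for illustration.
        for (String q : buildQueries("orders", "id,name", "id", 1, 10, 3)) {
            System.out.println(q);
        }
    }
}
```

The half-open ranges keep the static partitions disjoint, which is one simple way to get the exactly-once reads described above. The key column is listed first in the column list, following assumption 5. A real operator would bind the bounds through a PreparedStatement rather than concatenating them into the SQL string.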



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
