apex-dev mailing list archives

From "Sunil (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (APEXMALHAR-2187) Kafka Input Operator supports retry for loading initial offset
Date Fri, 12 Aug 2016 20:44:20 GMT

     [ https://issues.apache.org/jira/browse/APEXMALHAR-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil updated APEXMALHAR-2187:
------------------------------
    Description: 
The Kafka input operator loads the initial consumer offset from Kafka/Zookeeper and, if none
exists, falls back to earliest or latest based on configuration. In the current implementation
the offset load method does not retry, so a failure such as a timeout can reset the consumer
offset. This can result in data loss or duplicates.

The Kafka input operator should have an attribute for retrying the initial offset load instead
of falling back to a reset on the first failed attempt.
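
The requested behavior can be sketched as below. This is only an illustration under stated assumptions, not the operator's actual code: the class OffsetLoadRetry, the method loadInitialOffset, and the parameters maxAttempts and resetOffset are hypothetical names. The sketch assumes the loader returns an empty Optional when no committed offset exists and throws when the load itself fails (for example on a timeout). It also assumes that exhausting all attempts should raise an error rather than reset; the issue does not specify that part.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical helper; not part of Apex Malhar.
final class OffsetLoadRetry {

    /**
     * Loads the initial offset, retrying when the load itself fails.
     *
     * loader      returns the committed offset, or empty if none exists;
     *             throws if the lookup fails (e.g. a timeout)
     * maxAttempts total number of attempts (assumed attribute, must be >= 1)
     * resetOffset offset to use only when no committed offset exists
     */
    static long loadInitialOffset(Callable<Optional<Long>> loader,
                                  int maxAttempts,
                                  long resetOffset) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // A successful lookup with no stored offset is the only
                // case that falls back to the configured reset position.
                return loader.call().orElse(resetOffset);
            } catch (Exception e) {
                // A failed lookup is retried instead of resetting.
                lastFailure = e;
            }
        }
        // Assumption: surface the failure rather than silently resetting.
        throw new IllegalStateException(
            "initial offset load failed after " + maxAttempts + " attempts",
            lastFailure);
    }
}
```

The point of the sketch is the separation of the two cases: "no offset stored" leads to the earliest/latest fallback, while "could not read the offset" leads to another attempt. A real implementation would likely also wait between attempts.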
      

  was:
Goal: 2 Operators for Kafka Output

      1. Simple Kafka Output Operator 
            - Supports at-least-once 
            - Exposes the most-used producer properties as class properties

      2. Exactly-Once Kafka Output (not possible in all cases; will be documented later)
            

Design for Exactly Once

Window Data Manager - Stores the Kafka partitions offsets.
Kafka Key - Used by the operator = AppID#OperatorId

During recovery, a partially written window is re-created using the following approach:

Tuples between the largest recovery offsets and the current offset are checked. Based on the
key, tuples written by other entities are discarded. 

Only tuples which are not in the recovered set are emitted.

Tuples need to be unique within the window.
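
A minimal sketch of the recovery filter described above, under stated assumptions: PartialWindowRecovery, Record, and all method names are hypothetical, tuples are modeled as plain strings, and the records between the largest recovery offset and the current offset are assumed to have been read into a list already. Reading from Kafka and the Window Data Manager are left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the operator's actual code.
final class PartialWindowRecovery {

    /** A record read back from Kafka: the writer's key and the tuple. */
    static final class Record {
        final String key;
        final String value;

        Record(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Builds the Kafka key in the AppID#OperatorId form. */
    static String operatorKey(String appId, int operatorId) {
        return appId + "#" + operatorId;
    }

    /**
     * Collects tuples this operator already wrote in the partial window.
     * Records carrying another writer's key are discarded.
     */
    static Set<String> recoveredTuples(List<Record> tail, String ownKey) {
        Set<String> recovered = new HashSet<>();
        for (Record r : tail) {
            if (ownKey.equals(r.key)) {
                recovered.add(r.value);
            }
        }
        return recovered;
    }

    /** Keeps only the replayed tuples that are not in the recovered set. */
    static List<String> tuplesToEmit(List<String> windowTuples,
                                     Set<String> recovered) {
        List<String> toEmit = new ArrayList<>();
        for (String tuple : windowTuples) {
            if (!recovered.contains(tuple)) {
                toEmit.add(tuple);
            }
        }
        return toEmit;
    }
}
```

Because the recovered tuples are held in a set, a tuple that legitimately occurs twice in the same window would be suppressed on replay, which is why tuples need to be unique within the window.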
      


> Kafka Input Operator supports retry for loading initial offset
> --------------------------------------------------------------
>
>                 Key: APEXMALHAR-2187
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2187
>             Project: Apache Apex Malhar
>          Issue Type: New Feature
>            Reporter: Sunil
>            Assignee: Sandesh
>
> The Kafka input operator loads the initial consumer offset from Kafka/Zookeeper and, if none
> exists, falls back to earliest or latest based on configuration. In the current implementation
> the offset load method does not retry, so a failure such as a timeout can reset the consumer
> offset. This can result in data loss or duplicates.
> The Kafka input operator should have an attribute for retrying the initial offset load
> instead of falling back to a reset on the first failed attempt.
>       



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
