apex-dev mailing list archives

From "Thomas Weise (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2283) Refactor kafka output operator
Date Sat, 08 Oct 2016 03:44:20 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557076#comment-15557076 ]

Thomas Weise commented on APEXMALHAR-2283:

The exactly-once output logic is suspect. Why does it use the same key (appId+operatorId) for
all messages, why does it track extra window state in the operator, and why does it rely on
the hash code of the object? Where the application can provide a unique message id, it should
also be possible to use that id as the key. Dedup should be possible with the state stored in
Kafka alone.
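
To make the idea concrete, here is a rough sketch of dedup driven only by what is already in the
log. It is not the Malhar operator and it does not call the Kafka client: an in-memory list stands
in for one partition, and the names (KafkaOnlyDedupSketch, LogRecord, replay, scanFromOffset) are
made up for illustration. It assumes every message carries an application-provided unique id as
its key, and that some offset known to precede the replayed windows is available to start the scan
from; how that offset is chosen is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a List stands in for a single Kafka partition.
class KafkaOnlyDedupSketch {

    /** One record in the log; the key is the application-provided unique message id. */
    static final class LogRecord {
        final String key;
        final String value;

        LogRecord(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Collects the keys of all records at or after scanFromOffset (assumed starting point). */
    static Set<String> keysWrittenSince(List<LogRecord> log, int scanFromOffset) {
        Set<String> keys = new HashSet<>();
        for (int i = scanFromOffset; i < log.size(); i++) {
            keys.add(log.get(i).key);
        }
        return keys;
    }

    /**
     * Replays tuples after recovery and appends only those whose id is not
     * already present in the scanned part of the log. Returns the number appended.
     */
    static int replay(List<LogRecord> log, int scanFromOffset, List<LogRecord> replayed) {
        Set<String> alreadyWritten = keysWrittenSince(log, scanFromOffset);
        int appended = 0;
        for (LogRecord record : replayed) {
            // Set.add returns false when the id was seen before, so duplicates are skipped.
            if (alreadyWritten.add(record.key)) {
                log.add(record);
                appended++;
            }
        }
        return appended;
    }

    public static void main(String[] args) {
        List<LogRecord> log = new ArrayList<>();
        log.add(new LogRecord("m1", "a")); // written well before the failure
        log.add(new LogRecord("m2", "b")); // written just before the failure

        // After recovery the platform replays m2 and m3; only m3 is new.
        List<LogRecord> replayed = Arrays.asList(
                new LogRecord("m2", "b"),
                new LogRecord("m3", "c"));
        int appended = replay(log, 1, replayed);
        System.out.println("appended=" + appended + ", log size=" + log.size());
    }
}
```

The point of the sketch is that no per-window bookkeeping lives in the operator: the set of ids is
rebuilt from the log itself on recovery.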

The operator is also not easy to extend. We tried to implement output to a topic that depends
on the tuple and found ourselves stuck with private methods and unfriendly hooks.
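
As an illustration of the kind of hook that would have helped, the sketch below exposes topic
selection as a protected method that a subclass can override per tuple. All names here
(TopicRoutingSketch, topicFor, Outgoing, PerTupleTopicOperator) are hypothetical and are not the
existing operator's API; a list stands in for the Kafka producer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class; not the Malhar operator API.
class TopicRoutingSketch<T> {

    /** A (topic, payload) pair; stands in for a producer record. */
    static final class Outgoing<T> {
        final String topic;
        final T payload;

        Outgoing(String topic, T payload) {
            this.topic = topic;
            this.payload = payload;
        }
    }

    private final String defaultTopic;
    private final List<Outgoing<T>> sent = new ArrayList<>(); // stand-in for the producer

    protected TopicRoutingSketch(String defaultTopic) {
        this.defaultTopic = defaultTopic;
    }

    /** Protected hook: subclasses may pick a topic per tuple; the default is the configured topic. */
    protected String topicFor(T tuple) {
        return defaultTopic;
    }

    /** Routes one tuple using the hook and records what would be sent. */
    public final void process(T tuple) {
        sent.add(new Outgoing<>(topicFor(tuple), tuple));
    }

    public List<Outgoing<T>> sent() {
        return sent;
    }
}

// Example subclass: routes error lines to a separate topic.
class PerTupleTopicOperator extends TopicRoutingSketch<String> {

    PerTupleTopicOperator() {
        super("events");
    }

    @Override
    protected String topicFor(String tuple) {
        return tuple.startsWith("ERROR") ? "errors" : super.topicFor(tuple);
    }
}
```

With a hook like this, per-tuple routing is a one-method override instead of a copy of the
operator.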

A redesign and a good example are needed.

> Refactor kafka output operator
> ------------------------------
>                 Key: APEXMALHAR-2283
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2283
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>            Reporter: Siyuan Hua
>            Assignee: Siyuan Hua
> The abstract kafka output operator needs to be refactored
> 1. Some mandatory properties need to be set at the operator level instead of as Kafka properties
> 2. More documentation and examples
> 3. Find a standard way to achieve exactly once in both 0.8 and 0.9
> More will be added when working on the ticket
