flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tzu-Li (Gordon) Tai (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-4080) Kinesis consumer not exactly-once if stopped in the middle of processing aggregated records
Date Wed, 15 Jun 2016 16:47:09 GMT
Tzu-Li (Gordon) Tai created FLINK-4080:
------------------------------------------

             Summary: Kinesis consumer not exactly-once if stopped in the middle of processing
aggregated records
                 Key: FLINK-4080
                 URL: https://issues.apache.org/jira/browse/FLINK-4080
             Project: Flink
          Issue Type: Sub-task
          Components: Kinesis Connector, Streaming Connectors
            Reporter: Tzu-Li (Gordon) Tai
            Assignee: Tzu-Li (Gordon) Tai
            Priority: Critical
             Fix For: 1.1.0


I've occasionally experienced unsuccessful ManualExactlyOnceTest after several tries.

Kinesis records of the same aggregated batch will have the same sequence number, and different
sub-sequence numbers (http://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-consumer-deaggregation.html).
The current state of the consumer is committing state every time it finishes processing a
record, even de-aggregated ones. This is a bug since this will incorrectly mark all remaining
records of the de-aggregated batch as processed in the state.

Proposed fix: Use the extended `UserRecord` class in KCL to represent all records (either
non- or de-aggregated) instead of the basic `Record` class. This gives access to whether or
not the record was originally aggregated. If we encounter a de-aggregated record, don't update
state until we finished processing the last record of the batch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message