kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Guozhang Wang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-3514) Stream timestamp computation needs some further thoughts
Date Fri, 23 Sep 2016 22:49:20 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Guozhang Wang updated KAFKA-3514:
---------------------------------
    Fix Version/s:     (was: 0.10.1.0)
                   0.10.2.0

> Stream timestamp computation needs some further thoughts
> --------------------------------------------------------
>
>                 Key: KAFKA-3514
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3514
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Guozhang Wang
>              Labels: architecture
>             Fix For: 0.10.2.0
>
>
> Our current stream task's timestamp is used for punctuate function as well as selecting
which stream to process next (i.e. best effort stream synchronization). And it is defined
as the smallest timestamp over all partitions in the task's partition group. This results
in two unintuitive corner cases:
> 1) observing a late arrived record would keep that stream's timestamp low for a period
of time, and hence keep being process until that late record. For example take two partitions
within the same task annotated by their timestamps:
> {code}
> Stream A: 5, 6, 7, 8, 9, 1, 10
> {code}
> {code}
> Stream B: 2, 3, 4, 5
> {code}
> The late arrived record with timestamp "1" will cause stream A to be selected continuously
in the thread loop, i.e. messages with timestamp 5, 6, 7, 8, 9 until the record itself is
dequeued and processed, then stream B will be selected starting with timestamp 2.
> 2) an empty buffered partition will cause its timestamp to be not advanced, and hence
the task timestamp as well since it is the smallest among all partitions. This may not be
a severe problem compared with 1) above though.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message