kafka-dev mailing list archives

From "Apurva Mehta (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-5396) Consumer reading from beginning of log can read the same message multiple times.
Date Wed, 07 Jun 2017 06:30:19 GMT
Apurva Mehta created KAFKA-5396:
-----------------------------------

             Summary: Consumer reading from beginning of log can read the same message multiple times.
                 Key: KAFKA-5396
                 URL: https://issues.apache.org/jira/browse/KAFKA-5396
             Project: Kafka
          Issue Type: Bug
            Reporter: Apurva Mehta


I noticed this when running the transactions system test with hard broker bounces. We have
a consumer in READ_COMMITTED mode reading from the tail of the log as the writes are appended.

This test has failed once because the concurrent consumer returned duplicate data. The actual
log has no duplicates, so the problem is in the consumer. 

One of the duplicated values is '0', which is at offset 250 in output-topic-1. The first time
it is read, we see the following:

{noformat}
[2017-06-07 05:50:34,601] TRACE Returning fetched records at offset 0 for assigned partition output-topic-0 and update position to 250 (org.apache.kafka.clients.consumer.internals.Fetcher)
[2017-06-07 05:50:34,602] TRACE Preparing to read 2967 bytes of data for partition output-topic-1 with offset 250 (org.apache.kafka.clients.consumer.internals.Fetcher)
[2017-06-07 05:50:34,602] TRACE Updating high watermark for partition output-topic-1 to 502 (org.apache.kafka.clients.consumer.internals.Fetcher)
[2017-06-07 05:50:34,613] TRACE Returning fetched records at offset 250 for assigned partition output-topic-1 and update position to 500 (org.apache.kafka.clients.consumer.internals.Fetcher)
{noformat}

The next time it is read, we see this:
{noformat}
[2017-06-07 05:51:36,386] TRACE Preparing to read 169858 bytes of data for partition output-topic-1 with offset 0 (org.apache.kafka.clients.consumer.internals.Fetcher)
[2017-06-07 05:51:36,389] TRACE Updating high watermark for partition output-topic-1 to 13053 (org.apache.kafka.clients.consumer.internals.Fetcher)
[2017-06-07 05:51:36,391] TRACE Returning fetched records at offset 0 for assigned partition output-topic-1 and update position to 500 (org.apache.kafka.clients.consumer.internals.Fetcher)
{noformat}

For some reason, the fetcher re-sent the data from offset 0, and reset the position to 500.
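
To find other occurrences of this pattern in a consumer TRACE log, something like the following sketch may help. It is not part of the system test. It assumes each log entry sits on a single line and contains the "Returning fetched records ..." message quoted above; the function name `find_position_regressions` is made up for illustration. It flags any partition for which records are returned from an offset lower than the position previously recorded for that partition.

```python
import re

# Matches the Fetcher TRACE message quoted in this report.
# Assumption: each log entry is on one line in the log file.
RETURNED = re.compile(
    r"Returning fetched records at offset (\d+) for assigned partition "
    r"(\S+) and update position to (\d+)"
)


def find_position_regressions(lines):
    """Return (partition, expected_offset, actual_offset) tuples for every
    fetch that was returned from an offset below the last recorded position."""
    position = {}      # partition -> position after the last returned fetch
    regressions = []
    for line in lines:
        match = RETURNED.search(line)
        if match is None:
            continue   # not a "Returning fetched records" entry
        offset = int(match.group(1))
        partition = match.group(2)
        new_position = int(match.group(3))
        expected = position.get(partition)
        if expected is not None and offset < expected:
            regressions.append((partition, expected, offset))
        position[partition] = new_position
    return regressions


# The three "Returning fetched records" entries from this report.
sample = [
    "[2017-06-07 05:50:34,601] TRACE Returning fetched records at offset 0 "
    "for assigned partition output-topic-0 and update position to 250 "
    "(org.apache.kafka.clients.consumer.internals.Fetcher)",
    "[2017-06-07 05:50:34,613] TRACE Returning fetched records at offset 250 "
    "for assigned partition output-topic-1 and update position to 500 "
    "(org.apache.kafka.clients.consumer.internals.Fetcher)",
    "[2017-06-07 05:51:36,391] TRACE Returning fetched records at offset 0 "
    "for assigned partition output-topic-1 and update position to 500 "
    "(org.apache.kafka.clients.consumer.internals.Fetcher)",
]

print(find_position_regressions(sample))
# -> [('output-topic-1', 500, 0)]
```

On the entries from this report, the only regression is on output-topic-1: the position was 500, yet the next returned fetch started at offset 0.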


This is the plain consumer doing 'poll' in a loop until it is killed. So this position reset
is puzzling. 
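
One way to tell a consumer-side re-read apart from a real duplicate in the log is to key delivered records on (partition, offset) rather than on value. The sketch below is only an illustration of that check, not the system test's actual verification code; the function name `find_redelivered` and the record tuples are hypothetical, apart from value '0' at offset 250 of output-topic-1, which comes from this report.

```python
def find_redelivered(records):
    """Return the (partition, offset, value) records that were delivered
    more than once, judged by (partition, offset) and not by value.

    `records` is an iterable of (partition, offset, value) tuples in the
    order the consumer returned them."""
    seen = set()
    redelivered = []
    for partition, offset, value in records:
        key = (partition, offset)
        if key in seen:
            # Same log position returned twice: the consumer re-read it.
            redelivered.append((partition, offset, value))
        else:
            seen.add(key)
    return redelivered


# Illustrative data only: offset 251 and its value are made up.
delivered = [
    ("output-topic-1", 250, "0"),
    ("output-topic-1", 251, "1"),
    ("output-topic-1", 250, "0"),  # the same offset delivered a second time
]

print(find_redelivered(delivered))
# -> [('output-topic-1', 250, '0')]
```

A value that appears at two different offsets would not be reported here, which matches the observation that the log itself has no duplicates.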




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
