kafka-jira mailing list archives

From "Andrey (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-6189) Losing messages on OFFSET_OUT_OF_RANGE error in consumer
Date Mon, 13 Nov 2017 09:15:00 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Andrey updated KAFKA-6189:
--------------------------
    Description: 
Steps to reproduce:
* Setup test:
** producer sends messages constantly; if the cluster is not available, it retries
** consumer polls in a loop
** topic has 3 partitions and replication factor 3
** min.insync.replicas=2
** producer has "acks=all"
** consumer has the default "auto.offset.reset=latest"
** consumer manually calls commitSync after handling messages
** unclean leader election = false
** kafka cluster has 3 brokers
* Kill broker 0
* In the consumer's logs:
{code}
2017-11-08 11:36:33,967 INFO  org.apache.kafka.clients.consumer.internals.Fetcher        
  - Fetch offset 10706682 is out of range for partition mytopic-2, resetting offset [kafka-consumer]
2017-11-08 11:36:33,968 INFO  org.apache.kafka.clients.consumer.internals.Fetcher        
  - Fetch offset 8024431 is out of range for partition mytopic-1, resetting offset [kafka-consumer]
2017-11-08 11:36:34,045 INFO  org.apache.kafka.clients.consumer.internals.Fetcher        
  - Fetch offset 8029505 is out of range for partition mytopic-0, resetting offset [kafka-consumer]
{code}

After that, the consumer lost several messages on each partition.
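The producer and consumer settings listed above can be sketched as plain configuration properties. This is a minimal sketch using only `java.util.Properties` (the broker addresses, topic, and group id are hypothetical placeholders, not values from the report); in a real client these maps would be passed to the `KafkaProducer`/`KafkaConsumer` constructors.

```java
import java.util.Properties;

// Sketch of the reported test setup. Only the settings named in the
// report are significant; hosts and group id are placeholders.
public class ReproConfig {

    static Properties producerProps() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "broker0:9092,broker1:9092,broker2:9092");
        p.setProperty("acks", "all"); // wait for all in-sync replicas (with min.insync.replicas=2)
        p.setProperty("retries", String.valueOf(Integer.MAX_VALUE)); // keep retrying while the cluster is unavailable
        return p;
    }

    static Properties consumerProps() {
        Properties c = new Properties();
        c.setProperty("bootstrap.servers", "broker0:9092,broker1:9092,broker2:9092");
        c.setProperty("group.id", "mygroup");                // placeholder
        c.setProperty("enable.auto.commit", "false");        // offsets are committed manually via commitSync()
        c.setProperty("auto.offset.reset", "latest");        // the default; this is what skips past the lost range
        return c;
    }

    public static void main(String[] args) {
        System.out.println("producer acks=" + producerProps().getProperty("acks"));
        System.out.println("consumer auto.offset.reset=" + consumerProps().getProperty("auto.offset.reset"));
    }
}
```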

Expected:
* the broker returns the upper bound of the valid offset range
* the consumer resumes from that offset instead of applying "auto.offset.reset"
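The expected behavior amounts to clamping an out-of-range fetch offset into the log's valid range rather than jumping via the reset policy. A minimal sketch of that idea (the `logStart`/`logEnd` parameters stand in for the range the broker would report; this is the reporter's expectation, not what the 0.11.0.0 client does):

```java
// Sketch of the expected reset behavior: clamp the fetch offset to the
// valid range [logStart, logEnd] instead of applying auto.offset.reset.
public class OffsetClamp {

    public static long resumeOffset(long fetchOffset, long logStart, long logEnd) {
        if (fetchOffset < logStart) {
            return logStart; // fell behind retention: oldest available record
        }
        if (fetchOffset > logEnd) {
            return logEnd;   // ahead of the (possibly truncated) log end: upper bound
        }
        return fetchOffset;  // already in range, nothing to do
    }
}
```

With this policy a fetch at offset 10706682 against a log whose end had fallen back below it would resume at the new log end, re-reading a few records at worst instead of silently skipping to the latest offset.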

Workaround:
* set "auto.offset.reset=earliest"
* get a lot of duplicate messages instead of lost ones
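Because this workaround replays records that were already handled, the application has to deduplicate. One way (a sketch under the assumption that the application can remember the last handled offset per partition; the class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of per-partition deduplication for the earliest-reset workaround.
// After a reset the consumer may re-deliver records at or below the last
// offset already handled; those are skipped instead of reprocessed.
public class OffsetDedup {

    private final Map<String, Long> lastHandled = new HashMap<>();

    /** Returns true if the record at (partition, offset) is new and should be processed. */
    public boolean shouldProcess(String partition, long offset) {
        Long last = lastHandled.get(partition);
        if (last != null && offset <= last) {
            return false; // duplicate from the replay
        }
        lastHandled.put(partition, offset);
        return true;
    }
}
```

This trades the silent loss for at-least-once delivery: duplicates are detectable from the (partition, offset) pair, whereas skipped records are not.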

This appears to be what happens during recovery from the broker failure (see attachment).

  was:
Steps to reproduce:
* Setup test:
** producer sends messages constantly. If cluster not available, then it will retry
** consumer polling
** topic has 3 partitions and replication factor 3. 
** min.insync.replicas=2
** producer has "acks=all"
** consumer has default "auto.offset.reset=latest"
** consumer manually commitSync offsets after handling messages.
** kafka cluster has 3 brokers
* Kill broker 0
* In consumer's logs:
{code}
2017-11-08 11:36:33,967 INFO  org.apache.kafka.clients.consumer.internals.Fetcher        
  - Fetch offset 10706682 is out of range for partition mytopic-2, resetting offset [kafka-consumer]
2017-11-08 11:36:33,968 INFO  org.apache.kafka.clients.consumer.internals.Fetcher        
  - Fetch offset 8024431 is out of range for partition mytopic-1, resetting offset [kafka-consumer]
2017-11-08 11:36:34,045 INFO  org.apache.kafka.clients.consumer.internals.Fetcher        
  - Fetch offset 8029505 is out of range for partition mytopic-0, resetting offset [kafka-consumer]
{code}

After that, consumer lost several messages on each partition.

Expected:
* return upper bound of range
* consumer should resume from that offset instead of "auto.offset.reset".

Workaround:
* put "auto.offset.reset=earliest"
* get a lot of duplicate messages, instead of lost

Looks like this is what happening during the recovery from broker failure (see attachment)


> Losing messages on OFFSET_OUT_OF_RANGE error in consumer
> ---------------------------------------------------------
>
>                 Key: KAFKA-6189
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6189
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 0.11.0.0
>            Reporter: Andrey
>         Attachments: kafkaLossingMessages.png
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
