kafka-dev mailing list archives

From "Anna Povzner (Jira)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-9839) IllegalStateException on metadata update when broker learns about its new epoch after the controller
Date Thu, 09 Apr 2020 04:14:00 GMT
Anna Povzner created KAFKA-9839:
-----------------------------------

             Summary: IllegalStateException on metadata update when broker learns about its new epoch after the controller
                 Key: KAFKA-9839
                 URL: https://issues.apache.org/jira/browse/KAFKA-9839
             Project: Kafka
          Issue Type: Bug
          Components: controller, core
    Affects Versions: 2.3.1
            Reporter: Anna Povzner


Broker throws "java.lang.IllegalStateException: Epoch XXX larger than current broker epoch
YYY" on UPDATE_METADATA when the controller learns about the new broker epoch and sends
UPDATE_METADATA before KafkaZkClient.registerBroker completes on the broker (that is, before
the broker learns about its new epoch).

Here is the scenario we observed in more detail:
1. ZK session expires on broker 1
2. Broker 1 establishes new session to ZK and creates znode
3. Controller learns about broker 1 and assigns epoch
4. Broker 1 receives UPDATE_METADATA from controller, but it does not know about its new epoch
yet, so we get an exception:

ERROR [KafkaApi-3] Error when handling request: clientId=1, correlationId=0, api=UPDATE_METADATA,
body={
.........
java.lang.IllegalStateException: Epoch XXX larger than current broker epoch YYY
at kafka.server.KafkaApis.isBrokerEpochStale(KafkaApis.scala:2725)
at kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:320)
at kafka.server.KafkaApis.handle(KafkaApis.scala:139)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
at java.lang.Thread.run(Thread.java:748)

5. KafkaZkClient.registerBroker completes on broker 1: "INFO Stat of the created znode at
/brokers/ids/1"

As a result, the broker has stale metadata for some time.
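
To make the ordering problem in steps 4 and 5 concrete, here is a minimal Java sketch of the epoch comparison. It is a simplified model written for illustration, not the Kafka source: the class BrokerEpochModel, its field and methods, and the epoch values 10 and 25 are all hypothetical. Only the exception message format is taken from the log above.

```java
// Simplified model of the broker-side epoch check; NOT the actual Kafka code.
// All names and epoch values here are illustrative assumptions.
final class BrokerEpochModel {
    // Epoch the broker currently believes it has.
    private long currentBrokerEpoch;

    BrokerEpochModel(long initialEpoch) {
        this.currentBrokerEpoch = initialEpoch;
    }

    // Models step 5: the broker's own registration call returns and it
    // finally learns the new epoch.
    void onRegistrationComplete(long newEpoch) {
        this.currentBrokerEpoch = newEpoch;
    }

    // Returns true for a request with an older epoch, false for a matching
    // epoch, and throws when the request epoch is larger than the local one.
    boolean isBrokerEpochStale(long epochInRequest) {
        if (epochInRequest < currentBrokerEpoch) {
            return true;
        }
        if (epochInRequest == currentBrokerEpoch) {
            return false;
        }
        throw new IllegalStateException(
            "Epoch " + epochInRequest + " larger than current broker epoch " + currentBrokerEpoch);
    }

    public static void main(String[] args) {
        BrokerEpochModel broker = new BrokerEpochModel(10L); // epoch from the expired session
        long newEpoch = 25L;                                 // epoch the controller already knows

        try {
            broker.isBrokerEpochStale(newEpoch);             // step 4: request arrives too early
        } catch (IllegalStateException e) {
            System.out.println("step 4: " + e.getMessage());
        }

        broker.onRegistrationComplete(newEpoch);             // step 5
        System.out.println("after step 5, stale? " + broker.isBrokerEpochStale(newEpoch));
    }
}
```

In this model the same request that fails at step 4 would be accepted once step 5 has run, which is why the failure depends only on timing.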

Possible solutions:
1. Broker returns a more specific error and controller retries UPDATE_METADATA
2. Broker accepts UPDATE_METADATA with larger broker epoch.
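
The two options can be contrasted in a short Java sketch. This is only an illustration under stated assumptions: the class EpochCheckOptions, the Outcome enum, and both method names are hypothetical and do not correspond to any existing Kafka API or error code.

```java
// Illustrative comparison of the two proposed behaviours; NOT Kafka code.
// Every name below is a hypothetical stand-in.
final class EpochCheckOptions {
    enum Outcome { ACCEPT, REJECT_STALE, RETRY_LATER }

    // Option 1: instead of throwing, report a distinct outcome that the
    // controller could treat as retriable and resend the request later.
    static Outcome checkWithRetriableError(long epochInRequest, long currentBrokerEpoch) {
        if (epochInRequest < currentBrokerEpoch) {
            return Outcome.REJECT_STALE;
        }
        if (epochInRequest > currentBrokerEpoch) {
            return Outcome.RETRY_LATER;
        }
        return Outcome.ACCEPT;
    }

    // Option 2: accept any request whose epoch is not older than the
    // broker's current epoch, including a larger one.
    static Outcome checkAcceptingLargerEpoch(long epochInRequest, long currentBrokerEpoch) {
        return epochInRequest < currentBrokerEpoch ? Outcome.REJECT_STALE : Outcome.ACCEPT;
    }

    public static void main(String[] args) {
        long current = 10L;  // broker has not yet learned its new epoch
        long incoming = 25L; // controller already uses the new epoch
        System.out.println("option 1: " + checkWithRetriableError(incoming, current));
        System.out.println("option 2: " + checkAcceptingLargerEpoch(incoming, current));
    }
}
```

Option 1 keeps the strict check but depends on the controller retrying; option 2 needs no retry but relaxes the check on the broker side.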



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
