kafka-dev mailing list archives

From "Rajini Sivaram (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-6916) AdminClient does not refresh metadata on broker failure
Date Fri, 18 May 2018 09:36:00 GMT
Rajini Sivaram created KAFKA-6916:

             Summary: AdminClient does not refresh metadata on broker failure
                 Key: KAFKA-6916
                 URL: https://issues.apache.org/jira/browse/KAFKA-6916
             Project: Kafka
          Issue Type: Task
          Components: admin
    Affects Versions: 1.0.1, 1.1.0
            Reporter: Rajini Sivaram
            Assignee: Rajini Sivaram
             Fix For: 2.0.0

There are intermittent test failures in DynamicBrokerReconfigurationTest when brokers are
restarted. The test uses ephemeral ports, so the ports a server listens on after a restart differ
from those it used before the restart. The tests rely on metadata refresh in producers, consumers
and admin clients to obtain the new server ports when connections fail. This works with producers
and consumers, but causes intermittent failures with the admin client because refresh is not
triggered.

There are a couple of issues in AdminClient:
 # Unlike producers and consumers, AdminClient does not request a metadata update when the connection
to a broker fails. This is particularly bad if the controller goes down. The controller is used for
various requests such as createTopics and describeTopics. If the controller goes down and adminClient.describeTopics()
is invoked, AdminClient sends the request to the old controller. If the connection fails,
it keeps retrying with the same address, and metadata refresh is never triggered. The request
times out after 2 minutes by default, while metadata is only refreshed every 5 minutes by default,
so the request fails before the stale address is ever updated.
We should refresh metadata whenever the connection to a broker fails.
 # AdminClient requests are always retried on the same node. In the example above, if the controller
goes down and a new controller is elected, the retried request should be sent
to the new controller. Otherwise the call is simply blocked for 2 minutes by retries
that can never succeed.
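To make the two problems concrete, here is a minimal, self-contained Java simulation. It is not Kafka code: `AdminRetrySketch`, `failController`, `callController` and the `fixed` flag are hypothetical names. It assumes the defaults quoted above (2 minute request timeout, metadata refreshed every 5 minutes) plus an assumed 100 ms retry backoff. In "current" mode every retry targets the cached address of the old controller; in "fixed" mode a connection failure triggers a metadata refresh and the retry is re-routed to the newly elected controller.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of an admin client's retry loop (not the real AdminClient).
public class AdminRetrySketch {
    // "Cluster" side: broker id -> port it is currently listening on.
    private final Map<Integer, Integer> liveBrokers = new HashMap<>();
    private int actualControllerId;

    // "Client" side: cached metadata, which may be stale.
    private final Map<Integer, Integer> cachedBrokers = new HashMap<>();
    private int cachedControllerId;

    AdminRetrySketch() {
        liveBrokers.put(0, 9092);
        liveBrokers.put(1, 9093);
        liveBrokers.put(2, 9094);
        actualControllerId = 0;
        refreshMetadata(); // the client starts with accurate metadata at t = 0
    }

    // The controller broker goes down and another broker is elected controller.
    void failController(int newControllerId) {
        liveBrokers.remove(actualControllerId);
        actualControllerId = newControllerId;
    }

    // Synchronous stand-in for a metadata request (a real client does this asynchronously).
    private void refreshMetadata() {
        cachedBrokers.clear();
        cachedBrokers.putAll(liveBrokers);
        cachedControllerId = actualControllerId;
    }

    // Succeeds only if the broker is reachable at that port and is the current controller.
    private boolean send(int brokerId, Integer port) {
        Integer livePort = liveBrokers.get(brokerId);
        return port != null && port.equals(livePort) && brokerId == actualControllerId;
    }

    // Returns the simulated time (ms) at which the call succeeded, or -1 on timeout.
    long callController(boolean fixed, long requestTimeoutMs,
                        long metadataMaxAgeMs, long retryBackoffMs) {
        long now = 0;
        long lastRefreshMs = 0;
        int target = cachedControllerId; // node chosen once, when the call is created
        while (now < requestTimeoutMs) {
            if (send(target, cachedBrokers.get(target))) {
                return now;
            }
            if (fixed) {
                // Proposed behaviour: refresh on connection failure, then re-resolve the node.
                refreshMetadata();
                lastRefreshMs = now;
                target = cachedControllerId;
            } else if (now - lastRefreshMs >= metadataMaxAgeMs) {
                // Current behaviour: only the periodic refresh, and the target is never re-resolved.
                refreshMetadata();
                lastRefreshMs = now;
            }
            now += retryBackoffMs;
        }
        return -1;
    }

    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            AdminRetrySketch client = new AdminRetrySketch();
            client.failController(1);
            long result = client.callController(fixed, 120_000, 300_000, 100);
            System.out.println((fixed ? "fixed" : "current") + ": "
                + (result < 0 ? "timed out" : "succeeded after " + result + " ms"));
        }
    }
}
```

Running `main` prints `current: timed out` followed by `fixed: succeeded after 100 ms`. The sketch collapses the asynchronous metadata update into a synchronous call for simplicity; in a real client the retry would have to wait for the refreshed metadata before choosing the new node.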


This message was sent by Atlassian JIRA
