kafka-dev mailing list archives

From "Dustin Cote (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-4207) Partitions stopped after a rapid restart of a broker
Date Thu, 22 Sep 2016 17:23:20 GMT
Dustin Cote created KAFKA-4207:

             Summary: Partitions stopped after a rapid restart of a broker
                 Key: KAFKA-4207
                 URL: https://issues.apache.org/jira/browse/KAFKA-4207
             Project: Kafka
          Issue Type: Bug
          Components: controller
    Affects Versions:,
            Reporter: Dustin Cote

4 Kafka brokers
10,000 topics with one partition each, replication factor 3
Partitions with 4KB data each
No data being produced or consumed
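A quick count of the replicas in this environment helps size the controlled-shutdown work. It assumes replicas are spread evenly across the four brokers. That is an illustrative assumption; the report does not state the actual assignment.

```python
# Figures taken from the reproduction environment above.
BROKERS = 4
TOPICS = 10_000
PARTITIONS_PER_TOPIC = 1
REPLICATION_FACTOR = 3

# Every partition has REPLICATION_FACTOR replicas.
total_replicas = TOPICS * PARTITIONS_PER_TOPIC * REPLICATION_FACTOR

# Assumption: an even spread of replicas across brokers.
replicas_per_broker = total_replicas // BROKERS

print(total_replicas)       # 30000
print(replicas_per_broker)  # 7500
```

Under that assumption, a single broker hosts about 7,500 replicas.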

Initiate controlled shutdown on one broker
Interrupt the controlled shutdown with a SIGKILL before it completes
Immediately start a new broker with the same broker ID as the broker that was just killed

After the new broker starts, the other three brokers in the cluster see some of the partitions
hosted on the killed and restarted broker as under-replicated forever.

Today, the controller sends a StopReplica command for each replica hosted on a broker that
has initiated a controlled shutdown. For a large number of replicas this can take a while.
When the broker doing the controlled shutdown is killed, the outstanding StopReplica commands
remain queued even though the request queue to the broker is cleared. When the broker comes
back online, those queued StopReplica commands are sent to the newly started broker.
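The race described above can be illustrated with a deliberately simplified model. This is not Kafka's controller code. The class `BrokerChannel` and its methods are hypothetical, and the model assumes two containers: a backlog of StopReplica commands the controller has generated but not yet dispatched, and a per-broker request queue. Killing the broker clears only the request queue, so the backlog reaches the new process that registers with the same broker ID.

```python
from collections import deque


class BrokerChannel:
    """Hypothetical model of the controller's outbound path to one broker ID."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.backlog = deque()        # StopReplica commands not yet dispatched
        self.request_queue = deque()  # requests queued to the broker
        self.received = []            # what the current broker process received
        self.alive = True

    def enqueue_stop_replicas(self, partitions):
        # Controlled shutdown: one StopReplica per replica on this broker.
        self.backlog.extend(("StopReplica", p) for p in partitions)

    def dispatch(self, n):
        # Move up to n commands from the backlog onto the request queue.
        for _ in range(min(n, len(self.backlog))):
            self.request_queue.append(self.backlog.popleft())

    def deliver(self):
        # Only a live broker drains the request queue.
        while self.alive and self.request_queue:
            self.received.append(self.request_queue.popleft())

    def on_broker_killed(self):
        # SIGKILL: the request queue is cleared, but the backlog is untouched.
        self.alive = False
        self.request_queue.clear()

    def on_broker_started(self):
        # A new process registers with the same broker ID, having received nothing.
        self.alive = True
        self.received = []


channel = BrokerChannel(broker_id=1)
channel.enqueue_stop_replicas(range(7500))
channel.dispatch(100)
channel.deliver()                     # first 100 StopReplicas reach the old process
channel.dispatch(50)                  # 50 more are in the request queue...
channel.on_broker_killed()            # ...and are dropped by the SIGKILL
channel.on_broker_started()           # restarted immediately with the same ID
channel.dispatch(len(channel.backlog))
channel.deliver()                     # stale backlog reaches the new process

print(len(channel.received))  # 7350
print(channel.received[0])    # ('StopReplica', 150)
```

In this model, the freshly started broker receives 7,350 StopReplica commands meant for the process that was shutting down. That matches the reported symptom of partitions on the restarted broker staying under-replicated.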

CC: [~junrao] since he's familiar with the scenario seen here

This message was sent by Atlassian JIRA
