Mailing-List: contact jira-help@kafka.apache.org; run by ezmlm
Precedence: bulk
Reply-To: jira@kafka.apache.org
Date: Wed, 11 Oct 2017 10:45:00 +0000 (UTC)
From: "Maytee Chinavanichkit (JIRA)" <jira@apache.org>
To: jira@kafka.apache.org
Message-ID: <JIRA.13108556.1507718366000.26889.1507718700771@Atlassian.JIRA>
In-Reply-To: <JIRA.13108556.1507718366000@Atlassian.JIRA>
References: <JIRA.13108556.1507718366000@Atlassian.JIRA> <JIRA.13108556.1507718366792@jira-lw-us.apache.org>
Subject: [jira] [Issue Comment Deleted] (KAFKA-6051) ReplicaFetcherThread
 should close the ReplicaFetcherBlockingSend earlier on shutdown
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Wed, 11 Oct 2017 10:56:57 -0000


     [ https://issues.apache.org/jira/browse/KAFKA-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maytee Chinavanichkit updated KAFKA-6051:
-----------------------------------------
    Comment: was deleted

(was: https://github.com/apache/kafka/pull/4056)

> ReplicaFetcherThread should close the ReplicaFetcherBlockingSend earlier on shutdown
> ------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6051
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6051
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Maytee Chinavanichkit
>
> The ReplicaFetcherBlockingSend works as designed and blocks until it is able to get data. This becomes a problem when we are gracefully shutting down a broker. The controller will attempt to shut down the fetchers and elect new leaders. When the last partition is removed from a fetcher as part of the {replicaManager.becomeLeaderOrFollower} call, that call proceeds to shut down any idle ReplicaFetcherThread. The shutdown process here can block until the last fetch request completes. This blocking delay is a big problem because the {replicaStateChangeLock} and the {mapLock} in {AbstractFetcherManager} are still held, causing latency spikes on multiple brokers.
> At this point in time, we do not need the last response as the fetcher is shutting down. We should close the leaderEndpoint early during {initiateShutdown()} instead of after {super.shutdown()}.
> For example, here the shutdown blocked the broker from processing more replica changes for ~500 ms:
> {code}
> [2017-09-01 18:11:42,879] INFO [ReplicaFetcherThread-0-2], Shutting down (kafka.server.ReplicaFetcherThread) 
> [2017-09-01 18:11:43,314] INFO [ReplicaFetcherThread-0-2], Stopped (kafka.server.ReplicaFetcherThread) 
> [2017-09-01 18:11:43,314] INFO [ReplicaFetcherThread-0-2], Shutdown completed (kafka.server.ReplicaFetcherThread)
> {code}
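
The proposal in the description (close the endpoint during initiateShutdown() rather than after the thread has been joined) can be illustrated with a small standalone sketch. This is not Kafka's code: ReplicaFetcherThread is written in Scala, and every class, method, and timeout below (EarlyCloseDemo, BlockingEndpoint, FetcherThread, the 500 ms wait) is a hypothetical stand-in chosen only to show the pattern. It assumes that closing the endpoint wakes up a caller blocked in a pending request.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class EarlyCloseDemo {

    /** Hypothetical stand-in for a blocking send: waits for a response until closed or timed out. */
    static final class BlockingEndpoint {
        private final CountDownLatch closed = new CountDownLatch(1);

        /** Blocks for up to maxWaitMs (like a pending fetch); returns early once close() is called. */
        void sendRequest(long maxWaitMs) throws InterruptedException {
            closed.await(maxWaitMs, TimeUnit.MILLISECONDS);
        }

        void close() {
            closed.countDown();
        }
    }

    /** Hypothetical fetcher thread that loops on blocking requests until told to stop. */
    static final class FetcherThread extends Thread {
        private final BlockingEndpoint endpoint = new BlockingEndpoint();
        private final CountDownLatch started = new CountDownLatch(1);
        private final boolean closeEarly;
        private volatile boolean running = true;

        FetcherThread(boolean closeEarly) {
            this.closeEarly = closeEarly;
        }

        @Override
        public void run() {
            started.countDown();
            try {
                while (running) {
                    endpoint.sendRequest(500); // assumed maximum wait of one request
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        /** Signals the loop to stop; with closeEarly it also unblocks the in-flight request. */
        void initiateShutdown() {
            running = false;
            if (closeEarly) {
                endpoint.close();
            }
        }

        /** Full shutdown: signal, wait for the thread to exit, then release the endpoint if still open. */
        void shutdown() throws InterruptedException {
            initiateShutdown();
            join(); // without an early close, this waits for the pending request to finish
            if (!closeEarly) {
                endpoint.close();
            }
        }
    }

    /** Returns how long shutdown() took, in milliseconds, for the chosen strategy. */
    static long shutdownMillis(boolean closeEarly) {
        try {
            FetcherThread thread = new FetcherThread(closeEarly);
            thread.start();
            thread.started.await();
            Thread.sleep(50); // give the thread time to enter its blocking request
            long begin = System.nanoTime();
            thread.shutdown();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("close after join:          " + shutdownMillis(false) + " ms");
        System.out.println("close in initiateShutdown: " + shutdownMillis(true) + " ms");
    }
}
```

In this sketch the first measurement spends most of the assumed 500 ms request wait inside join(), while the second returns almost immediately because the pending request is abandoned as soon as shutdown is initiated. Any lock held by the caller across shutdown() would be held for the same duration.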


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)