kafka-jira mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-6051) ReplicaFetcherThread should close the ReplicaFetcherBlockingSend earlier on shutdown
Date Wed, 11 Oct 2017 10:43:01 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200088#comment-16200088 ]

ASF GitHub Bot commented on KAFKA-6051:
---------------------------------------

GitHub user mayt opened a pull request:

    https://github.com/apache/kafka/pull/4056

    KAFKA-6051 Close the ReplicaFetcherBlockingSend earlier on shutdown

    Rearranged the testAddPartitionDuringDeleteTopic() test to keep the
    likelihood of the race condition.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mayt/kafka KAFKA-6051

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/4056.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4056
    
----
commit 36c1fa6ca3bab4dc070910cba9223f4141982d82
Author: Maytee Chinavanichkit <maytee.chinavanichkit@linecorp.com>
Date:   2017-10-11T10:35:54Z

    KAFKA-6051 Close the ReplicaFetcherBlockingSend earlier on shutdown
    
    Rearranged the testAddPartitionDuringDeleteTopic() test to keep the
    likelihood of the race condition.

----


> ReplicaFetcherThread should close the ReplicaFetcherBlockingSend earlier on shutdown
> ------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6051
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6051
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Maytee Chinavanichkit
>
> The ReplicaFetcherBlockingSend works as designed and blocks until it is able to get
> data. This becomes a problem when we are gracefully shutting down a broker. The controller
> will attempt to shut down the fetchers and elect new leaders. When the last partition of a
> fetcher is removed, the {replicaManager.becomeLeaderOrFollower} call proceeds to shut down
> any idle ReplicaFetcherThread. The shutdown process here can block until the last fetch
> request completes. This blocking delay is a big problem because the {replicaStateChangeLock}
> and the {mapLock} in {AbstractFetcherManager} are still held, causing latency spikes on
> multiple brokers.
> At this point in time, we do not need the last response, as the fetcher is shutting down.
> We should close the leaderEndpoint early, during {initiateShutdown()}, instead of after
> {super.shutdown()}.
> For example, we see here that the shutdown blocked the broker from processing more replica
> changes for ~500 ms:
> {code}
> [2017-09-01 18:11:42,879] INFO [ReplicaFetcherThread-0-2], Shutting down (kafka.server.ReplicaFetcherThread)
> [2017-09-01 18:11:43,314] INFO [ReplicaFetcherThread-0-2], Stopped (kafka.server.ReplicaFetcherThread)
> [2017-09-01 18:11:43,314] INFO [ReplicaFetcherThread-0-2], Shutdown completed (kafka.server.ReplicaFetcherThread)
> {code}
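
The ordering problem described in the issue can be illustrated with a small, language-neutral sketch. This is not Kafka's code: the classes BlockingSend and FetcherThread, the close_early flag, and the helper run_shutdown are hypothetical stand-ins, and the blocking fetch is simulated with a timed wait rather than a real network call. It only shows why closing the endpoint during initiate_shutdown() lets the thread stop at once, while closing it after the join forces the caller to wait for the in-flight request.

```python
import threading
import time


class BlockingSend:
    """Hypothetical stand-in for a blocking network client (not Kafka's class)."""

    def __init__(self, max_wait_s):
        self._max_wait_s = max_wait_s
        self._closed = threading.Event()

    def send_and_receive(self):
        # Block until the simulated response is due or close() wakes us up.
        woken_by_close = self._closed.wait(self._max_wait_s)
        return "closed" if woken_by_close else "response"

    def close(self):
        self._closed.set()


class FetcherThread(threading.Thread):
    """Hypothetical fetcher loop; close_early selects the shutdown ordering."""

    def __init__(self, endpoint, close_early):
        super().__init__(daemon=True)
        self._endpoint = endpoint
        self._close_early = close_early
        self._running = True
        self.in_fetch = threading.Event()
        self.last_result = None

    def run(self):
        while self._running:
            self.in_fetch.set()  # signal that a fetch is about to block
            self.last_result = self._endpoint.send_and_receive()

    def initiate_shutdown(self):
        self._running = False
        if self._close_early:
            # Proposed ordering: unblock the in-flight fetch immediately.
            self._endpoint.close()

    def shutdown(self):
        self.initiate_shutdown()
        self.join()  # with late close, this waits for the fetch to finish
        self._endpoint.close()  # harmless second close in the early case


def run_shutdown(close_early, max_wait_s=0.5):
    """Start a fetcher, wait until it blocks, shut it down, report the outcome."""
    endpoint = BlockingSend(max_wait_s)
    fetcher = FetcherThread(endpoint, close_early)
    fetcher.start()
    fetcher.in_fetch.wait()
    started = time.monotonic()
    fetcher.shutdown()
    return fetcher.last_result, time.monotonic() - started
```

Under these assumptions, run_shutdown(close_early=False) should take roughly max_wait_s because the join waits for the simulated response, whereas run_shutdown(close_early=True) should return almost immediately with the fetch reported as closed. Any lock held by the caller around shutdown() would be held for the same duration.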



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
