kafka-dev mailing list archives

From "Jay Kreps (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-749) Bug in socket server shutdown logic makes the broker hang on shutdown until it has to be killed
Date Tue, 05 Feb 2013 05:56:12 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571071#comment-13571071 ]

Jay Kreps commented on KAFKA-749:

The ugly part here is the extra layer of synchronization and signaling around the already synchronized
blocking queue. This code is a bit hard to validate (for example, shouldn't it be signal instead
of signalAll, since only one item was added?), so it tends to get broken quickly by later
contributors who don't understand it.

I don't quite understand why we can't just call clear on the queue and enqueue the
AllDone object to achieve this. The ugly parts of the previous implementation were that AllDone
actually came out of the RequestChannel and that it was a ProducerRequest. Both are easily
fixed: there is no reason it should be a ProducerRequest, and the check for eq AllDone can
be done in receiveRequest.
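
A minimal sketch of the clear-and-enqueue approach, written in Java with java.util.concurrent's ArrayBlockingQueue as a stand-in for the broker's request queue. The class RequestChannelSketch, the Request type, the ALL_DONE sentinel, sendRequest and shutdown are hypothetical names for illustration; only receiveRequest and the AllDone idea come from the comment above. The sentinel is an ordinary request recognized by reference equality (the Java counterpart of Scala's eq) inside receiveRequest, so callers never receive it. Putting the sentinel back so that several io threads can each observe it is an assumption of this sketch, not something the comment specifies.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: clear the queue, enqueue a sentinel, and let
// receiveRequest() recognize the sentinel by reference equality.
public class RequestChannelSketch {

    // Hypothetical request type; the sentinel is not tied to any
    // particular request kind (e.g. it is not a ProducerRequest).
    static class Request {
        final String payload;
        Request(String payload) { this.payload = payload; }
        @Override public String toString() { return "Request(" + payload + ")"; }
    }

    // Sentinel compared by reference (the "eq AllDone" check).
    static final Request ALL_DONE = new Request("all-done");

    private final BlockingQueue<Request> requestQueue;

    RequestChannelSketch(int capacity) {
        requestQueue = new ArrayBlockingQueue<>(capacity);
    }

    // Called by processor threads; blocks while the queue is full.
    void sendRequest(Request r) throws InterruptedException {
        requestQueue.put(r);
    }

    // Called by io threads. Returns null once the sentinel is seen, so the
    // caller never gets ALL_DONE itself. The sentinel is offered back
    // (non-blocking) so other io threads waiting on the queue also see it;
    // this re-offer is an assumption of the sketch.
    Request receiveRequest() throws InterruptedException {
        Request r = requestQueue.take();
        if (r == ALL_DONE) {
            requestQueue.offer(ALL_DONE);
            return null;
        }
        return r;
    }

    // Drop any pending requests, then enqueue the sentinel.
    void shutdown() throws InterruptedException {
        requestQueue.clear();
        requestQueue.put(ALL_DONE);
    }

    public static void main(String[] args) throws InterruptedException {
        RequestChannelSketch channel = new RequestChannelSketch(2);
        channel.sendRequest(new Request("a"));
        channel.sendRequest(new Request("b")); // the queue is now full
        channel.shutdown();                    // clear() frees space, so put() does not block here
        System.out.println(channel.receiveRequest()); // prints "null": the sentinel was seen
    }
}
```

Note that clear() followed by put() is not atomic: a processor thread that is still adding requests could refill the queue between the two calls and make put() block, so the sketch shows only the shape of the proposal, not a complete fix for the race described in the issue below.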
> Bug in socket server shutdown logic makes the broker hang on shutdown until it has to be killed
> -----------------------------------------------------------------------------------------------
>                 Key: KAFKA-749
>                 URL: https://issues.apache.org/jira/browse/KAFKA-749
>             Project: Kafka
>          Issue Type: Bug
>          Components: network
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Blocker
>              Labels: bugs, p1
>         Attachments: kafka-749-v1.patch
>   Original Estimate: 24h
>  Remaining Estimate: 24h
> The current shutdown logic of the server shuts down the io threads first, followed by the
acceptor and finally the processor threads. The shutdown API of the io threads enqueues a special
AllDone command into the common request queue, and each io thread shuts down when it dequeues
this command. The problem is that while this shutdown command is being processed on the io
threads, the network/processor threads can still accept new connections and requests and add
those new requests to the request queue. That means more requests can be enqueued after the
AllDone command. Once the io threads have shut down, there is no thread left to dequeue from
the request queue, so the processor threads can hang while adding new requests to a full
request queue, thereby preventing the server from shutting down.
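
The hang described above can be reproduced in miniature. This is a simplified, single-threaded-producer sketch that assumes a bounded ArrayBlockingQueue as a stand-in for the broker's request queue; ShutdownHangDemo, processorCanEnqueueAfterShutdown and the string requests are hypothetical names for illustration. One io thread exits when it dequeues ALL_DONE, the "processor" then keeps adding requests, and a timed offer stands in for the blocking put() that would otherwise hang forever.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of the shutdown ordering problem: io threads stop
// first, processors keep enqueueing, and nobody drains the queue.
public class ShutdownHangDemo {
    static final String ALL_DONE = "ALL_DONE";

    // Returns false when a processor can no longer enqueue after the io
    // thread has exited, i.e. when a plain put() would block forever.
    static boolean processorCanEnqueueAfterShutdown(int capacity) throws InterruptedException {
        BlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(capacity);

        // io thread: handles requests until it dequeues ALL_DONE, then exits.
        Thread ioThread = new Thread(() -> {
            try {
                while (!requestQueue.take().equals(ALL_DONE)) {
                    // handle the request (omitted)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        ioThread.start();

        // Shut down the io threads first: enqueue the special command and wait.
        requestQueue.put(ALL_DONE);
        ioThread.join();

        // Processor threads are still running and keep adding requests.
        for (int i = 0; i < capacity; i++) {
            requestQueue.put("request-" + i); // fills the queue; nothing drains it
        }

        // One more request: put() would block forever here; a timed offer
        // returns false instead so the demo terminates.
        return requestQueue.offer("one-more", 200, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processorCanEnqueueAfterShutdown(4)); // prints false
    }
}
```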
