kafka-dev mailing list archives

From "Neha Narkhede (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-702) Deadlock between request handler/processor threads
Date Wed, 16 Jan 2013 19:14:13 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555350#comment-13555350 ]

Neha Narkhede commented on KAFKA-702:
-------------------------------------

>> 2. Having client quotas may not work because we do not have one faulty client. Each client can at most have only one request.

We understand that. Client quotas are probably better done in terms of expirations per second.
Basically, if you set up your partitions with a large replication factor (say 6), set num.acks
in your producer to -1, and at the same time set your timeout too low, all requests will time
out and expire. This allows a single client to send many requests that all time out.
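
The scenario above could be set up with a producer configuration roughly like the following (property names as in the 0.8 producer; the exact values here are illustrative only):

```properties
# Illustrative only: wait for acknowledgement from all in-sync replicas
# of a partition with replication factor 6 ...
request.required.acks=-1
# ... but give up after 100 ms, so under any load most requests expire.
request.timeout.ms=100
```

With this combination, a single well-behaved client can keep generating requests that sit in purgatory until they expire, which is why a quota on expirations per second is a better fit than a quota on outstanding requests.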

Load shedding needs more thought. It is not as straightforward, and when we scope it out we
will obviously need to keep the consequences of load shedding in mind.
                
> Deadlock between request handler/processor threads
> --------------------------------------------------
>
>                 Key: KAFKA-702
>                 URL: https://issues.apache.org/jira/browse/KAFKA-702
>             Project: Kafka
>          Issue Type: Bug
>          Components: network
>    Affects Versions: 0.8
>            Reporter: Joel Koshy
>            Assignee: Jay Kreps
>            Priority: Blocker
>              Labels: bugs
>             Fix For: 0.8
>
>         Attachments: KAFKA-702-v1.patch
>
>
> We have seen this a couple of times in the past few days in a test cluster. The request
> handler and processor threads deadlock on the request/response queues, bringing the server
> to a halt.
> "kafka-processor-10251-7" prio=10 tid=0x00007f4a0c3c9800 nid=0x4c39 waiting on condition [0x00007f46f698e000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00007f48c9dd2698> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>         at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:252)
>         at kafka.network.RequestChannel.sendRequest(RequestChannel.scala:107)
>         at kafka.network.Processor.read(SocketServer.scala:321)
>         at kafka.network.Processor.run(SocketServer.scala:231)
>         at java.lang.Thread.run(Thread.java:619)
> "kafka-request-handler-7" daemon prio=10 tid=0x00007f4a0c57f000 nid=0x4c47 waiting on condition [0x00007f46f5b80000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00007f48c9dd6348> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>         at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:252)
>         at kafka.network.RequestChannel.sendResponse(RequestChannel.scala:112)
>         at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:198)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:58)
>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:41)
>         at java.lang.Thread.run(Thread.java:619)
> This is because there is a cycle in the wait-for graph of the processor threads and request
> handler threads. If request handling slows down on a busy server, the request queue fills up,
> and all processor threads quickly block on adding incoming requests to it. Because of this,
> those threads do not process responses, and their response queues fill up as well. At that
> point, the request handler threads start blocking on adding responses to the respective
> response queues. The result is a deadlock in which every thread is blocked putting into one
> full queue that can only be drained by threads blocked on the other queue. This brings the
> server to a halt: it still accepts connections, but every request times out.
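
The two-bounded-queue cycle described above can be reduced to a few lines of java.util.concurrent. This is a standalone sketch (the queue capacities and payloads are made up, not Kafka's), using timed offer() calls in place of the put() calls from the stack traces so the demonstration itself cannot hang:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueDeadlockDemo {
    public static void main(String[] args) throws InterruptedException {
        // Bounded queues standing in for the request and response queues.
        ArrayBlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(1);
        ArrayBlockingQueue<String> responseQueue = new ArrayBlockingQueue<>(1);

        // Fill both queues, as happens when request handling falls behind.
        requestQueue.put("req");
        responseQueue.put("resp");

        // A processor thread is now stuck in requestQueue.put(...) and a
        // request handler thread in responseQueue.put(...). Each queue can
        // only be drained by the threads blocked on the other one: a cycle
        // in the wait-for graph. Timed offers show that neither side moves.
        boolean processorProgress = requestQueue.offer("req2", 100, TimeUnit.MILLISECONDS);
        boolean handlerProgress = responseQueue.offer("resp2", 100, TimeUnit.MILLISECONDS);
        System.out.println("processor made progress: " + processorProgress);
        System.out.println("handler made progress: " + handlerProgress);
    }
}
```

Both offers return false: with real put() calls, as in the stack traces, the threads would park forever instead.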
> One way to resolve this is to break the cycle in the wait-for graph of the request handler
> and processor threads. Instead of having the processor threads dispatch the responses, we
> can have one or more dedicated response handler threads that dequeue responses from the
> queue and write them to the socket. One downside of this approach is that access to the
> selector now has to be synchronized.
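
Another way to break the cycle, sketched below (this is an illustration of the idea, not the attached patch, and the class and capacity are hypothetical), is to make only the response path non-blocking: the request queue stays bounded so processors still back off when handlers are slow, but sendResponse() always completes, removing one edge of the wait-for cycle:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a request/response channel that cannot deadlock
// on its own queues, because only one of the two edges can block.
public class RequestChannelSketch {
    // Bounded: applies backpressure from slow handlers to the socket layer.
    private final BlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(500);
    // Unbounded: enqueueing a response can never block a handler thread.
    private final BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>();

    public void sendRequest(String req) throws InterruptedException {
        requestQueue.put(req); // may block: processors wait when handlers are behind
    }

    public void sendResponse(String resp) {
        responseQueue.add(resp); // never blocks: breaks the wait-for cycle
    }

    public String receiveRequest() throws InterruptedException {
        return requestQueue.take();
    }

    public String receiveResponse() {
        return responseQueue.poll(); // null if there is nothing to write yet
    }
}
```

The trade-off is unbounded memory on the response side under sustained overload; the dedicated response handler threads proposed above avoid that, at the cost of synchronizing access to the selector.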

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
