kafka-dev mailing list archives

From "Jay Kreps (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-747) RequestChannel re-design
Date Fri, 01 Feb 2013 17:40:13 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jay Kreps updated KAFKA-747:
----------------------------

    Component/s: network
    
> RequestChannel re-design
> ------------------------
>
>                 Key: KAFKA-747
>                 URL: https://issues.apache.org/jira/browse/KAFKA-747
>             Project: Kafka
>          Issue Type: New Feature
>          Components: network
>            Reporter: Jay Kreps
>             Fix For: 0.8.1
>
>
> We have had some discussion around how to handle queuing requests. There are two competing
concerns:
> 1. We need to maintain request order on a per-socket basis.
> 2. We want to be able to balance load flexibly over a pool of threads so that if one
thread blocks on I/O, request processing continues.
> Two Approaches We Have Considered
> 1. Have a global queue of unprocessed requests. All I/O threads read requests off this
global queue and process them. To avoid re-ordering, have the network layer read only one request
at a time.
> 2. Have a queue per I/O thread and have the network threads statically map sockets to
I/O thread request queues.
> Problems With These Approaches
> In the first case you do not get any per-producer parallelism. That is, you can't
read the next request while the current one is being handled. This seems like it would not
be a big deal, but preliminary benchmarks show that it might be.
> In the second case there are two problems. The first is that when an I/O thread gets
blocked, all request processing for sockets attached to that I/O thread will grind to a halt.
If you have 10,000 connections and 10 I/O threads, then each blockage will stop 1,000 producers.
If there is one topic that has long synchronous flush times enabled (or is experiencing fsync
locking), this will cause big latency blips for all producers using that I/O thread. The second
problem is around backpressure and memory management. Say we use BlockingQueues to feed the
I/O threads, and say that one I/O thread stalls. Its request queue will fill up and it will
then block ALL network threads, since they will block on inserting into that queue, even though
the other I/O threads are unused and have empty queues.
> A Proposed Better Solution
> The problem with the first solution is that we are not pipelining requests. The problem
with the second approach is that we are too constrained in moving work from one I/O thread
to another.
> Instead we should have a single request queue-like structure, but internally enforce
the condition that requests are not re-ordered.
> Here are the details. We retain RequestChannel but refactor its internals. Internally
we replace the blocking queue with a linked list. We also keep an in-flight-keys array with
one entry per I/O thread. When removing a work item from the list we can't just take the first
item. Instead we need to walk the list and look for the first request whose key is not in the
in-flight-keys array. When a response is sent, we remove that key from the in-flight-keys array.
> This guarantees that requests for a socket with key K are ordered, but that processing
for K can only block requests made by K.
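
The selection rule described above can be sketched roughly as follows. This is only
an illustration, not the actual RequestChannel code: the names OrderedRequestQueue,
Request, send, poll and responseSent are hypothetical, a HashSet stands in for the
in-flight-keys array, and poll returns null instead of blocking. A real version would
also need to block when nothing is eligible and bound the list for backpressure.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Set;

// Hypothetical sketch: a single request list that never hands out two
// requests for the same socket key at the same time.
class OrderedRequestQueue {

    // A queued request, tagged with the key of the socket it came from.
    static final class Request {
        final String key;
        final String payload;

        Request(String key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // Pending requests in arrival order.
    private final LinkedList<Request> pending = new LinkedList<>();

    // Keys that currently have a request being processed.
    // Assumption: a set is used here in place of the per-thread array.
    private final Set<String> inFlightKeys = new HashSet<>();

    // Called by a network thread: append a request to the tail of the list.
    synchronized void send(Request request) {
        pending.addLast(request);
    }

    // Called by an I/O thread: walk the list and take the first request
    // whose key is not in flight. Returns null if nothing is eligible.
    synchronized Request poll() {
        Iterator<Request> it = pending.iterator();
        while (it.hasNext()) {
            Request candidate = it.next();
            if (!inFlightKeys.contains(candidate.key)) {
                it.remove();
                inFlightKeys.add(candidate.key);
                return candidate;
            }
        }
        return null;
    }

    // Called once the response for this key has been sent: the next
    // request for the same socket becomes eligible again.
    synchronized void responseSent(String key) {
        inFlightKeys.remove(key);
    }
}
```

With requests a1 and a2 queued for key A, followed by b1 for key B, successive calls to
poll hand out a1 and then b1. The request a2 is skipped until responseSent("A") is
called, so ordering for A is preserved while B is not held up behind it.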

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
