hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
Date Fri, 12 Mar 2010 19:06:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844620#action_12844620

Raghu Angadi commented on HDFS-918:

> RE: Netty, I'm not very knowledgeable about it beyond the Cliff's Notes version, but
my code dealing with the Selector is pretty small - the main loop is under 75 lines, and java.util.concurrent
does most of the heavy lifting

Jay, I think is ok to ignore Netty for this jira. it could be re-factored later.

>> I think it is very important to have separate pools for each partition. 
> This would be the case if I were using a fixed-size thread pool and a LinkedBlockingQueue
- but I'm not, see Executors.newCachedThreadPool(),

hmm.. does it mean that if you have thousand clients and the load is disk bound, we end up
with 1000 threads?


> Use single Selector and small thread pool to replace many instances of BlockSender for
> --------------------------------------------------------------------------------------------
>                 Key: HDFS-918
>                 URL: https://issues.apache.org/jira/browse/HDFS-918
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Jay Booth
>             Fix For: 0.22.0
>         Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, hdfs-918-20100211.patch,
hdfs-918-20100228.patch, hdfs-918-20100309.patch, hdfs-multiplex.patch
> Currently, on read requests, the DataXCeiver server allocates a new thread per request,
which must allocate its own buffers and leads to higher-than-optimal CPU and memory usage
by the sending threads.  If we had a single selector and a small threadpool to multiplex request
packets, we could theoretically achieve higher performance while taking up fewer resources
and leaving more CPU on datanodes available for mapred, hbase or whatever.  This can be done
without changing any wire protocols.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message