hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Edward Capriolo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
Date Fri, 31 Dec 2010 17:50:50 GMT

    [ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976304#action_12976304

Edward Capriolo commented on HDFS-918:

Fundamentally using selectors is more efficient than the one-x-per-request model. It is not
only the random-read of HBase that run into DataXCeiver issues.  For example hive supports
dynamic partitioning. A single query may output to several thousand partitions and I have
had to raise DataXCeivers and /etc/security/limits.conf to account for this. (very frustrating
to still be upping ulimits in the 21st century :)

Also with solid-state-drive technology becoming a bigger part of the datacenter, the assumption
that opening a socket per request is acceptable because the disk reads will be the bottleneck
before the number of sockets on a system may not be correct for long.

It looks to be a win for many use-cases and should not be significant to standard map-reduce
use cases.  What do we have to do to get this patch to go +1? 

> Use single Selector and small thread pool to replace many instances of BlockSender for
> --------------------------------------------------------------------------------------------
>                 Key: HDFS-918
>                 URL: https://issues.apache.org/jira/browse/HDFS-918
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Jay Booth
>            Assignee: Jay Booth
>             Fix For: 0.22.0
>         Attachments: hbase-hdfs-benchmarks.ods, hdfs-918-20100201.patch, hdfs-918-20100203.patch,
hdfs-918-20100211.patch, hdfs-918-20100228.patch, hdfs-918-20100309.patch, hdfs-918-branch20-append.patch,
hdfs-918-branch20.2.patch, hdfs-918-pool.patch, hdfs-918-TRUNK.patch, hdfs-multiplex.patch
> Currently, on read requests, the DataXCeiver server allocates a new thread per request,
which must allocate its own buffers and leads to higher-than-optimal CPU and memory usage
by the sending threads.  If we had a single selector and a small threadpool to multiplex request
packets, we could theoretically achieve higher performance while taking up fewer resources
and leaving more CPU on datanodes available for mapred, hbase or whatever.  This can be done
without changing any wire protocols.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message