hadoop-hdfs-issues mailing list archives

From "Jay Booth (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
Date Mon, 01 Feb 2010 05:41:53 GMT

    [ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828009#action_12828009 ]

Jay Booth commented on HDFS-918:
--------------------------------

I haven't had a chance to run benchmarks yet, but I think that under many concurrent connections
the thread-per-connection model will spend more time context switching than getting work done.
It also has a few places where it "hot blocks" in a loop like while (buff.hasRemaining()) { write() }.
Selecting only the currently writable connections and scheduling them sidesteps both issues
with a smaller resource footprint, assuming it delivers on performance.  As soon as I get a
chance, I'll write some benchmarks.
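To make the contrast concrete, here is a minimal sketch in plain java.nio. It is not code from either attached patch. A single thread registers every connection with one Selector for OP_WRITE, performs one write per readiness event, and closes a connection once its buffer is drained. A java.nio.channels.Pipe stands in for each client socket, and the names SelectorWriteSketch, serveAll and demo are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.*;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: one thread serves many connections, writing only to writable ones. */
public class SelectorWriteSketch {

    /** Each registered key carries its pending ByteBuffer as the attachment. */
    static void serveAll(Selector selector) throws IOException {
        int active = selector.keys().size();
        while (active > 0) {
            selector.select(); // sleep until at least one connection can accept bytes
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                ByteBuffer buf = (ByteBuffer) key.attachment();
                // One write per readiness event. A partial write leaves the key registered for
                // OP_WRITE instead of looping on while (buf.hasRemaining()) write(), which would
                // busy-spin on a non-blocking channel.
                ((WritableByteChannel) key.channel()).write(buf);
                if (!buf.hasRemaining()) {
                    key.channel().close(); // connection finished; closing also cancels the key
                    active--;
                }
            }
        }
        selector.close(); // completes closes that were deferred while channels were registered
    }

    /** Demo: each Pipe stands in for a client connection; returns total bytes received. */
    static long demo(int clients, int bytesEach) throws Exception {
        Selector selector = Selector.open();
        AtomicLong received = new AtomicLong();
        List<Thread> readers = new ArrayList<>();
        for (int i = 0; i < clients; i++) {
            Pipe pipe = Pipe.open();
            pipe.sink().configureBlocking(false);
            pipe.sink().register(selector, SelectionKey.OP_WRITE, ByteBuffer.allocate(bytesEach));
            Thread t = new Thread(() -> { // blocking reader draining the pipe until EOF
                ByteBuffer in = ByteBuffer.allocate(8192);
                try (Pipe.SourceChannel src = pipe.source()) {
                    for (int n; (n = src.read(in)) >= 0; in.clear()) received.addAndGet(n);
                } catch (IOException e) { throw new UncheckedIOException(e); }
            });
            t.start();
            readers.add(t);
        }
        serveAll(selector);
        for (Thread t : readers) t.join();
        return received.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bytes received: " + demo(4, 100_000));
    }
}
```

Running main should print "bytes received: 400000". Each 100,000-byte buffer is usually larger than the pipe's internal buffer, so some writes are partial, and the single thread simply moves on to whichever connection is writable next.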

If anyone wants to look at the code in the meantime, I think this patch is pretty easy to
set up: just enable TRACE logging for MultiplexBlockSender.LOG and run the tests, and you can
see how each packet is built and sent.  'ant compile eclipse-files' will set up the extra
dependencies on commons-pool and commons-math.

> Use single Selector and small thread pool to replace many instances of BlockSender for reads
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-918
>                 URL: https://issues.apache.org/jira/browse/HDFS-918
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Jay Booth
>             Fix For: 0.22.0
>
>         Attachments: hdfs-918-20100201.patch, hdfs-multiplex.patch
>
>
> Currently, on read requests, the DataXceiver server allocates a new thread per request,
> which must allocate its own buffers and leads to higher-than-optimal CPU and memory usage
> by the sending threads.  If we had a single selector and a small thread pool to multiplex
> request packets, we could theoretically achieve higher performance while using fewer resources
> and leaving more CPU on datanodes available for MapReduce, HBase, or other workloads.  This
> can be done without changing any wire protocols.
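A rough sketch of the proposed structure, again using only JDK classes and not taken from the patch. One selector thread hands each writable connection to a small fixed thread pool, a worker builds and writes at most one packet per dispatch, and packet buffers come from a shared pool rather than being allocated per sending thread. Assumptions: the ArrayBlockingQueue stands in for a real object pool such as commons-pool, the 4096-byte PACKET size and the synthetic packet contents are arbitrary, and MultiplexSketch, Request, sendOnePacket and demo are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;

/** Hypothetical sketch: one selector thread plus a small worker pool that sends packets. */
public class MultiplexSketch {
    static final int PACKET = 4096; // arbitrary illustrative packet size
    static final class Request {    // per-request state, touched only by the key's current owner
        long remaining;             // bytes of the synthetic "block" not yet packetized
        ByteBuffer packet;          // pooled buffer holding the packet in flight, or null
        Request(long size) { remaining = size; }
    }
    final Selector selector;
    final ExecutorService workers;
    final BlockingQueue<ByteBuffer> bufferPool; // shared buffers instead of per-thread ones
    final Queue<SelectionKey> rearm = new ConcurrentLinkedQueue<>();
    final AtomicInteger active = new AtomicInteger();
    MultiplexSketch(int threads) throws IOException {
        selector = Selector.open();
        workers = Executors.newFixedThreadPool(threads);
        bufferPool = new ArrayBlockingQueue<>(threads);
        for (int i = 0; i < threads; i++) bufferPool.add(ByteBuffer.allocate(PACKET));
    }
    void addRequest(SelectableChannel ch, long size) throws IOException { // call before run()
        ch.configureBlocking(false);
        ch.register(selector, SelectionKey.OP_WRITE, new Request(size));
        active.incrementAndGet();
    }

    void run() throws IOException, InterruptedException { // the selector thread's loop
        while (active.get() > 0) {
            selector.select();
            for (SelectionKey k; (k = rearm.poll()) != null; ) {
                if (k.isValid()) k.interestOps(SelectionKey.OP_WRITE); // handed back by a worker
            }
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                key.interestOps(0); // a worker owns this connection until it is re-armed
                workers.execute(() -> sendOnePacket(key));
            }
        }
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
        selector.close(); // also completes closes deferred while channels were registered
    }

    /** Worker: build the next packet if needed, write it once, then hand the key back. */
    void sendOnePacket(SelectionKey key) {
        Request req = (Request) key.attachment();
        boolean done;
        try {
            if (req.packet == null) {
                ByteBuffer buf = bufferPool.poll(); // allocate only if the pool is empty
                req.packet = (buf != null) ? buf : ByteBuffer.allocate(PACKET);
                int len = (int) Math.min(PACKET, req.remaining);
                req.packet.clear().limit(len); // a real sender would fill in data + checksums
                req.remaining -= len;
            }
            ((WritableByteChannel) key.channel()).write(req.packet); // one write, no spinning
            if (!req.packet.hasRemaining()) {
                bufferPool.offer(req.packet); // return to the pool; dropped if it is full
                req.packet = null;
            }
            done = req.packet == null && req.remaining == 0;
        } catch (IOException e) { done = true; } // treat a failed write as end of request
        if (done) {
            try { key.channel().close(); } catch (IOException ignored) { } // also cancels key
            active.decrementAndGet();
        } else rearm.add(key);
        selector.wakeup(); // let the selector re-arm the key or notice that we are done
    }

    static long demo(int clients, int bytesEach, int threads) throws Exception {
        MultiplexSketch server = new MultiplexSketch(threads);
        AtomicLong received = new AtomicLong();
        List<Thread> readers = new ArrayList<>();
        for (int i = 0; i < clients; i++) { // each Pipe stands in for one client connection
            Pipe pipe = Pipe.open();
            server.addRequest(pipe.sink(), bytesEach);
            Thread t = new Thread(() -> { // blocking reader draining the pipe until EOF
                ByteBuffer in = ByteBuffer.allocate(8192);
                try (Pipe.SourceChannel src = pipe.source()) {
                    for (int n; (n = src.read(in)) >= 0; in.clear()) received.addAndGet(n);
                } catch (IOException e) { throw new UncheckedIOException(e); }
            });
            t.start(); readers.add(t);
        }
        server.run();
        for (Thread t : readers) t.join();
        return received.get();
    }
}
```

For example, demo(8, 50_000, 2) should return 400000. Workers never change interest ops themselves; they queue the key and call selector.wakeup(), and the selector thread re-arms OP_WRITE, which keeps every interest-set change on the thread that owns the Selector.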

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

