hadoop-hdfs-issues mailing list archives

From "Zlatin Balevsky (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
Date Sun, 14 Feb 2010 15:21:28 GMT

    [ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833579#action_12833579 ]

Zlatin Balevsky commented on HDFS-918:
--------------------------------------

I see a problem with doing the disk read on the same thread that does the select().
Round-robining across several selector threads doesn't avoid the situation where a channel
is writable but its selecting thread is stuck in a transferTo call to another channel, even
if other selector threads in handlers[] are available.  With an architecture like this, one
slow disk read stalls every other channel registered with that selector, so you will always
perform worse than a thread-per-stream approach.

Instead, you could have a single selector thread that blocks only on select() and never does
any disk I/O (including the creation of RandomAccessFile objects).  It simply dispatches the
writable channels to a thread pool that does the actual transferTo calls.
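
To make the suggestion concrete, here is a rough sketch of that shape. It is not taken from
any of the attached patches: the class name SelectorDispatchSketch, the Transfer holder, the
CHUNK size and the submit() entry point are all made up for illustration, and it assumes the
caller opens the FileChannel before handing it over, so the selector thread never touches
the disk.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.FileChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;

// Hypothetical sketch, not the HDFS-918 patch: one selector thread, pooled transferTo calls.
final class SelectorDispatchSketch implements Runnable, Closeable {
    private static final long CHUNK = 64 * 1024; // arbitrary bytes per dispatch (assumption)

    // Per-connection state: an already-open file and the byte range still to send.
    private static final class Transfer {
        final FileChannel file;
        final long end;
        long position;
        Transfer(FileChannel file, long offset, long length) {
            this.file = file;
            this.position = offset;
            this.end = offset + length; // caller must ensure the file really is this long
        }
    }

    private final Selector selector;
    private final ExecutorService workers;
    // Registrations and re-arms are queued here and applied on the selector thread.
    private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();

    SelectorDispatchSketch(ExecutorService workers) throws IOException {
        this.selector = Selector.open();
        this.workers = workers;
    }

    // Callable from any thread. The file is opened by the caller, never by the selector thread.
    void submit(SocketChannel socket, FileChannel file, long offset, long length) {
        pending.add(() -> {
            try {
                socket.configureBlocking(false);
                socket.register(selector, SelectionKey.OP_WRITE, new Transfer(file, offset, length));
            } catch (IOException e) {
                closeQuietly(socket);
                closeQuietly(file);
            }
        });
        selector.wakeup();
    }

    // The selector thread: blocks only in select() and does no disk I/O itself.
    @Override
    public void run() {
        try {
            while (selector.isOpen()) {
                selector.select();
                Runnable task;
                while ((task = pending.poll()) != null) {
                    task.run();
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isValid() && key.isWritable()) {
                        key.interestOps(0); // disarm so only one worker owns this channel
                        workers.execute(() -> transferChunk(key));
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // Selector closed or broken: let the thread exit.
        }
    }

    // Runs on a pool thread: the only place where transferTo (and so disk I/O) happens.
    private void transferChunk(SelectionKey key) {
        Transfer t = (Transfer) key.attachment();
        SocketChannel socket = (SocketChannel) key.channel();
        try {
            t.position += t.file.transferTo(t.position, Math.min(CHUNK, t.end - t.position), socket);
            if (t.position >= t.end) {
                closeQuietly(socket); // closing the channel also cancels its key
                closeQuietly(t.file);
            } else {
                // More to send: ask the selector thread to watch for writability again.
                pending.add(() -> {
                    if (key.isValid()) {
                        key.interestOps(SelectionKey.OP_WRITE);
                    }
                });
                selector.wakeup();
            }
        } catch (IOException e) {
            closeQuietly(socket);
            closeQuietly(t.file);
        }
    }

    @Override
    public void close() throws IOException {
        selector.close();
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // Nothing useful to do in a sketch.
        }
    }
}
```

The selector thread disarms a key before handing it to the pool and only re-arms it when the
worker asks, so a channel is never being written by two workers at once, and a slow
transferTo ties up one pool thread instead of the select() loop.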





> Use single Selector and small thread pool to replace many instances of BlockSender for reads
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-918
>                 URL: https://issues.apache.org/jira/browse/HDFS-918
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Jay Booth
>             Fix For: 0.22.0
>
>         Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, hdfs-918-20100211.patch, hdfs-multiplex.patch
>
>
> Currently, on read requests, the DataXCeiver server allocates a new thread per request,
> which must allocate its own buffers and leads to higher-than-optimal CPU and memory usage
> by the sending threads.  If we had a single selector and a small threadpool to multiplex request
> packets, we could theoretically achieve higher performance while taking up fewer resources
> and leaving more CPU on datanodes available for mapred, hbase or whatever.  This can be done
> without changing any wire protocols.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

