hadoop-hdfs-issues mailing list archives

From "Jay Booth (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
Date Sun, 28 Feb 2010 19:58:05 GMT

     [ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Booth updated HDFS-918:
---------------------------

    Attachment: hdfs-918-20100228.patch

New patch -- Took Zlatin's advice and used selectionKey.interestOps(0) to avoid busy waits,
so we're back to a single selector and an ExecutorService. The ExecutorService reuses threads
where possible and destroys threads that haven't been used in 60 seconds. Analyzed the logs
and the selector thread doesn't appear to busy-wait at all. Buffers are now stored in
thread-locals and allocated per thread (they're HeapByteBuffers now, since we may see some
thread churn and most of the transfer goes through transferTo anyway). Still uses the shared
BlockChannelPool implemented via a ReadWriteLock.

I think this will be pretty good, will benchmark tonight.
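
For illustration only -- a rough sketch of the selector/executor pattern described above, not
the actual patch (class and method names are invented): a single selector thread hands readable
channels to a cached thread pool and parks each key with interestOps(0) while a worker owns it,
so the selector loop never spins on an in-flight request.

import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SelectorDispatchSketch implements Runnable {
    private final Selector selector;
    // Cached pool: reuses idle threads and retires them after 60 seconds.
    private final ExecutorService workers = Executors.newCachedThreadPool();

    public SelectorDispatchSketch(Selector selector) {
        this.selector = selector;
    }

    @Override
    public void run() {
        try {
            while (selector.isOpen()) {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    final SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid() || !key.isReadable()) {
                        continue;
                    }
                    // Park the key so the selector loop does not re-select it
                    // while a worker is still handling the request.
                    key.interestOps(0);
                    workers.execute(new Runnable() {
                        public void run() {
                            handle((SocketChannel) key.channel());
                            // Re-arm the key and wake the selector once done.
                            key.interestOps(SelectionKey.OP_READ);
                            selector.wakeup();
                        }
                    });
                }
            }
        } catch (IOException e) {
            // Real code would log the error and shut down cleanly.
        }
    }

    private void handle(SocketChannel channel) {
        // Hypothetical placeholder for reading the request and sending block data.
    }
}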

> Use single Selector and small thread pool to replace many instances of BlockSender for reads
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-918
>                 URL: https://issues.apache.org/jira/browse/HDFS-918
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Jay Booth
>             Fix For: 0.22.0
>
>         Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, hdfs-918-20100211.patch, hdfs-918-20100228.patch, hdfs-multiplex.patch
>
>
> Currently, on read requests, the DataXceiver server allocates a new thread per request,
> which must allocate its own buffers; this leads to higher-than-optimal CPU and memory usage
> by the sending threads.  If we had a single selector and a small thread pool to multiplex
> request packets, we could theoretically achieve higher performance while taking up fewer
> resources and leaving more CPU on datanodes available for mapred, hbase or whatever.  This
> can be done without changing any wire protocols.
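
For illustration of the buffer handling mentioned in the comment above -- a minimal sketch, not
the actual patch (names, the 64 KB packet size, and the stand-in header format are invented):
each worker thread reuses a single thread-local HeapByteBuffer for small header writes and
streams the block payload with FileChannel.transferTo(), instead of allocating fresh buffers
for every read request.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class BlockSendSketch {
    // One heap buffer per worker thread; 64 KB is a hypothetical packet size.
    private static final ThreadLocal<ByteBuffer> PACKET_BUF =
        new ThreadLocal<ByteBuffer>() {
            @Override
            protected ByteBuffer initialValue() {
                return ByteBuffer.allocate(64 * 1024);  // heap buffer, not direct
            }
        };

    static void sendBlock(FileChannel blockFile, SocketChannel out,
                          long offset, long length) throws IOException {
        ByteBuffer header = PACKET_BUF.get();
        header.clear();
        // Stand-in header fields; the real DataNode packet format is richer.
        header.putLong(offset).putLong(length);
        header.flip();
        while (header.hasRemaining()) {
            out.write(header);
        }
        // Zero-copy path for the data itself: the kernel moves file bytes to the
        // socket, so the Java heap buffer is only needed for the small header.
        long sent = 0;
        while (sent < length) {
            long n = blockFile.transferTo(offset + sent, length - sent, out);
            if (n <= 0) {
                break;  // real code would wait for writability or retry
            }
            sent += n;
        }
    }
}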

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

