hadoop-hdfs-issues mailing list archives

From "Jay Booth (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
Date Wed, 05 Jan 2011 00:06:50 GMT

    [ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977544#action_12977544 ]

Jay Booth commented on HDFS-918:

Hey all, sorry for the slow response, been swamped with the new year and all.

RE: unit tests, at one point the patch was passing all tests. I'm not sure whether the tests
changed or the patch did, but I can take a look at it.

RE: 0.23, I can look at forward porting this again, but a lot of changes have gone in since

@stack, were you testing the "only pooling" patch or the "with full multiplexing" patch? 

"Only pooling" would be much simpler to forward port, although I do think that the full multiplexing
patch is pretty worthwhile.  Aside from the small-but-significant performance gain, it was
IMO much better factoring to have the DN-side logic all encapsulated in a Connection object
that has sendPacket() called on it repeatedly, rather than a giant procedural loop that goes
down and back up through several classes.  The architecture also made keepalive pretty
straightforward: just throw that connection back into a listening pool when done, and make
the corresponding changes on the client side.  But I guess that logic's been revised now
anyway, so it'd be a significant piece of work to bring it all back up to date.
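
For illustration only, the Connection/sendPacket() factoring described above might look
roughly like the Java sketch below. Only the names Connection and sendPacket() come from
this comment; startRead(), serve(), the packet size and the idle-pool queue are assumptions
made for the sketch and are not taken from the patch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.Queue;

final class ConnectionSketch {

    /** Hypothetical per-connection state; not the class from the actual patch. */
    static final class Connection {
        private final WritableByteChannel out;
        private final int packetSize;
        private ByteBuffer remaining; // unsent bytes of the current read, or null when idle

        Connection(WritableByteChannel out, int packetSize) {
            this.out = out;
            this.packetSize = packetSize;
        }

        /** Begins serving a new read request on this connection. */
        void startRead(byte[] blockData) {
            remaining = ByteBuffer.wrap(blockData);
        }

        /** Sends at most one packet; returns true while more data remains. */
        boolean sendPacket() throws IOException {
            if (remaining == null) {
                return false;
            }
            ByteBuffer packet = remaining.slice();
            packet.limit(Math.min(packetSize, packet.remaining()));
            int len = packet.remaining();
            while (packet.hasRemaining()) {
                out.write(packet);
            }
            remaining.position(remaining.position() + len);
            if (remaining.hasRemaining()) {
                return true;
            }
            remaining = null; // read finished; connection is idle again
            return false;
        }
    }

    /**
     * Drives one read to completion by calling sendPacket() repeatedly, then
     * parks the connection in the idle pool so it can be reused (keepalive).
     * Returns the number of sendPacket() calls made.
     */
    static int serve(Connection conn, byte[] blockData, Queue<Connection> idlePool)
            throws IOException {
        conn.startRead(blockData);
        int packets = 1;
        while (conn.sendPacket()) {
            packets++;
        }
        idlePool.add(conn);
        return packets;
    }
}
```

The point of the sketch is that all per-read state lives in one object, so the caller can be
a simple loop (or a pool worker), and keepalive reduces to putting the object back in a queue.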

> Use single Selector and small thread pool to replace many instances of BlockSender for reads
> --------------------------------------------------------------------------------------------
>                 Key: HDFS-918
>                 URL: https://issues.apache.org/jira/browse/HDFS-918
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Jay Booth
>            Assignee: Jay Booth
>             Fix For: 0.22.0
>         Attachments: hbase-hdfs-benchmarks.ods, hdfs-918-20100201.patch, hdfs-918-20100203.patch,
> hdfs-918-20100211.patch, hdfs-918-20100228.patch, hdfs-918-20100309.patch, hdfs-918-branch20-append.patch,
> hdfs-918-branch20.2.patch, hdfs-918-pool.patch, hdfs-918-TRUNK.patch, hdfs-multiplex.patch
>
> Currently, on read requests, the DataXCeiver server allocates a new thread per request,
> which must allocate its own buffers and leads to higher-than-optimal CPU and memory usage
> by the sending threads.  If we had a single selector and a small threadpool to multiplex request
> packets, we could theoretically achieve higher performance while taking up fewer resources
> and leaving more CPU on datanodes available for mapred, hbase or whatever.  This can be done
> without changing any wire protocols.
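
As a rough illustration of the idea in the description (not the code from any attached
patch), the sketch below uses one java.nio Selector to find writable channels and a small
fixed thread pool to send one packet per ready channel. It uses java.nio.channels.Pipe as a
stand-in for client sockets; the class name, PACKET_SIZE and the one-round-at-a-time
dispatch via invokeAll() are all simplifying assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class MultiplexSketch {
    static final int PACKET_SIZE = 64; // hypothetical packet size for the sketch

    /** Writes at most one packet from the key's attached buffer; returns bytes written. */
    static int sendPacket(SelectionKey key) throws IOException {
        ByteBuffer buf = (ByteBuffer) key.attachment();
        ByteBuffer packet = buf.slice();
        packet.limit(Math.min(PACKET_SIZE, packet.remaining()));
        int n = ((WritableByteChannel) key.channel()).write(packet);
        buf.position(buf.position() + n); // a non-blocking write may be partial
        return n;
    }

    /** Sends payloads.get(i) over pipes.get(i); returns the total bytes written. */
    static long sendAll(List<Pipe> pipes, List<byte[]> payloads, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < pipes.size(); i++) {
                Pipe.SinkChannel sink = pipes.get(i).sink();
                sink.configureBlocking(false);
                sink.register(selector, SelectionKey.OP_WRITE, ByteBuffer.wrap(payloads.get(i)));
            }
            long total = 0;
            int open = pipes.size();
            while (open > 0) {
                selector.select(); // block until at least one channel is writable
                List<SelectionKey> ready = new ArrayList<>(selector.selectedKeys());
                selector.selectedKeys().clear();
                List<Callable<Integer>> tasks = new ArrayList<>();
                for (SelectionKey key : ready) {
                    tasks.add(() -> sendPacket(key));
                }
                // Workers send one packet per ready channel; wait for the whole round.
                for (Future<Integer> f : pool.invokeAll(tasks)) {
                    total += f.get();
                }
                for (SelectionKey key : ready) {
                    if (!((ByteBuffer) key.attachment()).hasRemaining()) {
                        key.cancel();
                        key.channel().close();
                        open--;
                    }
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the number of threads is fixed by the pool size rather than by the number of
open reads. The loop assumes something eventually drains each channel; if a payload exceeds
the pipe's buffer and nobody reads the other end, select() will block.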

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
