hadoop-hdfs-issues mailing list archives

From "Jay Booth (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
Date Mon, 01 Feb 2010 05:19:51 GMT

     [ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Booth updated HDFS-918:

    Attachment: hdfs-918-20100201.patch

New patch:

* new configuration params:  dfs.datanode.multiplexBlockSender=true, dfs.datanode.multiplex.packetSize=32k,

* Packet size is tunable, possibly allowing better performance when larger TCP buffers are enabled

* Workers only wake up when a connection is writable

* 3 new class files, minor changes to DataXceiverServer and DataXceiver, 2 utility classes
added to DataTransferProtocol (one stolen from HDFS-881)

* Passes the tests from the earlier comment, plus a new one for files whose lengths are not a
multiple of the checksum chunk size, and holds up under some load from TestDFSIO

* Still fails all tests relying on SimulatedFSDataset

* Has a large amount of TRACE-level logging in MultiplexedBlockSender, in case anybody
wants to watch the output

* Adds dependencies for commons-pool and commons-math (for benchmarking code)

* Doesn't yet have benchmarks, but those should be easy now that the configuration is all
in place
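
As a rough illustration of why file lengths that are not a multiple of the checksum chunk size need their own test, the arithmetic can be sketched as below. This is not code from the patch. The 512-byte chunk and 4-byte CRC32 checksum are assumed values, the class and method names are hypothetical, and the sketch ignores any packet header when fitting chunks into the 32k packet size.

```java
public class PacketMath {
    // Assumed values, not taken from the patch: 512 data bytes per checksum chunk, 4-byte CRC32.
    static final int BYTES_PER_CHECKSUM = 512;
    static final int CHECKSUM_SIZE = 4;

    /** Number of checksum chunks needed to cover len bytes; the last chunk may be partial. */
    static long chunksFor(long len) {
        return (len + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
    }

    /** Whole chunks (data plus checksum) that fit in packetSize bytes, ignoring any header. */
    static int chunksPerPacket(int packetSize) {
        return Math.max(1, packetSize / (BYTES_PER_CHECKSUM + CHECKSUM_SIZE));
    }

    /** Bytes in the final chunk; equals BYTES_PER_CHECKSUM when the length is aligned. */
    static int lastChunkLength(long len) {
        if (len == 0) {
            return 0;
        }
        int rem = (int) (len % BYTES_PER_CHECKSUM);
        return rem == 0 ? BYTES_PER_CHECKSUM : rem;
    }

    public static void main(String[] args) {
        int packetSize = 32 * 1024;   // the 32k default mentioned above
        long fileLen = 1_000_000L;    // deliberately not a multiple of 512

        int perPacket = chunksPerPacket(packetSize);
        long chunks = chunksFor(fileLen);
        long packets = (chunks + perPacket - 1) / perPacket;

        System.out.println(perPacket);                 // 63
        System.out.println(chunks);                    // 1954
        System.out.println(packets);                   // 32
        System.out.println(lastChunkLength(fileLen));  // 64
    }
}
```

With these assumed sizes, the final chunk of a 1,000,000-byte file carries only 64 data bytes, so the last checksum covers a short range; that is the edge case the new test exercises.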

> Use single Selector and small thread pool to replace many instances of BlockSender for reads
> --------------------------------------------------------------------------------------------
>                 Key: HDFS-918
>                 URL: https://issues.apache.org/jira/browse/HDFS-918
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Jay Booth
>             Fix For: 0.22.0
>         Attachments: hdfs-918-20100201.patch, hdfs-multiplex.patch
> Currently, on read requests, the DataXceiverServer allocates a new thread per request,
each of which must allocate its own buffers, leading to higher-than-optimal CPU and memory usage
by the sending threads.  If we had a single selector and a small thread pool to multiplex request
packets, we could theoretically achieve higher performance while taking up fewer resources
and leaving more CPU on datanodes available for mapred, hbase or whatever.  This can be done
without changing any wire protocols.
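
The idea described above can be sketched with plain java.nio. This is a minimal, self-contained illustration and not the code in the attached patch: the class name, the two-thread pool, the loopback server, and the in-memory byte array standing in for block data are all assumptions made for the example. One selector thread watches every connection, and a worker is scheduled only when a connection is writable; it sends at most one packet and then hands the key back to the selector.

```java
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MultiplexedSenderSketch {
    static final int PACKET_SIZE = 32 * 1024; // mirrors the 32k default; the rest is illustrative

    final Selector selector = Selector.open();
    final ServerSocketChannel server = ServerSocketChannel.open();
    final ExecutorService workers = Executors.newFixedThreadPool(2); // small, fixed pool
    final Queue<SelectionKey> rearm = new ConcurrentLinkedQueue<>(); // keys to re-enable for OP_WRITE
    final byte[] data; // hypothetical stand-in for block contents
    volatile boolean running = true;

    MultiplexedSenderSketch(byte[] data) throws Exception {
        this.data = data;
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral loopback port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws Exception {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    /** Selector loop: the only thread that touches interest sets. */
    void loop() {
        try {
            while (running) {
                selector.select();
                for (SelectionKey k; (k = rearm.poll()) != null; ) {
                    if (k.isValid()) k.interestOps(SelectionKey.OP_WRITE);
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        SocketChannel ch = server.accept();
                        if (ch == null) continue;
                        ch.configureBlocking(false);
                        // Each connection carries its own read position over the shared data.
                        ch.register(selector, SelectionKey.OP_WRITE, ByteBuffer.wrap(data));
                    } else if (key.isWritable()) {
                        key.interestOps(0); // park the key so only one worker handles it
                        workers.execute(() -> sendPacket(key));
                    }
                }
            }
        } catch (Exception e) {
            running = false;
        } finally {
            try { selector.close(); server.close(); } catch (Exception ignored) { }
        }
    }

    /** Worker task: write at most one packet, then hand the key back or close. */
    void sendPacket(SelectionKey key) {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer remaining = (ByteBuffer) key.attachment();
        try {
            ByteBuffer packet = remaining.duplicate();
            packet.limit(Math.min(packet.limit(), packet.position() + PACKET_SIZE));
            int written = ch.write(packet); // may be a partial write
            remaining.position(remaining.position() + written);
            if (remaining.hasRemaining()) {
                rearm.add(key);
                selector.wakeup();
            } else {
                ch.close();
            }
        } catch (Exception e) {
            try { ch.close(); } catch (Exception ignored) { }
        }
    }

    void stop() {
        running = false;
        selector.wakeup();
        workers.shutdown();
    }

    /** Blocking client used only to exercise the sketch; returns the number of bytes received. */
    static int fetch(int port) throws Exception {
        try (Socket s = new Socket("127.0.0.1", port); InputStream in = s.getInputStream()) {
            byte[] buf = new byte[8192];
            int total = 0;
            for (int n; (n = in.read(buf)) != -1; ) total += n;
            return total;
        }
    }

    public static void main(String[] args) throws Exception {
        MultiplexedSenderSketch sender = new MultiplexedSenderSketch(new byte[100_000]);
        Thread t = new Thread(sender::loop, "selector");
        t.start();
        System.out.println(fetch(sender.port()));
        sender.stop();
        t.join();
    }
}
```

Routing the re-arm requests through a queue keeps all interest-set changes on the selector thread, which avoids depending on how a given JDK handles `interestOps` calls made while another thread is blocked in `select()`. The thread count stays fixed no matter how many readers connect, which is the resource saving the description is after.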

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
