hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2080) Speed up DFS read path
Date Fri, 17 Jun 2011 15:30:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051137#comment-13051137 ]

Todd Lipcon commented on HDFS-2080:

Nathan: yes, both CPU time and sys time improve with these optimizations.

Kihwal: using zlib instead of the hardware CRC gives only about a 40% improvement. It's true
that a disk won't pump out data at rates approaching 1GB/sec, but Nathan's metric of CPUsecs/MB
is still very important, e.g. on multitenant clusters. Another important case is HBase serving,
where the majority of the data being read from HDFS will actually be in the Linux buffer
cache. In my benchmarks, 3/4 of the latency of such reads comes from CPU time rather than
context switching (try TestHFileSeek from HBase on RawLocalFS vs LocalFS).
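
For anyone who wants to put a number on CPUsecs/MB, here is a rough Java sketch. It is not code from the HDFS-2080 patches. It assumes java.util.zip.CRC32 as a stand-in for the checksum work and ThreadMXBean for per-thread CPU time; the class and method names (CpuSecsPerMb, cpuSecsPerMb, checksum), the buffer size, and the pass count are made up for illustration, and 1 MB is taken to be 1024*1024 bytes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.zip.CRC32;

// Hypothetical helper, not part of Hadoop: measures CPU-seconds per MB
// spent checksumming an in-memory buffer.
final class CpuSecsPerMb {

    // Converts CPU nanoseconds and a byte count into CPU-seconds per MB.
    static double cpuSecsPerMb(long cpuNanos, long bytes) {
        double cpuSecs = cpuNanos / 1e9;
        double megabytes = bytes / (1024.0 * 1024.0);
        return cpuSecs / megabytes;
    }

    // Checksums the whole buffer `passes` times; returns the CRC of the last pass.
    static long checksum(byte[] buf, int passes) {
        CRC32 crc = new CRC32();
        for (int i = 0; i < passes; i++) {
            crc.reset();
            crc.update(buf, 0, buf.length);
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            System.err.println("Per-thread CPU time is not supported on this JVM");
            return;
        }

        byte[] buf = new byte[64 * 1024 * 1024]; // zero-filled, stays in memory
        int passes = 16;

        checksum(buf, 2); // warm-up so the JIT has compiled the hot loop

        long start = threads.getCurrentThreadCpuTime();
        long crc = checksum(buf, passes);
        long cpuNanos = threads.getCurrentThreadCpuTime() - start;

        long bytes = (long) buf.length * passes;
        System.out.printf("crc=%08x  %.6f CPU-secs/MB%n",
                crc, cpuSecsPerMb(cpuNanos, bytes));
    }
}
```

Because the buffer never leaves memory, this isolates the CPU cost in roughly the way the buffer-cache case does. The absolute figure depends heavily on the JVM and hardware, so it is only useful for comparing checksum implementations on the same machine.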

> Speed up DFS read path
> ----------------------
>                 Key: HDFS-2080
>                 URL: https://issues.apache.org/jira/browse/HDFS-2080
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: 0.23.0
> I've developed a series of patches that speed up the HDFS read path by a factor of about
> 2.5x (~300M/sec to ~800M/sec for localhost reading from buffer cache) and will also make it
> easier for advanced users (e.g. HBase) to skip a buffer copy.
