hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-5979) Non-pread DFSInputStreams should be associated with scanners, not HFile.Readers
Date Wed, 09 May 2012 22:33:52 GMT
Todd Lipcon created HBASE-5979:

             Summary: Non-pread DFSInputStreams should be associated with scanners, not HFile.Readers
                 Key: HBASE-5979
                 URL: https://issues.apache.org/jira/browse/HBASE-5979
             Project: HBase
          Issue Type: Improvement
          Components: performance, regionserver
            Reporter: Todd Lipcon

Currently, every HFile.Reader has a single DFSInputStream, which it uses to service all gets
and scans. For gets, we use the positional read API (aka "pread"), and for scans we use a
synchronized block to seek, then read. The advantage of pread is that it doesn't hold any locks,
so multiple gets can proceed at the same time. The advantage of seek+read for scans is that the
datanode starts to send the entire rest of the HDFS block, rather than just the single HFile
block necessary. So, in a single thread, pread is faster for gets, and seek+read is faster for
scans, since you get a strong pipelining effect.
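
To make the two access patterns concrete, here is a minimal sketch using java.nio's FileChannel on a local temp file as a stand-in. This is an analogy only, not the HDFS or HBase code: the class and method names (ReadStyles, pread, seekRead, demo) are made up for illustration, and a local file shows none of the datanode pipelining described above. It only shows why a positional read needs no lock while seek+read on a shared stream does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration; FileChannel stands in for a shared input stream.
class ReadStyles {

    // "pread" style: the offset travels with the call and the channel's own
    // position is left untouched, so callers do not need a shared lock.
    static byte[] pread(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);
            if (n < 0) break;          // hit end of file
            pos += n;
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    // seek+read style: the seek mutates the shared position, so the seek and
    // the read that follows must sit inside one synchronized block.
    static byte[] seekRead(FileChannel ch, long offset, int len) throws IOException {
        synchronized (ch) {
            ch.position(offset);
            ByteBuffer buf = ByteBuffer.allocate(len);
            while (buf.hasRemaining()) {
                if (ch.read(buf) < 0) break;   // hit end of file
            }
            return Arrays.copyOf(buf.array(), buf.position());
        }
    }

    // Writes a small temp file and reads it back both ways.
    static String demo() {
        try {
            Path p = Files.createTempFile("readstyles", ".bin");
            try {
                Files.write(p, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
                    String a = new String(pread(ch, 4, 4), StandardCharsets.US_ASCII);
                    String b = new String(seekRead(ch, 10, 3), StandardCharsets.US_ASCII);
                    return a + "," + b;
                }
            } finally {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling ReadStyles.demo() returns "4567,abc": the first part comes from the lock-free positional read, the second from the synchronized seek+read.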

However, in a multi-threaded case where there are multiple scans (including scans which are
actually part of compactions), the seek+read strategy falls apart, since only one scanner
may be reading at a time. Additionally, a large amount of wasted IO is generated on the datanode
side, and we get none of the earlier-mentioned advantages.
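
The change suggested in the summary (one non-pread stream per scanner rather than one per reader) can be sketched the same way. This is a rough illustration under assumptions, not a proposed patch: FileChannel again stands in for DFSInputStream, and PerScannerStreams, FileScanner, scanAll, and demo are hypothetical names. Each scanner opens its own channel, so its seek position is private and concurrent sequential scans never contend on a shared lock.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; not HBase code.
class PerScannerStreams {

    // A scanner that owns a private channel, so no lock is shared with others.
    static final class FileScanner implements AutoCloseable {
        private final FileChannel ch;

        FileScanner(Path file, long start) throws IOException {
            ch = FileChannel.open(file, StandardOpenOption.READ);
            ch.position(start);        // seek once, then read sequentially
        }

        // Reads the next chunk; returns bytes read, or -1 at end of file.
        int next(ByteBuffer buf) throws IOException {
            buf.clear();
            return ch.read(buf);
        }

        @Override
        public void close() throws IOException {
            ch.close();
        }
    }

    // Scans from 'start' to end of file and returns the number of bytes seen.
    static long scanAll(Path file, long start) {
        long total = 0;
        try (FileScanner s = new FileScanner(file, start)) {
            ByteBuffer buf = ByteBuffer.allocate(4096);
            int n;
            while ((n = s.next(buf)) >= 0) total += n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Runs 'scanners' concurrent scans; scanner i starts at offset i * 1000.
    static List<Long> demo(int scanners, int fileSize) {
        ExecutorService pool = Executors.newFixedThreadPool(scanners);
        Path p = null;
        try {
            p = Files.createTempFile("scan", ".bin");
            Files.write(p, new byte[fileSize]);
            final Path file = p;
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < scanners; i++) {
                final long start = i * 1000L;
                Callable<Long> task = () -> scanAll(file, start);
                futures.add(pool.submit(task));
            }
            List<Long> totals = new ArrayList<>();
            for (Future<Long> f : futures) totals.add(f.get());
            return totals;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
            if (p != null) {
                try { Files.deleteIfExists(p); } catch (IOException ignored) { }
            }
        }
    }
}
```

For example, PerScannerStreams.demo(3, 5000) returns [5000, 4000, 3000]. The trade-off in this sketch is one open stream per scanner instead of one per file, which costs more open handles in exchange for removing the shared lock.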

In one test, I switched scans to always use pread and saw a 5x improvement in throughput
on the YCSB scan-only workload, since it was previously completely blocked by contention on
the DFSInputStream (DFSIS) lock.
