hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: HBase random access in HDFS and block indices
Date Fri, 29 Oct 2010 16:59:45 GMT
On Mon, Oct 18, 2010 at 9:30 PM, Matt Corgan <mcorgan@hotpads.com> wrote:
> I was envisioning the HFiles being opened and closed more often, but it
> sounds like they're held open for long periods and that the indexes are
> permanently cached.  Is it roughly correct to say that after opening an
> HFile and loading its checksum/metadata/index/etc then each random data
> block access only requires a single pread, where the pread has some
> threading and connection overhead, but theoretically only requires one disk
> seek?  I'm curious because I'm trying to do a lot of random reads, and given
> enough application parallelism, the disk seeks should become the bottleneck
> much sooner than the network and threading overhead.

You have it basically right.

On region deploy, all files that comprise a region are opened and
thereafter held open.  Part of opening is reading in the index and file
metadata, so open files occupy some memory.  An optimization would be
to let go of unused files, reopening them on access.
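
To make the access pattern concrete, here is a toy sketch in Python.  It
is not HBase code and not the real HFile format: the record layout,
BLOCK_SIZE, write_blocks and OpenStoreFile are all made up for
illustration, and it reads a local file rather than HDFS.  It only shows
the idea discussed above: open the file once, keep the block index in
memory, and serve each random read with a single positional read.

```python
import bisect
import os
import tempfile

BLOCK_SIZE = 16  # hypothetical and tiny, so the block boundaries are easy to see


def write_blocks(path, rows):
    """Write sorted (key, value) rows as 'key=value;' records packed into blocks.

    Returns the block index: a list of (first_key, offset, length) tuples.
    """
    index = []
    with open(path, "wb") as f:
        block, first_key = b"", None
        for key, value in rows:
            record = f"{key}={value};".encode()
            # Flush the current block if this record would overflow it.
            if block and len(block) + len(record) > BLOCK_SIZE:
                index.append((first_key, f.tell(), len(block)))
                f.write(block)
                block, first_key = b"", None
            if first_key is None:
                first_key = key
            block += record
        if block:
            index.append((first_key, f.tell(), len(block)))
            f.write(block)
    return index


class OpenStoreFile:
    """A file that is opened once and held open, with its index kept in memory."""

    def __init__(self, path, index):
        self.fd = os.open(path, os.O_RDONLY)
        self.index = index
        self.first_keys = [first_key for first_key, _, _ in index]

    def get(self, key):
        # Find the last block whose first key is <= key, using only memory.
        i = bisect.bisect_right(self.first_keys, key) - 1
        if i < 0:
            return None
        _, offset, length = self.index[i]
        # One positional read fetches the whole block; no seek call and no
        # shared file position, so concurrent readers can share the descriptor.
        block = os.pread(self.fd, length, offset)
        for record in block.decode().split(";"):
            k, sep, v = record.partition("=")
            if sep and k == key:
                return v
        return None

    def close(self):
        os.close(self.fd)


def demo():
    """Build a small file in a temp directory and do a few random reads."""
    path = os.path.join(tempfile.mkdtemp(), "store.dat")
    rows = [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4"), ("e", "5")]
    store = OpenStoreFile(path, write_blocks(path, rows))
    try:
        return [store.get(k) for k in ("c", "e", "zz")]
    finally:
        store.close()
```

The "let go of unused files" optimization would amount to calling
close() on idle OpenStoreFile objects and rebuilding them on the next
get(), trading the memory held by the index for a reopen cost.  Note
that os.pread is only available on Unix-like systems.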

