hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: FW: Read op question
Date Fri, 08 Jan 2010 22:35:50 GMT
On Fri, Jan 8, 2010 at 1:40 PM, Mridul Muralidharan
<mridulm@yahoo-inc.com> wrote:

> So my question is, what indexing is present on an HRegion to support a read
> of a single record?  Aside from looking in the MemStore, how do you know
> what HFiles to read?  On opening an HFile, do you scan the whole thing?

HFiles have an index.  They are like the SSTable files in the Bigtable
paper.  See
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/io/hfile/HFile.html
for a bit of doc on the hfile format, etc.  They are made of blocks that
are 64k in size by default.  The index, stored at the tail of the hfile,
has the offset of each block and the key that starts that block.  All
hfiles are opened and kept open.  On open, their metadata, including the
index, is read into memory.

A lookup for a particular cell will look in the memstore, and then in each
hfile.  If the wanted cell is outside of the start/end key range of the
hfile (start and end keys are part of the metadata), we'll skip the file
and move on to the next.  Otherwise we'll find where to start seeking by
doing a lookup in the index.  That lookup lands either on the exact key
(not usual) or on the block start key just before it; we then seek to that
block and read the whole 64k block in.  We'll move through the block until
we find (or not) the wanted key.  TODO: add bloomfilters on hfiles so we
can avoid the seek if the wanted key is not present in the file.
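
To make the lookup path above concrete, here is a rough sketch in Python.
It is not HBase code: HFileSketch, lookup, and cells_per_block are names
made up for illustration, blocks are sized by cell count rather than 64k of
bytes, the "index" holds only each block's first key (the real one also
holds the block offset), and versions/timestamps are ignored, so the first
hit wins.  It also assumes each file holds at least one cell.

```python
import bisect


class HFileSketch:
    """Toy in-memory stand-in for an hfile (hypothetical, not the HBase API)."""

    def __init__(self, cells, cells_per_block=2):
        # Cells are stored sorted by key and cut into fixed-size blocks.
        items = sorted(cells.items())
        self.blocks = [items[i:i + cells_per_block]
                       for i in range(0, len(items), cells_per_block)]
        # Index: the key that starts each block, kept in memory.
        self.index = [block[0][0] for block in self.blocks]
        # Start/end keys stand in for the file metadata.
        self.start_key = items[0][0]
        self.end_key = items[-1][0]
        self.blocks_read = 0  # counts simulated block reads

    def get(self, key):
        # Skip the file if the key is outside its start/end key range.
        if key < self.start_key or key > self.end_key:
            return None
        # Find the last block whose start key is <= the wanted key.
        pos = bisect.bisect_right(self.index, key) - 1
        self.blocks_read += 1
        # Move through the block until the key is found or passed.
        for k, v in self.blocks[pos]:
            if k == key:
                return v
            if k > key:
                break
        return None


def lookup(memstore, hfiles, key):
    """Look in the memstore first, then in each hfile in turn."""
    if key in memstore:
        return memstore[key]
    for hfile in hfiles:
        value = hfile.get(key)
        if value is not None:
            return value
    return None


if __name__ == "__main__":
    f1 = HFileSketch({"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
    f2 = HFileSketch({"x": 10, "z": 12})
    print(lookup({}, [f1, f2], "d"))   # found in f1 after one block read
    print(lookup({}, [f1, f2], "y"))   # in f2's key range, but not present
```

Note the second lookup: "y" falls inside f2's start/end keys, so a block is
read only to discover the key is absent.  That wasted read is what the
bloomfilter TODO is meant to avoid.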

St.Ack
