hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Help needed! Performance related questions
Date Thu, 14 Oct 2010 18:51:21 GMT
> 1. about the row searching mechanism, I understand the part before the
> HBase locate where the row resides in which region. I am confused
> after that. So, I am going to write down what I understand so far,
> please correct me if it's wrong.
> a. The HRegion Store identifies where the row is in which HFile.
> b. There is a block index in HFile identify which block this row resides.
> c. If the row size is smaller than block size (which mean a block has
> multiple rows), HBase has to traverse in that block to locate the row
> matching the key. The traverse is sequence traverse.

More or less.
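
To make steps b and c concrete, here is a rough toy sketch in Python, not HBase's actual code. It assumes the block index is a sorted list holding the first key of each block and that it is searched with a binary search; the names blocks, block_index and lookup are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical toy model: each block is a sorted list of (key, value) pairs,
# and the block index holds the first key of every block.
blocks = [
    [("row1", "a"), ("row2", "b")],
    [("row3", "c"), ("row4", "d")],
    [("row5", "e")],
]
block_index = [block[0][0] for block in blocks]


def lookup(key):
    """Return the value stored under key, or None if it is absent."""
    # Step b: binary search the index for the last block whose
    # first key is <= the key we want.
    i = bisect_right(block_index, key) - 1
    if i < 0:
        return None  # key sorts before the first block
    # Step c: sequential scan inside that one block.
    for k, v in blocks[i]:
        if k == key:
            return v
        if k > key:
            break  # keys are sorted, so the key cannot appear later
    return None
```

With this toy data, lookup("row4") lands in the second block and scans two entries before it finds the match.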

> 2. And if the row size is larger than the block size, what's going to
> happen? Does the block index in HFile point to multiple blocks which
> contains different cells of that row?

The block index stores full keys (row+family+qualifier+timestamp), so
it doesn't work in terms of total row size. A single row can span
multiple blocks (in multiple files), with possibly as many entries in
the block index. If a single cell is larger than the block size, then
that block will be the size of that cell.
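
As an illustration only, the following Python sketch shows how one row can end up with several index entries and how an oversized cell gets a block of its own. It is a simplified model, not the real HFile writer: it assumes a block is closed once its accumulated size reaches the block size, checked before the next cell is appended, and it counts only value bytes. build_blocks and the sample cells are hypothetical.

```python
def build_blocks(cells, block_size):
    """Group sorted cells into blocks and build a first-key index.

    cells: iterable of (key, value), sorted by key, where key is a
    (row, family, qualifier, timestamp) tuple and value is bytes.
    """
    blocks, index = [], []
    current, current_size = [], 0
    for key, value in cells:
        # Assumed rule: close the current block once it has reached
        # block_size, checked before adding the next cell.
        if current and current_size >= block_size:
            blocks.append(current)
            current, current_size = [], 0
        if not current:
            index.append(key)  # full key of the first cell in the block
        current.append((key, value))
        current_size += len(value)  # simplification: value bytes only
    if current:
        blocks.append(current)
    return blocks, index


cells = [
    (("row1", "f", "a", 3), b"x" * 6),
    (("row1", "f", "b", 2), b"x" * 6),
    (("row1", "f", "c", 1), b"x" * 50),  # one cell far larger than a block
    (("row2", "f", "a", 1), b"x" * 2),
]
blocks, index = build_blocks(cells, block_size=10)
```

Here row1 spans two blocks and so has two index entries, and the 50-byte cell sits alone in a block whose size is the size of that cell.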

> 3. Does a column family has to reside inside one block, which means a
> column family cannot be larger than a block?

My previous answer covers this.

