hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stack <st...@duboce.net>
Subject Re: Question about HFile seeking
Date Fri, 17 May 2013 06:53:12 GMT
On Thu, May 16, 2013 at 3:26 PM, Varun Sharma <varun@pinterest.com> wrote:

> Referring to your comment above again
> "If you doing a prefix scan w/ row1c, we should be starting the scan at
> row1c, not row1 (or more correctly at the row that starts the block we
> believe has a row1c row in it...)"
> I am trying to understand how you could seek right across to the block
> containing "row1c" using the HFile Index. If the index is just built on
> HFile keys and there is no demarcation b/w rows and col(s), you would hit
> the block for "row1,col1". After that you would either need a way to skip
> right across to "row1c" after you find that this is not the row you are
> looking for or you will have to simply keep scanning and discarding
> sequentially until you get "row1c". If you have to keep scanning and
> discarding, then that is probably suboptimal. But if there is a way to skip
> right across from "row1,col1" to "row1c", then thats great, though I wonder
> how that would be implemented.

There is no demarcation but the serialized row key in the index has the
lengths of each of the row key constituent parts embedded.

Here is where we write out an entry:


The entry is a KeyValue instances.  See how we pass the key and value
component separately into here:


The key part is serialized in a particular format.  It is described here:


(or see http://hbase.apache.org/book.html#keyvalue, where it is better
described... easier to read).

Notice how the serialization has row length, column family length,
qualifier length, etc.

While in KV, see how it has comparators that compare byte arrays that have
serialized keys in them first by row, then by column family, etc: i.e. by
each of the component parts in turn:

Here is where we save off the key if it is first in current block:


It gets added to index when we figure we have a block full of data (search
checkBlockBoundary in HFileWriterV2).

So keys in index are full-on keys.

At seek time, we eventually get down to an index lookup.  See here in hfile


Notice how there is a comparator and a byte array w/ a key in it.  The
comparator we read from the file.  It was comparator used writing the file.
 If you wander through this seeking method ignoring its handling of index
tiers, you'll end up eventually in the binary search of the index using our
key-conscious comparator:
KeyComparator over in KeyValue implements RawComparator so the
RawComparator you see here is a base type -- we have different comparators
dependent on whether it user data or system table files... that is another

Hope this gives you some insight on how we could skip to 'row1c' w/o going
through the content of 'row1' first.

You have a perf problem?  Maybe dump out content of a few hfiles that you
know you are having speed issues against.  It can be informative?


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message