hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: HBase as a large, auto-partitioned, sorted, *in-memory* database (was: Re: prefix compression implementation)
Date Tue, 20 Sep 2011 05:08:15 GMT
Excellent summary Matt.  Some notes in the below.

On Sun, Sep 18, 2011 at 6:43 PM, Matt Corgan <mcorgan@hotpads.com> wrote:
> ... All of this is relatively easy for the data
> and index blocks because they're immutable.  Doing it for the memstore is
> another story...

We'd need another data structure completely, wouldn't we?  Have you
given it any thought?

> Here's some contrived data to illustrate.  Row key is a (reversed
> domain) url.  Column family is called "familyName".  ColQualifier is a
> browser type.  Value is the number of views from that browser type.  Here's
> the current representation <http://pastebin.com/7ks8kzJ2>, the prefix trie's
> toString output <http://pastebin.com/cL4AkCPC> for the row keys, and removing
> the whitespace <http://pastebin.com/4qiSXNh9> from the toString output.

That's a big diff.

> The second problem is the lack of random access to cell offsets within the
> data block.  (I'm not 100% sure on this one, so please correct me if I'm
> wrong).  I noticed how bad this problem is when I was storing historical
> event logs with 8 byte keys and small values (so ~30b per cell).  I had to
> increase block size to 256KB because the block indexes were too big to fit
> in memory.  Then I needed fast random access to these events.  The problem
> is that there are ~10,000 cells per block, so without random lookups inside
> the block, it's seeking through ~5,000 keys for the average lookup.  That's
> a crazy amount of overhead to retrieve a single cell.  Probably the quickest
> solution to this problem is to store the offset of each key in a list of
> integers at the end of the block so that it's possible to do a binary search
> inside the block.  That would reduce it to ~14 avg memory accesses to find
> the cell.
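The offset-list idea above can be sketched roughly as follows. This is a hypothetical layout, not HBase's actual block format: cell bytes first, then one int offset per cell, then the cell count in the last four bytes. The class and method names are made up for illustration. Binary search over the offsets needs about ceil(log2(10,000)) = 14 key comparisons per lookup, versus ~5,000 for a linear scan.

```java
import java.nio.ByteBuffer;

// Hypothetical block layout (not HBase's real format):
//   [cell bytes ...][int offset of each cell][int cellCount]
// Fixed 8-byte keys, as in the event-log example above.
public class BlockSearchSketch {

    // Binary-search the offset list at the end of the block;
    // returns the matching cell's offset, or -1 if absent.
    static int findCell(ByteBuffer block, byte[] targetKey) {
        int cellCount = block.getInt(block.limit() - 4);
        int offsetsStart = block.limit() - 4 - 4 * cellCount;
        int lo = 0, hi = cellCount - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cellOffset = block.getInt(offsetsStart + 4 * mid);
            int cmp = compareKeyAt(block, cellOffset, targetKey);
            if (cmp == 0) return cellOffset;
            if (cmp < 0) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1;
    }

    // Unsigned lexicographic compare of the key stored at offset
    // against the target key.
    static int compareKeyAt(ByteBuffer block, int offset, byte[] target) {
        for (int i = 0; i < target.length; i++) {
            int a = block.get(offset + i) & 0xff;
            int b = target[i] & 0xff;
            if (a != b) return a - b;
        }
        return 0;
    }
}
```

The offset list costs 4 bytes per cell (~40KB for a 256KB block of 30-byte cells), which is the trade-off against the per-lookup scan it eliminates.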

I'd imagine that we'd not want the index always, just when keys per
block went over some size.

hfilev2 should help here because we don't load all indices all the time.

> But, the prefix trie should actually be way faster than a binary search
> because you don't have to keep comparing the beginning of the key to the one
> you're looking for.  I'll save the details for another email or for code
> comments.  In general, with a smarter block/index format we could be much
> more efficient than the current method of comparing full key byte[] over and
> over again.
> Same random access problem also applies to the block index I think (correct
> me if I'm wrong here too).
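A toy illustration of why the trie lookup avoids the repeated full-key comparisons: each byte of the target key is examined once while descending, instead of being re-compared against candidate keys at every binary-search probe. This is a pointer-based sketch for clarity only; an on-disk trie block would of course be a flat encoded structure, and the class here is hypothetical.

```java
import java.util.TreeMap;

// Minimal byte-trie sketch. TreeMap keeps children sorted, matching
// the sorted-key iteration a block format would need.
class TrieSketch {
    TreeMap<Byte, TrieSketch> children = new TreeMap<>();
    Integer cellOffset; // set on nodes that terminate a key

    void put(byte[] key, int offset) {
        TrieSketch node = this;
        for (byte b : key) {
            node = node.children.computeIfAbsent(b, k -> new TrieSketch());
        }
        node.cellOffset = offset;
    }

    // Each key byte is consumed exactly once on the way down; shared
    // prefixes like "com.example/" are never re-scanned per candidate.
    Integer get(byte[] key) {
        TrieSketch node = this;
        for (byte b : key) {
            node = node.children.get(b);
            if (node == null) return null; // diverged from all stored keys
        }
        return node.cellOffset;
    }
}
```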

You should stick the above in the issue Matt, or at least refer to
this mail in the issue; it's great stuff.

