hbase-user mailing list archives

From Matt Corgan <mcor...@hotpads.com>
Subject Re: heap memory allocation
Date Fri, 15 Apr 2011 17:15:14 GMT
J-D - yes, I think that increasing the block size by 4x would cut the index
size to about a quarter, assuming 1 index entry per block.  Random reads are
not as important for this table, and tend to cache well.  On the flip side,
scans might be faster with larger blocks.  If I start to see problems, my
other option is to reduce the memstore and blockcache percentages.
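
A rough sketch of that arithmetic, assuming exactly one index entry per block
and that each entry costs the key plus a small fixed overhead.  The function
name, the 12-byte overhead, the 100-byte average key and the 1 TiB data size
are all made-up illustration values, not numbers taken from HBase or from this
cluster; data_bytes stands for whatever size the block size is measured
against.

```python
def estimated_index_bytes(data_bytes, block_bytes, avg_key_bytes,
                          per_entry_overhead=12):
    """Rough block-index size, assuming one index entry per block.

    per_entry_overhead is a hypothetical fixed cost per entry (for example
    an offset and a length); it is an assumption, not an HBase constant.
    """
    num_blocks = -(-data_bytes // block_bytes)  # ceiling division
    return num_blocks * (avg_key_bytes + per_entry_overhead)


KIB = 1024
data = 1024 ** 4  # hypothetical 1 TiB of data

small_blocks = estimated_index_bytes(data, 64 * KIB, avg_key_bytes=100)
large_blocks = estimated_index_bytes(data, 256 * KIB, avg_key_bytes=100)

# 4x larger blocks means 4x fewer blocks, so 4x fewer index entries.
ratio = small_blocks / large_blocks
```

With these inputs the ratio is exactly 4 because the data size divides evenly
into both block sizes; with real store files it would only be approximately 4.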

Stack - thanks, that's the exact issue.

A lot of the data in this table is very cold archived data, so storing the
indexes and blooms in the block cache, where they could get evicted, would
make sense.  Lars left a comment about compressing the indexes... I could
see using something like an array-based binary trie to represent the index
more compactly, while also improving performance.  You could even bundle
bytes into int or long "words" for faster traversal.
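
To illustrate only the "bundle bytes into words" part (not the trie, and not
anything HBase does today), here is a minimal sketch.  It packs a key into
big-endian unsigned 8-byte words so that two keys can be compared a word at a
time instead of a byte at a time.  The function names are hypothetical.

```python
from functools import cmp_to_key


def pack_words(key, word_size=8):
    """Pack bytes into big-endian unsigned words, zero-padding the tail."""
    padded = key + b"\x00" * (-len(key) % word_size)
    return [int.from_bytes(padded[i:i + word_size], "big")
            for i in range(0, len(padded), word_size)]


def compare_keys(a, b):
    """Compare two byte keys word by word; returns -1, 0 or 1.

    Big-endian packing keeps word order equal to lexicographic byte order.
    """
    for x, y in zip(pack_words(a), pack_words(b)):
        if x != y:
            return -1 if x < y else 1
    # All shared words are equal, so one key is a prefix of the other
    # (possibly followed by zero bytes); the shorter key sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


keys = [b"ab", b"ab\x00", b"abc", b"a", b"abcdefghij", b"abcdefgh"]
sorted_by_words = sorted(keys, key=cmp_to_key(compare_keys))
```

The length tie-break is needed because zero padding makes b"ab" and b"ab\x00"
pack to the same word.  In Java the same idea would need an unsigned
comparison of the longs, since long is signed there.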

On Fri, Apr 15, 2011 at 12:47 PM, Stack <stack@duboce.net> wrote:

> On Fri, Apr 15, 2011 at 8:23 AM, Matt Corgan <mcorgan@hotpads.com> wrote:
> > Some of our servers have 5.2gb hbase heaps with the standard 40% (2gb)
> > memstore and 20% (1gb) blockcache.  I'm wondering where the indexes and
> > bloom filters get counted.  Is it in that remaining 2gb, and are the bloom
> > filters counted in the storefileIndexSize=566 metric?
> >
> Indexes are the storefileIndexSize metric.
> I don't think we count bloom space or rather, I believe they are kept
> in the cache (someone correct me if I'm off).
> > The overall data in each server isn't too large on disk (~50gb) after gzip
> > compression of 25x, but it's made of long keys with short values so there is
> > a lot of metadata.  I'm thinking of upping the block size to 256k but
> > thought I'd ask how it worked first.
> >
> This skew -- long keys and short values -- makes for bigger indices
> for sure.  Marc Limotte in an earlier thread takes a look at this
> index sizing. At first it seemed like the math was off and then he
> looked at his storefiles and figured that it starts to add up.  See
> http://hbase.apache.org/book/keysize.html
> St.Ack
