hbase-issues mailing list archives

From "Matt Corgan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI
Date Fri, 27 May 2011 19:34:48 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040404#comment-13040404 ]

Matt Corgan commented on HBASE-3927:

Ted - I think the problem I most often see on the user list is that people want blocks around
the default 64 KB, but after they enable compression they don't raise the block size to compensate
for it.  In many cases it's easy to get compression of 10x or better, so the blocks on disk
end up around 6 KB, which is smaller than anyone wants.

It's also true that data with large keys and small values (like an inverted index) tends to
compress well.  Those big keys also necessitate relatively large block cache entries.  Because
the block index has an entry for every block, it can get overly large when a user has large
keys and small compressed blocks.

Exposing this metric is just a way to remind unsuspecting users that block size is calculated
from the uncompressed size, rather than the compressed on-disk size that drives region splits.
It should also make it easier to see how effective different compression algorithms are, how
big your compressed blocks are, what percentage of your data fits in the block cache, etc.
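For example, once the uncompressed total is exposed next to the on-disk size, the ratio and a compensating block size fall out directly (a sketch; the function and variable names here are illustrative, not actual HBase metric names):

```python
def compression_stats(total_uncompressed_bytes, total_on_disk_bytes,
                      configured_block_size=64 * 1024):
    """Derive the compression ratio and the uncompressed block size that
    would restore on-disk blocks to roughly the configured size."""
    ratio = total_uncompressed_bytes / total_on_disk_bytes
    # To get on-disk blocks back near the configured size, the
    # uncompressed block size must grow by the same ratio.
    suggested_block_size = int(configured_block_size * ratio)
    return ratio, suggested_block_size

# e.g. a region with 10 GB of uncompressed data stored in 1 GB on disk:
ratio, suggested = compression_stats(10 * 1024**3, 1 * 1024**3)
print(ratio)      # prints 10.0
print(suggested)  # prints 655360, i.e. 640 KB uncompressed blocks
```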

> display total uncompressed byte size of a region in web UI
> ----------------------------------------------------------
>                 Key: HBASE-3927
>                 URL: https://issues.apache.org/jira/browse/HBASE-3927
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Matt Corgan
>            Priority: Minor
> The decision to split data blocks when flushing and compacting is made based on the uncompressed
> data size, which can often lead to compressed disk blocks that are a fraction of the intended
> 64 KB (default).  This often leads to a larger number of blocks and index entries than expected
> and can cause block indexes to take up gigabytes of memory.
> There is already a "long totalUncompressedBytes" written to the HFileTrailer.  It would
> be nice to expose this in the web UI to make it easier to calculate the compression ratio
> and then raise the block size appropriately (not necessarily to get it back to 64K).
> This should probably be added wherever the other HFile metrics are: RegionLoad.createRegions(..),
> and HServerLoad.  HServerLoad is a Writable, so adding a field may break serialization.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
