hbase-issues mailing list archives

From "Nicolas Spiegelberg (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3149) Make flush decisions per column family
Date Wed, 22 Feb 2012 19:03:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213858#comment-13213858
] 

Nicolas Spiegelberg commented on HBASE-3149:
--------------------------------------------

@Lars/Stack: note that the number of StoreFiles needed to store N amount of data is
O(log N) with the existing compaction algorithm.  This means that setting the compaction min
size to a low value will not result in significantly more files.  Furthermore, what's hurting
performance is not the number of files but the size of each file.  The extra files will be
very small and take up only a minority of the space in the LRU cache.  Every time you unnecessarily
compact files, you have to repopulate that StoreFile in the LRU cache and incur a lot of disk
reads in addition to the obvious write increase.  This is all to say that I would recommend
defaulting it to a low value, because the downsides are minimal and the benefit can be substantial
IO gains.
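
To illustrate the O(log N) claim, here is a toy simulation. It assumes a simplified
ratio-based selection rule, not HBase's actual compaction code; the names simulate, ratio,
and min_files are made up for this sketch, and every flush is assumed to produce a file of
the same size.

```python
def simulate(num_flushes, flush_size=1, ratio=1.2, min_files=3):
    """Return the StoreFile sizes (oldest first) left after num_flushes flushes.

    Simplified model (an assumption, not HBase's real selection logic):
    after each flush, skip older files that are larger than `ratio` times
    the combined size of all newer files, then merge the remaining newer
    files if there are at least `min_files` of them.
    """
    files = []  # file sizes, oldest first
    for _ in range(num_flushes):
        files.append(flush_size)

        # Skip files that are "too big" relative to everything newer.
        start = 0
        while start < len(files) and files[start] > ratio * sum(files[start + 1:]):
            start += 1

        # Merge the candidate tail into one file if it is long enough.
        if len(files) - start >= min_files:
            files = files[:start] + [sum(files[start:])]
    return files


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, "flushes ->", len(simulate(n)), "files")
```

In this model each surviving older file is more than ratio times the size of everything
newer than it, so file sizes grow geometrically and the file count grows only
logarithmically with the amount of data flushed.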

bq. At the same time, I'd think this issue still worth some time; if lots of cfs and only
one is filling, it's silly to flush the others as we do now because one is over the threshold.

Why is this silly?  With cache-on-write, the data is still cached in memory.  It's just migrated
from the MemStore to the BlockCache, which has comparable performance.  Furthermore, BlockCache
data is compressed, so it then takes up less space.  Flushing also minimizes the number of
HLogs and decreases recovery time.  Flushing would be bad if it meant we weren't optimally
using the global MemStore size, but we currently are.

bq. This surely seems a specific setting for this use-case, and there are others that need
a slightly different setting. If you mix those two on the same cluster, then having only one
global setting to adjust this seems restrictive? Should this be a setting per table, like
the flush size?

My point is that this is a better default, not a one-size-fits-all setting.  I agree that this
should be toggleable on a per-CF basis, hence HBASE-5335.
                
> Make flush decisions per column family
> --------------------------------------
>
>                 Key: HBASE-3149
>                 URL: https://issues.apache.org/jira/browse/HBASE-3149
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
>            Assignee: Nicolas Spiegelberg
>            Priority: Critical
>             Fix For: 0.92.1
>
>
> Today, the flush decision is made using the aggregate size of all column families. When
> large and small column families co-exist, this causes many small flushes of the smaller CF.
> We need to make per-CF flush decisions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
