hbase-issues mailing list archives

From "Lars George (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3149) Make flush decisions per column family
Date Tue, 21 Feb 2012 20:50:49 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212938#comment-13212938 ]

Lars George commented on HBASE-3149:

bq. At the same time, I'd think this issue still worth some time; if lots of cfs and only
one is filling, it's silly to flush the others as we do now because one is over the threshold.

I thought so too. With hbase.hstore.compaction.size set to 4MB and the flush size at 256MB,
you will never compact flush files larger than 4MB. In other words, a minor compaction only
runs when you are flushing small files (say, from a small, dependent column family). For the
larger family you typically do not run minor compactions at all, right?
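
To make the effect of that size cap concrete, here is a minimal sketch. It is not HBase's
compaction selection code; the class name, the method, and the example file sizes are all
made up for illustration, and it assumes the rule is simply "files above the cap are never
picked for a minor compaction".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; this is not HBase's compaction selection code. */
final class CompactionSizeSketch {
    static final long MB = 1024L * 1024L;

    /**
     * Returns the files whose size is at or below the cap, i.e. the ones
     * that remain candidates for a minor compaction under the assumed rule.
     */
    static List<Long> eligibleForMinorCompaction(List<Long> fileSizes, long sizeCap) {
        List<Long> eligible = new ArrayList<>();
        for (long size : fileSizes) {
            if (size <= sizeCap) {
                eligible.add(size);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        long cap = 4 * MB; // the 4MB value from the comment above
        // Assumed example sizes: the small family flushes tiny files,
        // the large family flushes files near the 256MB flush size.
        List<Long> smallFamily = Arrays.asList(1 * MB, 2 * MB, 3 * MB);
        List<Long> largeFamily = Arrays.asList(250 * MB, 256 * MB);

        System.out.println("small family candidates: "
                + eligibleForMinorCompaction(smallFamily, cap).size());
        System.out.println("large family candidates: "
                + eligibleForMinorCompaction(largeFamily, cap).size());
    }
}
```

With these assumed sizes, all three files of the small family stay eligible and none of the
large family's files do, which is the behaviour described above.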

This seems to be a setting tuned for this specific use case, and other use cases need a
slightly different one. If you mix the two on the same cluster, having only one global
setting to adjust seems restrictive. Should this be a per-table setting, like the flush size?

It still seems to me that decoupling is something we should have available as well. I have
thought about it for a while and discussed it with various people, though, and decoupling
brings its own set of issues. For example, you might end up with too many HLog files because
the small family is flushed only rarely.
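
The difference between the current aggregate decision and a decoupled one can be sketched
as follows. This is only an illustration: the class, both method names, the per-family
threshold, and the memstore sizes are assumptions, not anything from a patch on this issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; names and thresholds are hypothetical. */
final class FlushDecisionSketch {
    static final long MB = 1024L * 1024L;

    /** Aggregate decision: once the summed size hits the limit, every family is flushed. */
    static List<String> aggregateFlush(Map<String, Long> memstoreSizes, long flushSize) {
        long total = 0;
        for (long size : memstoreSizes.values()) {
            total += size;
        }
        return total >= flushSize
                ? new ArrayList<>(memstoreSizes.keySet())
                : new ArrayList<String>();
    }

    /** Decoupled decision (assumed policy): flush only families over their own limit. */
    static List<String> perFamilyFlush(Map<String, Long> memstoreSizes, long perFamilyLimit) {
        List<String> toFlush = new ArrayList<>();
        for (Map.Entry<String, Long> entry : memstoreSizes.entrySet()) {
            if (entry.getValue() >= perFamilyLimit) {
                toFlush.add(entry.getKey());
            }
        }
        return toFlush;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("big", 250 * MB);  // assumed memstore size of the large family
        sizes.put("small", 6 * MB);  // assumed memstore size of the small family

        System.out.println("aggregate:  " + aggregateFlush(sizes, 256 * MB));
        System.out.println("per family: " + perFamilyFlush(sizes, 128 * MB));
    }
}
```

Under the decoupled policy the small family is left unflushed here, so its edits stay only
in memory and in the log, which is where the HLog concern above comes from.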
> Make flush decisions per column family
> --------------------------------------
>                 Key: HBASE-3149
>                 URL: https://issues.apache.org/jira/browse/HBASE-3149
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
>            Assignee: Nicolas Spiegelberg
>            Priority: Critical
>             Fix For: 0.92.1
> Today, the flush decision is made using the aggregate size of all column families. When
> large and small column families co-exist, this causes many small flushes of the smaller CF.
> We need to make per-CF flush decisions.


