hbase-issues mailing list archives

From "Nicolas Spiegelberg (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3149) Make flush decisions per column family
Date Tue, 18 Jan 2011 18:32:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983312#action_12983312 ]

Nicolas Spiegelberg commented on HBASE-3149:

Some interesting stats. We did some rough calculations internally to see what effect an uneven
distribution of data into column families was having on our network IO. Our data distribution
for 3 column families was 1:1:20. When we looked at the flush:minor-compaction ratio for each
of the store files, the large column family had a 1:2 ratio but the small CFs both had a 1:20
ratio! We are looking at roughly a 10% network IO decrease if we can bring those other 2 CFs
down to a 1:2 ratio as well.
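
To make the arithmetic behind those numbers concrete, here is a rough sketch of how an aggregate flush trigger splits each flush across CFs with a 1:1:20 write distribution. The 64 MB threshold, the function names, and the per-CF policy (each CF flushing once its own memstore reaches the same threshold) are assumptions for illustration only. They are not taken from the HBase code or from any patch on this issue.

```python
def flush_file_sizes_mb(weights, threshold_mb):
    """Size of the file each CF writes when all CFs flush together
    as soon as the combined memstore size reaches threshold_mb."""
    total = sum(weights)
    return [threshold_mb * w / total for w in weights]


def flush_counts(weights, threshold_mb, written_mb, per_cf):
    """Number of flushes each CF performs after written_mb of writes.

    per_cf=False: one shared trigger on the aggregate size (today's behaviour
                  as described in the issue).
    per_cf=True:  hypothetical policy where a CF flushes only when its own
                  memstore reaches threshold_mb.
    """
    total = sum(weights)
    if per_cf:
        return [(written_mb * w) // (total * threshold_mb) for w in weights]
    return [written_mb // threshold_mb for _ in weights]


if __name__ == "__main__":
    weights = [1, 1, 20]       # data distribution from the comment above
    threshold_mb = 64          # assumed flush threshold, illustrative only
    written_mb = 22 * 64       # enough writes for 22 aggregate flushes

    print([round(s, 1) for s in flush_file_sizes_mb(weights, threshold_mb)])
    print(flush_counts(weights, threshold_mb, written_mb, per_cf=False))
    print(flush_counts(weights, threshold_mb, written_mb, per_cf=True))
```

Under these assumed numbers, every aggregate flush writes roughly a 2.9 MB file for each small CF next to a roughly 58 MB file for the large one, and each small CF flushes 22 times where a per-CF trigger would flush it once. Those many tiny store files are what the minor compactions then have to keep rewriting.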

> Make flush decisions per column family
> --------------------------------------
>                 Key: HBASE-3149
>                 URL: https://issues.apache.org/jira/browse/HBASE-3149
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
> Today, the flush decision is made using the aggregate size of all column families. When
> large and small column families co-exist, this causes many small flushes of the smaller CF.
> We need to make per-CF flush decisions.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
