hbase-issues mailing list archives

From "zhangduo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
Date Thu, 13 Nov 2014 00:13:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208962#comment-14208962 ]

zhangduo commented on HBASE-10201:

I used to think this should go into master first as an experimental feature, with HBASE-12405
based on this issue.

Now I think you're right, stack: basing HBASE-10201 on the work of HBASE-12405 seems
more natural than the reverse.

Never mind, I will finish HBASE-12405 as soon as possible.

How'd you generate the numbers and what is WAF?

I created a table with 3 CFs, disabled splits (by using a large constant split size), and put 1M rows
into the table.
The key is 16B, with a 16B value for CF1, a 256B value for CF2, and a 4KB value for CF3.
The resulting numbers are copied from the JMX web page of the regionserver.
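
For a rough sense of the data volume in this setup, the raw payload per CF can be estimated as in the sketch below. It assumes one cell per row per CF and counts only key and value bytes; per-cell overhead (family, qualifier, timestamp) and any compression are ignored, so the real store file sizes will differ. The function name is only for illustration.

```python
# Rough estimate of raw bytes written per column family in the test above.
# Assumptions: one cell per row per CF, and only key and value bytes are
# counted; per-cell overhead and compression are ignored.

ROWS = 1_000_000
KEY_BYTES = 16
VALUE_BYTES = {"CF1": 16, "CF2": 256, "CF3": 4 * 1024}


def raw_payload_bytes(rows, key_bytes, value_bytes):
    """Return {cf: total key + value bytes} for the given row count."""
    return {cf: rows * (key_bytes + v) for cf, v in value_bytes.items()}


payload = raw_payload_bytes(ROWS, KEY_BYTES, VALUE_BYTES)
for cf, size in payload.items():
    # Prints 32 MB, 272 MB and 4112 MB for CF1, CF2 and CF3 respectively.
    print(f"{cf}: {size / 1e6:.0f} MB")
```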

WAF is short for "Write Amplification", and I calculate it simply as numBytesCompactedCount / storeFileSize.
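
The calculation can be sketched as follows. The two inputs are meant to be the numBytesCompactedCount and storeFileSize values read by hand from the regionserver JMX page; the function name and the sample numbers are placeholders, not measurements from this test.

```python
def write_amplification(num_bytes_compacted, store_file_size):
    """WAF as described above: bytes rewritten by compaction / final store file size."""
    if store_file_size <= 0:
        raise ValueError("store_file_size must be positive")
    return num_bytes_compacted / store_file_size


# Placeholder values only (not results from the test above):
# 30 GB rewritten by compactions for a 4 GB store gives a WAF of 7.5.
GB = 1024 ** 3
print(write_amplification(30 * GB, 4 * GB))
```

Both inputs have to be in the same unit (bytes here) for the ratio to be meaningful.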


> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>                 Key: HBASE-10201
>                 URL: https://issues.apache.org/jira/browse/HBASE-10201
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: Ted Yu
>            Assignee: zhangduo
>            Priority: Critical
>             Fix For: 2.0.0, 0.98.9, 0.99.2
>         Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch,
HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch,
HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch
> Currently the flush decision is made using the aggregate size of all column families.
When large and small column families co-exist, this causes many small flushes of the smaller
CF. We need to make per-CF flush decisions.
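
As a toy illustration of the problem described in the issue, the sketch below compares which CFs get flushed under an aggregate-size trigger and under a per-CF selection. This is not the logic of the attached patches; the thresholds, the function names, and the fallback rule are assumptions made up for the example.

```python
# Toy model of the flush decision. All thresholds are hypothetical.
MB = 1024 * 1024


def stores_to_flush_aggregate(memstore_sizes, region_threshold):
    """Aggregate policy: once the total crosses the threshold, flush every CF."""
    if sum(memstore_sizes.values()) >= region_threshold:
        return sorted(memstore_sizes)
    return []


def stores_to_flush_per_cf(memstore_sizes, region_threshold, cf_threshold):
    """Per-CF policy: on the same trigger, flush only CFs at or above cf_threshold.

    Falls back to flushing every CF when none qualifies, so memory is still freed.
    """
    if sum(memstore_sizes.values()) < region_threshold:
        return []
    large = sorted(cf for cf, size in memstore_sizes.items() if size >= cf_threshold)
    return large or sorted(memstore_sizes)


sizes = {"CF1": 1 * MB, "CF2": 8 * MB, "CF3": 120 * MB}
print(stores_to_flush_aggregate(sizes, 128 * MB))        # ['CF1', 'CF2', 'CF3']
print(stores_to_flush_per_cf(sizes, 128 * MB, 16 * MB))  # ['CF3']
```

With the aggregate trigger, the 1 MB and 8 MB memstores are flushed along with the large one, which is where the many small files for the smaller CFs come from.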

This message was sent by Atlassian JIRA