hbase-issues mailing list archives

From "zhangduo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
Date Wed, 15 Oct 2014 10:45:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172250#comment-14172250
] 

zhangduo commented on HBASE-10201:
----------------------------------

Ran the same benchmark on a cluster of 3 regionservers (2 * Xeon E5-2650 2.6G, 3T * 11 SATA);
the result is similar.

Without per CF flush:
metric_storeCount: 3,
metric_storeFileCount: 9,
metric_memStoreSize: 39965016,
metric_storeFileSize: 4460709275,
metric_compactionsCompletedCount: 46,
metric_numBytesCompactedCount: 11030906070,
metric_numFilesCompactedCount: 145,
Write amplification: 2.47

With per CF flush:
metric_storeCount: 3,
metric_storeFileCount: 7,
metric_memStoreSize: 110195648,
metric_storeFileSize: 4369570622,
metric_compactionsCompletedCount: 27,
metric_numBytesCompactedCount: 10353718691,
metric_numFilesCompactedCount: 89,
Write amplification: 2.37

The patch has a big impact on compactionsCompletedCount, but only a small impact on numBytesCompactedCount.
This is reasonable: the patch only prevents flushing small files for small CFs, which reduces
their compaction count, but most of numBytesCompactedCount is contributed by large CFs, which
are not affected (or at least only very slightly) by this patch. So we only get a small
improvement in write amplification (5%~10%).
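
For anyone who wants to re-check the numbers: the comment does not say how write amplification
was computed, but the reported values are consistent with numBytesCompactedCount / storeFileSize,
so the sketch below assumes that formula. The dictionary keys and helper names are made up for
illustration; only the metric values come from the runs above.

```python
# Metric values copied from the two benchmark runs above.
# Key names are shortened, hypothetical labels (not HBase identifiers).
without_cf_flush = {
    "compactions": 46,
    "bytes_compacted": 11030906070,
    "files_compacted": 145,
    "store_file_size": 4460709275,
}
with_cf_flush = {
    "compactions": 27,
    "bytes_compacted": 10353718691,
    "files_compacted": 89,
    "store_file_size": 4369570622,
}


def write_amplification(metrics):
    # Assumed formula: bytes rewritten by compaction / bytes currently stored.
    return metrics["bytes_compacted"] / metrics["store_file_size"]


def relative_drop(before, after):
    # Fractional reduction going from `before` to `after`.
    return (before - after) / before


print(round(write_amplification(without_cf_flush), 2))  # 2.47
print(round(write_amplification(with_cf_flush), 2))     # 2.37

# Compare how much each counter dropped with per-CF flush enabled.
for key in ("compactions", "files_compacted", "bytes_compacted"):
    drop = relative_drop(without_cf_flush[key], with_cf_flush[key])
    print(f"{key}: {drop:.1%} lower")
```

Under that assumption the compaction count drops by roughly 40% while the compacted bytes drop by
only about 6%, which matches the explanation above.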


> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>
>                 Key: HBASE-10201
>                 URL: https://issues.apache.org/jira/browse/HBASE-10201
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ted Yu
>         Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch,
> HBASE-10201-0.98_2.patch
>
>
> Currently the flush decision is made using the aggregate size of all column families.
> When large and small column families co-exist, this causes many small flushes of the smaller
> CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
