hbase-issues mailing list archives

From "Yu Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
Date Wed, 02 Dec 2015 08:13:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035450#comment-15035450 ]

Yu Li commented on HBASE-14906:

I also ran the same test case as [HBASE-10201|https://issues.apache.org/jira/browse/HBASE-10201?focusedCommentId=14171950&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14171950]
on a real cluster; let me restate the test method:

3 CFs, with 16-byte values for CF1, 256-byte values for CF2, and 4 KB values for CF3; 1M rows;
128 MB memstore flush size; 16 MB per-CF flush size.

And the result is:

w/o patch:
metric_storeCount : 3,
metric_storeFileCount : 9,
metric_memStoreSize : 112519968,
metric_storeFileSize : 4396528692,
metric_compactionsCompletedCount : 17,
metric_numBytesCompactedCount : 18891018964,
metric_numFilesCompactedCount : 89

w/ patch:
metric_storeCount : 3,
metric_storeFileCount : 13,
metric_memStoreSize : 58168928,
metric_storeFileSize : 4446829180,
metric_compactionsCompletedCount : 15,
metric_numBytesCompactedCount : 15101162833,
metric_numFilesCompactedCount : 82

Number of flushes per column family:
w/o patch:
CF1: 9 times
CF2: 19 times
CF3: 39 times

w/ patch:
CF1: 4 times
CF2: 8 times
CF3: 53 times

From the metrics we can see that both the number of compactions and the bytes involved in compaction were reduced.
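For concreteness, the reduction implied by the compaction metrics above works out as follows (a quick calculation over the reported numbers; the class is purely illustrative):

```java
/** Quick calculation of the compaction savings reported above (illustrative only). */
public class CompactionSavings {

    /** Percent reduction from 'before' to 'after'. */
    static double percentReduction(long before, long after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        // Numbers taken directly from the w/o-patch vs. w/-patch metrics above.
        System.out.printf("compactions completed: %.1f%%%n",
            percentReduction(17, 15));                      // ~11.8% fewer
        System.out.printf("bytes compacted:       %.1f%%%n",
            percentReduction(18891018964L, 15101162833L));  // ~20.1% fewer
        System.out.printf("files compacted:       %.1f%%%n",
            percentReduction(89, 82));                      // ~7.9% fewer
    }
}
```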

We can also see that there are fewer flushes of the small CFs but more of the large CF. This
makes sense in theory: because the memstores of the small CFs are retained in memory longer,
flushes of the large CF become more frequent, until the small CFs also reach the flush threshold.
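The behavior discussed above follows from how stores are picked for flushing. A minimal sketch of that selection, with assumed names and signatures (the real logic in HBase's FlushLargeStoresPolicy is more involved):

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of per-column-family flush selection (hypothetical names). */
public class FlushSelectionSketch {

    /**
     * Given the memstore size of each column family and a lower bound,
     * return the indices of the families to flush: only those at or above
     * the bound, falling back to a full flush if none qualifies.
     */
    static List<Integer> selectStoresToFlush(long[] memstoreSizes, long lowerBound) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < memstoreSizes.length; i++) {
            if (memstoreSizes[i] >= lowerBound) {
                selected.add(i);
            }
        }
        if (selected.isEmpty()) {
            // No family exceeds the bound: flush all memstores, just as usual.
            for (int i = 0; i < memstoreSizes.length; i++) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // CF1 at 1 MB, CF2 at 8 MB, CF3 at 120 MB; 16 MB lower bound:
        // only CF3 (index 2) exceeds the bound, so only it is flushed.
        long[] sizes = {1L << 20, 8L << 20, 120L << 20};
        System.out.println(selectStoresToFlush(sizes, 16L << 20));
    }
}
```

This also illustrates why small CFs flush less often with the patch: they stay below the lower bound and are retained in memory across several flush cycles.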

> Improvements on FlushLargeStoresPolicy
> --------------------------------------
>                 Key: HBASE-14906
>                 URL: https://issues.apache.org/jira/browse/HBASE-14906
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: HBASE-14906.patch
> When reviewing FlushLargeStoresPolicy, I found the following possible improvements:
> 1. Currently, selectStoresToFlush performs the selection regardless of how many families
the table actually has, which is unnecessary when there is only a single family.
> 2. The default value of hbase.hregion.percolumnfamilyflush.size.lower.bound cannot fit
all cases, and requires the user to know implementation details to set it properly. We
propose to use "hbase.hregion.memstore.flush.size/column_family_number" instead:
> {noformat}
>   <property>
>     <name>hbase.hregion.percolumnfamilyflush.size.lower.bound</name>
>     <value>16777216</value>
>     <description>
>     If FlushLargeStoresPolicy is used and there are multiple column families,
>     then every time that we hit the total memstore limit, we find out all the
>     column families whose memstores exceed a "lower bound" and only flush them
>     while retaining the others in memory. The "lower bound" will be
>     "hbase.hregion.memstore.flush.size / column_family_number" by default
>     unless the value of this property is larger than that. If none of the
>     families has a memstore larger than the lower bound, all the memstores
>     will be flushed (just as usual).
>     </description>
>   </property>
> {noformat}
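The proposed default amounts to the following calculation (a sketch with a hypothetical method name; the real policy would read both values from the HBase Configuration):

```java
/** Sketch of the proposed lower-bound default (hypothetical names, for illustration). */
public class FlushLowerBoundSketch {

    /**
     * Proposed default: hbase.hregion.memstore.flush.size / column_family_number,
     * unless the explicitly configured lower bound is larger than that.
     */
    static long flushSizeLowerBound(long memstoreFlushSize, int familyCount,
                                    long configuredLowerBound) {
        long perFamilyDefault = memstoreFlushSize / familyCount;
        return Math.max(perFamilyDefault, configuredLowerBound);
    }

    public static void main(String[] args) {
        // With the test setup above: 128 MB flush size, 3 families, 16 MB configured bound.
        // The per-family default (128 MB / 3, roughly 42.7 MB) wins over the 16 MB setting.
        System.out.println(flushSizeLowerBound(128L << 20, 3, 16L << 20));
    }
}
```

The point of the division is that the user no longer needs to know the policy's internals: with more families, each family's share of the region flush size shrinks automatically.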

This message was sent by Atlassian JIRA
