hbase-issues mailing list archives

From "zhangduo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
Date Tue, 18 Nov 2014 00:42:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215501#comment-14215501 ]

zhangduo commented on HBASE-10201:

> You could do something like Gaurav Menghani suggests above
> (see https://issues.apache.org/jira/browse/HBASE-10201?focusedCommentId=14191203&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14191203),
> where rather than reporting, on a successful flush, the highest sequenceid of all of a
> region's memstores involved in the flush, when you flush a column family only you'd instead
> have to report one less than the oldest outstanding edit still alive in a column family.

Yes, this is what the patch is doing now. This is the approach with minimal impact on existing
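
A rough sketch of that bookkeeping, assuming each column family tracks the sequence id of
its oldest unflushed edit. FlushSeqIdSketch and seqIdToReport are hypothetical names used
for illustration only; they are not identifiers from the patch or from HBase:

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.Set;

/** Hypothetical helper, not HBase code: picks the sequence id that is safe to report after a flush. */
final class FlushSeqIdSketch {

    /**
     * @param oldestUnflushedSeqIdByFamily oldest edit still held in each family's memstore
     *                                     (families with an empty memstore are assumed absent)
     * @param flushedFamilies              families whose memstores are being flushed now
     * @param flushSeqId                   highest sequence id covered by this flush
     * @return the highest sequence id up to which every edit is known to be persisted
     */
    static long seqIdToReport(Map<String, Long> oldestUnflushedSeqIdByFamily,
                              Set<String> flushedFamilies,
                              long flushSeqId) {
        // Oldest edit that stays in memory because its family is not part of this flush.
        OptionalLong oldestRemaining = oldestUnflushedSeqIdByFamily.entrySet().stream()
            .filter(e -> !flushedFamilies.contains(e.getKey()))
            .mapToLong(Map.Entry::getValue)
            .min();
        // Partial flush: report one less than the oldest outstanding edit.
        // Full flush: nothing is outstanding, so the flush sequence id itself is safe.
        return oldestRemaining.isPresent()
            ? Math.min(oldestRemaining.getAsLong() - 1, flushSeqId)
            : flushSeqId;
    }
}
```

With families "big" (oldest edit 10) and "small" (oldest edit 4), flushing only "big" at
sequence id 20 would report 3 in this sketch, so WAL entries from 4 onwards are kept.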

> What if you did something much less involved: when there is pressure to flush, flush the stores
> with the oldest edits until you've freed enough memory?

I think we need to identify the reason why we need a flush. If we need a flush because the
memstore is too large, then flushing the large stores is enough. If we need a flush because the
oldest seqId still alive in the memstore is far behind the current one (which means we have lots
of WALs that cannot be archived), then we need to flush the store which has the oldest seqId in
its memstore (or maybe just flush all the stores? Simple but useful). Maybe I can change the
return value of shouldFlush from boolean to an enum to indicate the reason why we need a flush.
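
A minimal sketch of what an enum-valued shouldFlush might look like. Everything except the
name shouldFlush is an assumption made for illustration: FlushReason, StoreState,
familyToFlush, the thresholds and the order of the checks are hypothetical and do not come
from the patch:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration, not HBase code. */
final class FlushReasonSketch {

    /** Why a flush is needed; replaces a plain boolean. */
    enum FlushReason { NONE, MEMSTORE_SIZE, OLD_EDITS }

    /** Stand-in for one column family's memstore state (assumed to hold at least one edit). */
    static final class StoreState {
        final String family;
        final long memstoreBytes;
        final long oldestSeqId; // sequence id of the oldest unflushed edit

        StoreState(String family, long memstoreBytes, long oldestSeqId) {
            this.family = family;
            this.memstoreBytes = memstoreBytes;
            this.oldestSeqId = oldestSeqId;
        }
    }

    /** Returns the reason a flush is needed, or NONE. Size pressure is checked first (an assumption). */
    static FlushReason shouldFlush(List<StoreState> stores, long currentSeqId,
                                   long sizeLimitBytes, long maxSeqIdLag) {
        long totalBytes = 0;
        long oldest = Long.MAX_VALUE;
        for (StoreState s : stores) {
            totalBytes += s.memstoreBytes;
            oldest = Math.min(oldest, s.oldestSeqId);
        }
        if (totalBytes >= sizeLimitBytes) {
            return FlushReason.MEMSTORE_SIZE;
        }
        if (!stores.isEmpty() && currentSeqId - oldest > maxSeqIdLag) {
            return FlushReason.OLD_EDITS; // old edits are pinning WAL files
        }
        return FlushReason.NONE;
    }

    /** Picks the family to flush: the largest store for size pressure, the oldest for WAL pressure. */
    static Optional<String> familyToFlush(List<StoreState> stores, FlushReason reason) {
        switch (reason) {
            case MEMSTORE_SIZE:
                return stores.stream()
                    .max(Comparator.comparingLong((StoreState s) -> s.memstoreBytes))
                    .map(s -> s.family);
            case OLD_EDITS:
                return stores.stream()
                    .min(Comparator.comparingLong((StoreState s) -> s.oldestSeqId))
                    .map(s -> s.family);
            default:
                return Optional.empty();
        }
    }
}
```

The "just flush all the stores" alternative for the old-edits case would simply return every
family under OLD_EDITS instead of only the oldest one.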

> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>                 Key: HBASE-10201
>                 URL: https://issues.apache.org/jira/browse/HBASE-10201
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: Ted Yu
>            Assignee: zhangduo
>            Priority: Critical
>             Fix For: 2.0.0, 0.98.9, 0.99.2
>         Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch,
HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch,
HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch,
> Currently the flush decision is made using the aggregate size of all column families.
When large and small column families co-exist, this causes many small flushes of the smaller
CF. We need to make per-CF flush decisions.

This message was sent by Atlassian JIRA
