hbase-issues mailing list archives

From "zhangduo (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
Date Mon, 13 Oct 2014 05:36:35 GMT

     [ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangduo updated HBASE-10201:
-----------------------------
    Attachment: HBASE-10201-0.98.patch

I ported the 3149-trunk-v1.txt patch to branch 0.98 (a "just make it work" version, not the
final version).

Porting to master is more difficult because of the rewrite of HLog.
Flushing per CF means we need to record the oldest sequence id per store instead of per region,
so the patch adds a seqNum parameter when adding a kv to a store, which means we need to know
the seqNum before we add the kv to the store.
This is easy on branch 0.98: we just need to change the order of the WAL appendNoSync and the
write back to the memstore (am I right?). But on master, HLog seems to use an event-driven
framework, and I am not sure when the seqNum is determined.
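
To make the per-store bookkeeping concrete, here is a minimal sketch. It is not taken from
either patch: the class OldestSeqIdPerStore, its method names and the NO_SEQNUM constant are
all hypothetical, and column families are identified by plain strings for brevity. It only
shows why the seqNum has to be known at the moment a kv is added to a store.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not HBase code): remembers, for each store, the oldest
// sequence id that is still sitting in the memstore and has not been flushed.
final class OldestSeqIdPerStore {
    // Assumed marker meaning "this store holds no unflushed edits".
    static final long NO_SEQNUM = -1L;

    private final Map<String, Long> oldestUnflushed = new ConcurrentHashMap<>();

    // Called when a kv is added to a store. The seqNum must already be
    // assigned, which is why the WAL append has to come first.
    void onAdd(String family, long seqNum) {
        oldestUnflushed.merge(family, seqNum, Math::min);
    }

    // Called after the given store has been flushed completely.
    void onFlush(String family) {
        oldestUnflushed.remove(family);
    }

    // Oldest unflushed sequence id of one store, or NO_SEQNUM if it is empty.
    long oldestFor(String family) {
        return oldestUnflushed.getOrDefault(family, NO_SEQNUM);
    }

    // Oldest unflushed sequence id over all stores, i.e. the per-region value.
    long oldestForRegion() {
        long min = Long.MAX_VALUE;
        for (long v : oldestUnflushed.values()) {
            min = Math.min(min, v);
        }
        return min == Long.MAX_VALUE ? NO_SEQNUM : min;
    }
}
```

With per-region tracking only oldestForRegion() would exist; the per-store map is what lets
one store be flushed without losing track of older edits in the others.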

The second problem is the flushSeqId. On 0.98 it is just a simple incAndGet, but on master
it uses a method in HLog. So on 0.98, if we only flush some of the stores, we can set the
flushSeqId to the oldest seqNum held in the stores that are not being flushed, and not
increment sequenceId. But on master, I do not know the side effects of that method. Is it OK
to remove the method call, or do we still need to log something?
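
The 0.98 rule described above can be sketched as follows. This is an illustration under
assumptions, not the code in the patch: FlushSeqIdChooser and its methods are hypothetical
names, the region sequence id is modelled as a plain AtomicLong, and the oldest seqNum is
used as-is because that is how the rule is stated here.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not HBase code) of picking a flushSeqId when only
// some of a region's stores are flushed.
final class FlushSeqIdChooser {
    private final AtomicLong sequenceId;

    FlushSeqIdChooser(long initialSequenceId) {
        this.sequenceId = new AtomicLong(initialSequenceId);
    }

    // oldestSeqNumPerStore: oldest unflushed seqNum of every non-empty store.
    // storesToFlush: the stores selected for this flush.
    long choose(Map<String, Long> oldestSeqNumPerStore, Set<String> storesToFlush) {
        long oldestRemaining = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : oldestSeqNumPerStore.entrySet()) {
            if (!storesToFlush.contains(e.getKey())) {
                oldestRemaining = Math.min(oldestRemaining, e.getValue());
            }
        }
        if (oldestRemaining == Long.MAX_VALUE) {
            // Every store with data is being flushed: behave like a full flush.
            return sequenceId.incrementAndGet();
        }
        // Partial flush: reuse the oldest seqNum left behind and leave the
        // region sequence id untouched.
        return oldestRemaining;
    }

    long currentSequenceId() {
        return sequenceId.get();
    }
}
```

For example, with oldest seqNums {big=40, small=90} and a region sequence id of 100, flushing
only "big" yields 90 and leaves the sequence id at 100, while flushing both yields 101.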

> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>
>                 Key: HBASE-10201
>                 URL: https://issues.apache.org/jira/browse/HBASE-10201
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ted Yu
>         Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch
>
>
> Currently the flush decision is made using the aggregate size of all column families.
> When large and small column families co-exist, this causes many small flushes of the smaller
> CF. We need to make per-CF flush decisions.
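
One way to picture a per-CF flush decision is the sketch below. It is only an illustration:
PerFamilyFlushSelector, the per-family threshold and the fall-back to flushing everything
when no family is large enough are all assumptions, not something the issue specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not HBase code): choose which column families to flush
// from their individual memstore sizes instead of the aggregate size.
final class PerFamilyFlushSelector {
    private final long perFamilyThresholdBytes;

    PerFamilyFlushSelector(long perFamilyThresholdBytes) {
        this.perFamilyThresholdBytes = perFamilyThresholdBytes;
    }

    List<String> select(Map<String, Long> memstoreBytesPerFamily) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> e : memstoreBytesPerFamily.entrySet()) {
            if (e.getValue() >= perFamilyThresholdBytes) {
                selected.add(e.getKey());
            }
        }
        if (selected.isEmpty()) {
            // Assumed fall-back: no single family is large, so flush them all.
            selected.addAll(memstoreBytesPerFamily.keySet());
        }
        return selected;
    }
}
```

With a 16 MB threshold, a 100 MB family and a 1 MB family, only the large family is selected,
so the small one is not written out as a tiny file.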



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
