hbase-issues mailing list archives

From "Jeffrey Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
Date Tue, 09 Dec 2014 20:02:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239958#comment-14239958 ]

Jeffrey Zhong commented on HBASE-10201:
---------------------------------------

This is a nice feature. I scanned through the patch; my comments are below:

1) There may be a correctness issue for same-version updates (same row key and version).
Because the following code supplies the store file flush id, we could end up with multiple
hstore files that have exactly the same flush seq id. HBase resolves same-version updates by
the store files' seq id (flush id), so we may end up with incorrect results. This issue may
only happen in 0.98, though.
{code}
+          long oldestUnflushedSeqId = wal
+              .getEarliestMemstoreSeqNum(encodedRegionName);
{code} 
To fix the issue, we should use the current store's max flushed seq id as its real hstore
seq id. We would also need to change HRegion.lastFlushSeqId to use oldestUnflushedSeqId when
reporting back to the Master; otherwise we may have a data loss issue.
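
A minimal sketch of the tie described in 1), written as plain Java with no HBase dependency.
StoreFileStub and pickNewest are hypothetical names invented for illustration, and the seq id
values are made up. It only shows that picking a winner by highest seq id becomes ambiguous
once two files carry the same id, and is unambiguous when each file carries its own store's
max flushed seq id.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: SeqIdTieSketch, StoreFileStub and pickNewest are
// hypothetical names, not HBase classes or methods.
final class SeqIdTieSketch {

    static final class StoreFileStub {
        final String name;
        final long seqId;   // flush seq id recorded for this file
        final String value; // value held for one (row, column, version)

        StoreFileStub(String name, long seqId, String value) {
            this.name = name;
            this.seqId = seqId;
            this.value = value;
        }
    }

    // Returns the value from the file with the strictly highest seq id,
    // or null when that highest id is shared and the winner is ambiguous.
    static String pickNewest(List<StoreFileStub> files) {
        StoreFileStub best = null;
        boolean tie = false;
        for (StoreFileStub f : files) {
            if (best == null || f.seqId > best.seqId) {
                best = f;
                tie = false;
            } else if (f.seqId == best.seqId) {
                tie = true;
            }
        }
        return (best == null || tie) ? null : best.value;
    }

    public static void main(String[] args) {
        // Assumption: both flushes were stamped with the same region-wide
        // oldest unflushed seq id, because another store was never flushed.
        List<StoreFileStub> tied = Arrays.asList(
            new StoreFileStub("f1", 100L, "old"),
            new StoreFileStub("f2", 100L, "new"));
        // Assumption: each flush is stamped with the store's own max flushed id.
        List<StoreFileStub> distinct = Arrays.asList(
            new StoreFileStub("f1", 100L, "old"),
            new StoreFileStub("f2", 180L, "new"));
        System.out.println("shared id:    " + pickNewest(tied));
        System.out.println("distinct ids: " + pickNewest(distinct));
    }
}
```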

2) We have a feature that forces a flush via hbase.regionserver.optionalcacheflushinterval
or hbase.regionserver.flush.per.changes, but I didn't see either case handled in the
selectStoresToFlush() function. This may cause HRegion.shouldFlush() to always return true,
and we would end up with small hstore files.
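
One possible way to handle the forced case in 2), sketched in plain Java. This is not the
patch's selectStoresToFlush(); the signature, the perStoreThreshold parameter and the forced
flag are assumptions made for illustration. The idea is simply that a periodic or
change-count flush selects every store, so the condition that triggered it is actually
cleared.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: the class name, signature and parameters are hypothetical.
final class FlushSelectionSketch {

    // memstoreSizes maps column family name to its memstore size in bytes.
    // forced is true when the flush was requested by the periodic-interval
    // or change-count trigger rather than by memstore size.
    static List<String> selectStoresToFlush(Map<String, Long> memstoreSizes,
                                            long perStoreThreshold,
                                            boolean forced) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> e : memstoreSizes.entrySet()) {
            // A forced flush takes every store; otherwise only large ones.
            if (forced || e.getValue() >= perStoreThreshold) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("big", 128L);
        sizes.put("small", 2L);
        System.out.println("size-based: " + selectStoresToFlush(sizes, 16L, false));
        System.out.println("forced:     " + selectStoresToFlush(sizes, 16L, true));
    }
}
```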

3) For region server recovery, we have an optimization that uses the lastFlushSeqId reported
by region servers to skip writing edits into recovered.edits files. With this feature, we may
unnecessarily write much more data into recovered.edits. This issue doesn't happen in the log
replay case.
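
To make the cost in 3) concrete, here is a small plain-Java sketch. Edit and countWritten are
hypothetical names, and the sequence ids are invented. It assumes the region reports a single
lastFlushSeqId and that, with per-family flushing, this id has to stay just below the oldest
unflushed edit of any store.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: RecoverySkipSketch, Edit and countWritten are
// hypothetical names, not HBase classes or methods.
final class RecoverySkipSketch {

    static final class Edit {
        final String family;
        final long seqId;

        Edit(String family, long seqId) {
            this.family = family;
            this.seqId = seqId;
        }
    }

    // Counts the edits that would still be written to recovered.edits:
    // everything newer than the region-level last flushed seq id.
    static int countWritten(List<Edit> walEdits, long lastFlushedSeqId) {
        int written = 0;
        for (Edit e : walEdits) {
            if (e.seqId > lastFlushedSeqId) {
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        List<Edit> wal = Arrays.asList(
            new Edit("big", 1L), new Edit("small", 2L), new Edit("big", 3L),
            new Edit("big", 4L), new Edit("small", 5L));
        // Whole region flushed through seq id 4: only edit 5 is written.
        System.out.println("whole-region flush: " + countWritten(wal, 4L));
        // Only "big" flushed through 4; "small" still holds edit 2, so the
        // reported id is assumed to be 1 and edits 3 and 4 of "big" are
        // written again even though they are already persisted.
        System.out.println("per-family flush:   " + countWritten(wal, 1L));
    }
}
```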

4) Regarding your FlushMarker question: FlushMarker (and the similar RegionEventWALEdit) is
used for the region replica feature and for reasoning about region/store state. As you can
see in the WALEdit class, those special events use the special column family "METAFAMILY",
which doesn't exist for data regions. You should handle those events specially in
getFamilyNames(); otherwise they may affect your bookkeeping of the oldest unflushed seqid.
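
A rough sketch of the special handling suggested in 4), in plain Java. It is not the patch's
getFamilyNames(): the signature is an assumption, and families are modeled as strings rather
than byte arrays to keep the example self-contained. Only the "METAFAMILY" name comes from
the comment above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: the class and method signature are hypothetical.
final class FamilyNamesSketch {

    // Name of the special family used by marker events, as quoted above.
    static final String METAFAMILY = "METAFAMILY";

    // Collects the distinct data families touched by an edit, skipping the
    // marker family so it never enters per-family seq id bookkeeping.
    static Set<String> getFamilyNames(List<String> cellFamilies) {
        Set<String> names = new TreeSet<>();
        for (String family : cellFamilies) {
            if (!METAFAMILY.equals(family)) {
                names.add(family);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> families = Arrays.asList("cf1", "METAFAMILY", "cf2", "cf1");
        System.out.println(getFamilyNames(families));
    }
}
```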


> Port 'Make flush decisions per column family' to trunk
> ------------------------------------------------------
>
>                 Key: HBASE-10201
>                 URL: https://issues.apache.org/jira/browse/HBASE-10201
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: Ted Yu
>            Assignee: zhangduo
>            Priority: Critical
>             Fix For: 1.0.0, 2.0.0, 0.98.9
>
>         Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch,
HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch,
HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, HBASE-10201_13.patch,
HBASE-10201_14.patch, HBASE-10201_15.patch, HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_2.patch,
HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch,
HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png
>
>
> Currently the flush decision is made using the aggregate size of all column families.
When large and small column families co-exist, this causes many small flushes of the smaller
CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
