hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13811) Splitting WALs, we are filtering out too many edits -> DATALOSS
Date Fri, 05 Jun 2015 01:12:38 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573900#comment-14573900 ]

Duo Zhang commented on HBASE-13811:
-----------------------------------

{quote}
Rather than add a new method that does what the old getEarliestMemstoreSeqNum did, I changed
getEarliestMemstoreSeqNum to be how the old version worked.
{quote}
Fine, I think it will work. But I am still a little nervous about having two methods that
share a name but behave differently...

And I remember that when implementing HBASE-10201 and HBASE-12405, I originally wanted
startCacheFlush to return the flushedSeqId directly. But there were two problems. The first is
that the getNextSequenceId method is in HRegion, not in FSHLog, so a simple solution is to return
NO_SEQ_NUM when flushing all stores and let HRegion call getNextSequenceId. But then comes the
second problem: startCacheFlush may fail, which means we cannot start a flush, so there are three
kinds of return value: 'sequenceId', 'choose a sequenceId yourself', and 'give up flushing!'.
I thought it was ugly to use a '-2' or a null java.lang.Long to indicate 'give up flushing',
so I gave up on the idea at that time...

Maybe we could reconsider this solution? getEarliestMemstoreSeqNum can be called from anywhere,
whereas startCacheFlush is, I think, restricted to the flush path.
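
As a rough illustration only (this is not the HBase API), the three outcomes of startCacheFlush described above could be carried by a small result type rather than a magic '-2' or a null java.lang.Long. Every name in the sketch (FlushStartSketch, FlushStart, resolveFlushSeqId) is hypothetical, and the LongSupplier merely stands in for HRegion's getNextSequenceId.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;

// Hypothetical sketch: none of these types exist in HBase.
public class FlushStartSketch {

    /** Models the three possible answers from a startCacheFlush-like call. */
    public static final class FlushStart {
        public enum Kind { SEQUENCE_ID, CALLER_CHOOSES, CANNOT_FLUSH }

        private final Kind kind;
        private final long sequenceId; // meaningful only when kind == SEQUENCE_ID

        private FlushStart(Kind kind, long sequenceId) {
            this.kind = kind;
            this.sequenceId = sequenceId;
        }

        /** The WAL picked the flushed sequence id itself. */
        public static FlushStart sequenceId(long id) {
            return new FlushStart(Kind.SEQUENCE_ID, id);
        }

        /** All stores are being flushed; the caller should pick the id. */
        public static FlushStart callerChooses() {
            return new FlushStart(Kind.CALLER_CHOOSES, -1L);
        }

        /** The flush could not be started at all. */
        public static FlushStart cannotFlush() {
            return new FlushStart(Kind.CANNOT_FLUSH, -1L);
        }

        public Kind kind() {
            return kind;
        }

        public long sequenceId() {
            if (kind != Kind.SEQUENCE_ID) {
                throw new IllegalStateException("no sequence id for " + kind);
            }
            return sequenceId;
        }
    }

    /**
     * Region-side handling: returns the id to flush with, or empty when the
     * flush must be abandoned. nextSequenceId is an assumed stand-in for
     * HRegion's getNextSequenceId.
     */
    public static OptionalLong resolveFlushSeqId(FlushStart start, LongSupplier nextSequenceId) {
        switch (start.kind()) {
            case SEQUENCE_ID:
                return OptionalLong.of(start.sequenceId());
            case CALLER_CHOOSES:
                return OptionalLong.of(nextSequenceId.getAsLong());
            default:
                return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        LongSupplier next = () -> 100L; // fixed value, for demonstration only
        System.out.println(resolveFlushSeqId(FlushStart.sequenceId(42L), next));
        System.out.println(resolveFlushSeqId(FlushStart.callerChooses(), next));
        System.out.println(resolveFlushSeqId(FlushStart.cannotFlush(), next));
    }
}
```

With a type like this the caller has to branch on the outcome explicitly, so 'give up flushing' cannot be mistaken for a valid sequence id.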

Thanks.

> Splitting WALs, we are filtering out too many edits -> DATALOSS
> ---------------------------------------------------------------
>
>                 Key: HBASE-13811
>                 URL: https://issues.apache.org/jira/browse/HBASE-13811
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 2.0.0, 1.2.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 2.0.0, 1.2.0
>
>         Attachments: 13811.branch-1.txt, 13811.branch-1.txt, 13811.txt, 13811.v2.branch-1.txt,
> 13811.v3.branch-1.txt, 13811.v3.branch-1.txt, 13811.v4.branch-1.txt, 13811.v5.branch-1.txt,
> 13811.v6.branch-1.txt, 13811.v6.branch-1.txt, HBASE-13811-v1.testcase.patch, HBASE-13811.testcase.patch
>
>
> I've been running ITBLLs against branch-1 around HBASE-13616 (move of ServerShutdownHandler
> to pv2). I have come across an instance of dataloss. My patch for HBASE-13616 was in place,
> so I can only think it is the cause (but cannot see how). When we split the logs, we are
> skipping legit edits. Digging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
