Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Fri, 21 Nov 2014 00:59:34 +0000 (UTC)
From: "Jeffrey Zhong (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12711364.1398822319000.550442.1416531574821@Atlassian.JIRA>
In-Reply-To: <JIRA.12711364.1398822319000@Atlassian.JIRA>
References: <JIRA.12711364.1398822319000@Atlassian.JIRA>
 <JIRA.12711364.1398822319020@arcas>
Subject: [jira] [Commented] (HBASE-11099) Two situations where we could open
 a region with smaller sequence number
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-11099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220334#comment-14220334 ] 

Jeffrey Zhong commented on HBASE-11099:
---------------------------------------

{quote}
Is this speculation or something from phoenix or so?
{quote}
For now it's only a possible scenario, found by checking the code.

{quote}
this a 0.98 issue too?
{quote}
Yes, that's a 0.98 issue too. [~apurtell] This is a low-risk fix. It would be good to get it into 0.98 as well. Thanks.
  

> Two situations where we could open a region with smaller sequence number
> ------------------------------------------------------------------------
>
>                 Key: HBASE-11099
>                 URL: https://issues.apache.org/jira/browse/HBASE-11099
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.99.1
>            Reporter: Jeffrey Zhong
>            Assignee: Stephen Yuan Jiang
>             Fix For: 2.0.0, 0.99.2
>
>         Attachments: HBASE-11099.v1-2.0.patch
>
>
> Recently I ran into code where we could potentially open a region with a smaller sequence number:
> 1) Inside function: HRegion#internalFlushcache. This is because we changed the way WAL sync works: we now use late binding (the sequence number is assigned right before the WAL sync).
> The flushSeqId may be less than the sequence number of a change included in the flush, which may cause the later region-opening code to use a smaller than expected sequence number when we reopen the region.
> {code}
> flushSeqId = this.sequenceId.incrementAndGet();
> ...
> mvcc.waitForRead(w);
> {code}
> 2) HRegion#replayRecoveredEdits, where we have the following code:
> {code}
> ...
>           if (coprocessorHost != null) {
>             status.setStatus("Running pre-WAL-restore hook in coprocessors");
>             if (coprocessorHost.preWALRestore(this.getRegionInfo(), key, val)) {
>               // if bypass this log entry, ignore it ...
>               continue;
>             }
>           }
> ...
>           currentEditSeqId = key.getLogSeqNum();
> {code} 
> If a coprocessor skips some tail WALEdits, the function will return a smaller currentEditSeqId, so in the end the region may again open with a smaller sequence number. This may cause data loss: the Master may have recorded a larger flushed sequence id, so some WALEdits may be skipped during recovery if the region fails again.
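
To make situation 2 concrete, here is a minimal, self-contained Java sketch of the replay loop. It does not use HBase classes and is not the attached patch: the class name, both method names, and the use of a plain predicate to stand in for the coprocessor's preWALRestore bypass decision are assumptions made for illustration only. The first method mirrors the quoted loop, where a bypassed edit never updates currentEditSeqId; the second shows one possible ordering in which the sequence id is recorded before the bypass check.

```java
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical stand-in for the replay loop; not HBase code.
public class ReplaySeqIdSketch {

    // Mirrors the quoted loop: the bypass check comes first, so a bypassed
    // edit never contributes its sequence id to the returned value.
    static long replaySkippingBeforeUpdate(List<Long> editSeqIds, LongPredicate bypass) {
        long currentEditSeqId = -1;
        for (long seqId : editSeqIds) {
            if (bypass.test(seqId)) {
                continue; // bypassed edit: its sequence id is not recorded
            }
            currentEditSeqId = seqId; // only non-bypassed edits reach this line
        }
        return currentEditSeqId;
    }

    // One possible alternative (an assumption, not the committed fix):
    // record the sequence id first, then decide whether to apply the edit.
    static long replayUpdatingBeforeSkip(List<Long> editSeqIds, LongPredicate bypass) {
        long currentEditSeqId = -1;
        for (long seqId : editSeqIds) {
            currentEditSeqId = Math.max(currentEditSeqId, seqId);
            if (bypass.test(seqId)) {
                continue; // the edit is not applied, but its id was seen
            }
            // a real implementation would apply the edit to the store here
        }
        return currentEditSeqId;
    }

    public static void main(String[] args) {
        List<Long> edits = List.of(101L, 102L, 103L, 104L);
        LongPredicate bypassTail = id -> id >= 103; // coprocessor bypasses the tail edits
        System.out.println(replaySkippingBeforeUpdate(edits, bypassTail)); // prints 102
        System.out.println(replayUpdatingBeforeSkip(edits, bypassTail));   // prints 104
    }
}
```

With the tail edits 103 and 104 bypassed, the first variant reports 102 as the last replayed sequence id even though the recovered edits reach 104, which is the gap described above.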


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)