Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 2 Nov 2016 13:59:58 +0000 (UTC)
From: "Duo Zhang (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.13017216.1478091688000.158236.1478095198391@Atlassian.JIRA>
In-Reply-To: <JIRA.13017216.1478091688000@Atlassian.JIRA>
References: <JIRA.13017216.1478091688000@Atlassian.JIRA> <JIRA.13017216.1478091688117@arcas>
Subject: [jira] [Commented] (HBASE-16994) Region report a last flushed
 sequence id that is less than the previous last flushed sequence id
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Wed, 02 Nov 2016 14:00:00 -0000


    [ https://issues.apache.org/jira/browse/HBASE-16994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629054#comment-15629054 ] 

Duo Zhang commented on HBASE-16994:
-----------------------------------

Thanks for pointing this out. I think the steps to reproduce the bug are correct.

On the fix, I think we need to do the reset work after fencing mvcc. Otherwise we cannot be sure that the RingBufferEventHandler has finished the sequence id accounting work. And if we do not have such a fence when flushing, then I think this is a very critical bug, because we may lose data...
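To illustrate the ordering argument, here is a minimal sketch. It does not use HBase's mvcc or RingBufferEventHandler APIs. A single-threaded executor stands in for the asynchronous handler, and the "fence" is just a no-op task that the flusher waits on. The class name FenceThenResetSketch and the methods publishAppend, fence and run are hypothetical and exist only for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FenceThenResetSketch {
    // Stand-in for the per-family "oldest unflushed sequence id" accounting.
    private final ConcurrentMap<String, Long> oldestUnflushed = new ConcurrentHashMap<>();
    // Stand-in for the asynchronous handler: one thread, FIFO order.
    private final ExecutorService handler = Executors.newSingleThreadExecutor();

    // Publish an append; the accounting happens later, on the handler thread.
    void publishAppend(String cf, long seqId) {
        handler.submit(() -> { oldestUnflushed.putIfAbsent(cf, seqId); });
    }

    // Hypothetical fence: because the handler is FIFO and single-threaded,
    // waiting for this no-op means every earlier append has been accounted for.
    void fence() {
        try {
            handler.submit(() -> { }).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fencing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("fence task failed", e);
        }
    }

    // Returns how many entries survive a reset that is done after the fence.
    static int run() {
        FenceThenResetSketch sketch = new FenceThenResetSketch();
        try {
            sketch.publishAppend("cfB", 10L); // append-X, possibly still in flight
            sketch.fence();                   // wait until the handler has caught up
            sketch.oldestUnflushed.clear();   // reset only after the fence
            return sketch.oldestUnflushed.size();
        } finally {
            sketch.handler.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("entries left after fenced reset: " + run());
    }
}
```

Without the fence() call, the clear() could run before the handler processes append-X, and the entry for cfB would reappear afterwards, which is the concern raised above.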

> Region report a last flushed sequence id that is less than the previous last flushed sequence id 
> -------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16994
>                 URL: https://issues.apache.org/jira/browse/HBASE-16994
>             Project: HBase
>          Issue Type: Bug
>            Reporter: binlijin
>         Attachments: HBASE-16994_master_v1.patch, HBASE-16994_master_v2.patch
>
>
> Since appends are published to the RingBuffer and handled asynchronously, it is possible for one append of the region (say append-X) to be handled by RingBufferEventHandler between startCacheFlush and getNextSequenceId, which resets the FSHLog#oldestUnflushedStoreSequenceIds entry that we just cleared in #startCacheFlush. This can disturb ServerManager#flushedSequenceIdByRegion as shown below (assume regionA has two CFs: cfA and cfB):
>
> 1. flush-A runs to startCacheFlush and will flush both cfA and cfB; oldestUnflushedStoreSequenceIds of regionA gets cleared
> 2. append-X on cfB is handled by RingBufferEventHandler; oldestUnflushedStoreSequenceIds is set to 10, for example
> 3. flush-A runs to getNextSequenceId, which returns 11
> 4. ServerManager#flushedSequenceIdByRegion for regionA is set to 11
> 5. flush-A finishes
> 6. flush-B starts and only flushes cfA; getNextSequenceId returned 10, so flushedSeqId will be 9, which causes a warning in ServerManager
>
> Since append-X will also get flushed, we should clear oldestUnflushedStoreSequenceIds again to make sure we do not disturb ServerManager#flushedSequenceIdByRegion.
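
The six steps above can be replayed with a small single-threaded toy model. This is a sketch under stated assumptions, not the HBase code: the class FlushRaceSketch and its methods are hypothetical simplifications that only borrow names from the description. In particular, it assumes the reported flushed id is the flush's own sequence id when no unflushed entry remains for the region, and (lowest remaining id - 1) otherwise.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlushRaceSketch {
    static final long NO_SEQNUM = -1L; // hypothetical marker: nothing unflushed

    // Column family name -> oldest sequence id not yet flushed (toy model).
    private final Map<String, Long> oldestUnflushed = new HashMap<>();
    private long lastAssignedSeqId;

    FlushRaceSketch(long lastAssignedSeqId) {
        this.lastAssignedSeqId = lastAssignedSeqId;
    }

    long getNextSequenceId() {
        return ++lastAssignedSeqId;
    }

    // Accounting done when the handler processes an append.
    void handleAppend(String cf, long seqId) {
        oldestUnflushed.putIfAbsent(cf, seqId);
    }

    // Clears the flushed families and returns the lowest remaining id, if any.
    long startCacheFlush(List<String> families) {
        for (String cf : families) {
            oldestUnflushed.remove(cf);
        }
        long lowest = NO_SEQNUM;
        for (long id : oldestUnflushed.values()) {
            lowest = (lowest == NO_SEQNUM) ? id : Math.min(lowest, id);
        }
        return lowest;
    }

    static long flushedSeqId(long lowestRemaining, long flushOpSeqId) {
        return lowestRemaining == NO_SEQNUM ? flushOpSeqId : lowestRemaining - 1;
    }

    // Replays the timeline; returns {id reported by flush-A, id reported by flush-B}.
    static long[] replay(boolean clearAgain) {
        FlushRaceSketch wal = new FlushRaceSketch(9L);
        List<String> all = Arrays.asList("cfA", "cfB");

        long lowestA = wal.startCacheFlush(all);            // step 1: entries cleared
        wal.handleAppend("cfB", wal.getNextSequenceId());   // step 2: append-X gets id 10
        long flushOpA = wal.getNextSequenceId();            // step 3: flush-A gets id 11
        if (clearAgain) {
            wal.startCacheFlush(all);                       // proposed extra clear
        }
        long reportedA = flushedSeqId(lowestA, flushOpA);   // step 4

        long lowestB = wal.startCacheFlush(Arrays.asList("cfA")); // step 6: only cfA
        long reportedB = flushedSeqId(lowestB, wal.getNextSequenceId());
        return new long[] {reportedA, reportedB};
    }

    public static void main(String[] args) {
        System.out.println("without extra clear: " + Arrays.toString(replay(false)));
        System.out.println("with extra clear:    " + Arrays.toString(replay(true)));
    }
}
```

In this model, flush-A reports 11 and flush-B then reports 9 when the stale cfB entry is left in place; with the extra clear, flush-B reports an id above 11. The model ignores threading entirely, so it shows only the accounting, not whether the extra clear is itself safe against later appends.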

