Date: Wed, 2 Nov 2016 13:59:58 +0000 (UTC)
From: "Duo Zhang (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-16994) Region report a last flushed sequence id that is less than the previous last flushed sequence id

    [ https://issues.apache.org/jira/browse/HBASE-16994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629054#comment-15629054 ]

Duo Zhang commented on HBASE-16994:
-----------------------------------

Thanks for pointing this out. I think the steps to reproduce the bug are correct.
On the fix, I think we need to do the reset work after fencing mvcc? Otherwise you cannot be sure whether the RingBufferEventHandler has finished the sequence id accounting work. And if we do not have such fencing when flushing, then I think this is a very critical bug, because we may lose data...

> Region report a last flushed sequence id that is less than the previous last flushed sequence id
> -------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16994
>                 URL: https://issues.apache.org/jira/browse/HBASE-16994
>             Project: HBase
>          Issue Type: Bug
>            Reporter: binlijin
>         Attachments: HBASE-16994_master_v1.patch, HBASE-16994_master_v2.patch
>
>
> Since an append is published to the RingBuffer and handled asynchronously, it's possible that one append (say append-X) of the region is handled by RingBufferEventHandler between startCacheFlush and getNextSequenceId, and resets the FSHLog#oldestUnflushedStoreSequenceIds entry which we just cleared in #startCacheFlush. This might disturb ServerManager#flushedSequenceIdByRegion as shown below (assume region-A has two CFs: cfA and cfB)
>
> 1. flush-A runs to startCacheFlush and it will flush both cfA and cfB, so oldestUnflushedStoreSequenceIds of regionA gets cleared
> 2. append-X on cfB is handled by RingBufferEventHandler, and oldestUnflushedStoreSequenceIds is set to 10, for example
> 3. flush-A runs to getNextSequenceId, which returns 11
> 4. ServerManager#flushedSequenceIdByRegion for regionA is set to 11
> 5. flush-A finishes
> 6. flush-B starts and only flushes cfA, getNextSequenceId returned 10, and flushedSeqId will return 9, which causes a warning in ServerManager
> Since this append-X will also get flushed, we should clear the oldestUnflushedStoreSequenceIds again to make sure we won't disturb
> ServerManager#flushedSequenceIdByRegion.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
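
The interleaving in steps 1-6 of the quoted description can be replayed with a small single-threaded model. This is only a sketch under assumptions: the class FlushRaceSketch, its methods, and the flush sequence id 12 used for flush-B are hypothetical stand-ins, not the real FSHLog, HRegion or ServerManager code. It assumes the reported id is "oldest unflushed id minus 1" when a store still has unflushed edits recorded, and the flush's own sequence id otherwise.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model of the accounting described in HBASE-16994 (not HBase code).
public class FlushRaceSketch {
    static final long NO_SEQNUM = -1L;

    // store name -> oldest unflushed sequence id
    // (hypothetical stand-in for FSHLog#oldestUnflushedStoreSequenceIds)
    private final Map<String, Long> oldestUnflushed = new HashMap<>();

    // Clears the entries of the stores being flushed and returns the lowest
    // remaining unflushed id, or NO_SEQNUM if nothing is left.
    long startCacheFlush(Set<String> flushedStores) {
        oldestUnflushed.keySet().removeAll(flushedStores);
        return oldestUnflushed.values().stream()
                .mapToLong(Long::longValue).min().orElse(NO_SEQNUM);
    }

    // What the ring buffer handler does for an append: record the first
    // unflushed id seen for the store.
    void append(String store, long seqId) {
        oldestUnflushed.putIfAbsent(store, seqId);
    }

    // The "clear again" idea from the issue description.
    void clearAgain(Set<String> flushedStores) {
        oldestUnflushed.keySet().removeAll(flushedStores);
    }

    static long flushedSeqId(long earliestUnflushed, long flushOpSeqId) {
        return earliestUnflushed == NO_SEQNUM ? flushOpSeqId : earliestUnflushed - 1;
    }

    // Returns {id reported by flush-A, id reported by flush-B}.
    static long[] run(boolean clearAgainAfterAppend) {
        FlushRaceSketch wal = new FlushRaceSketch();
        Set<String> allStores = Set.of("cfA", "cfB");

        long earliestA = wal.startCacheFlush(allStores);        // step 1
        wal.append("cfB", 10L);                                 // step 2: append-X sneaks in
        if (clearAgainAfterAppend) {
            wal.clearAgain(allStores);                          // proposed reset
        }
        long reportedA = flushedSeqId(earliestA, 11L);          // steps 3-4

        long earliestB = wal.startCacheFlush(Set.of("cfA"));    // step 6: flush-B, cfA only
        long reportedB = flushedSeqId(earliestB, 12L);          // 12 is an arbitrary later id
        return new long[] {reportedA, reportedB};
    }

    public static void main(String[] args) {
        long[] racy = run(false);
        long[] fixed = run(true);
        System.out.println("without reset: " + racy[0] + " then " + racy[1]);
        System.out.println("with reset:    " + fixed[0] + " then " + fixed[1]);
    }
}
```

Without the second clear, the stale cfB entry makes flush-B report an id lower than the one flush-A reported. The model is single-threaded, so it does not show the concern raised in the comment: in the real system the reset is only safe once it is certain that the handler has already accounted for append-X, which is why the reset would need to happen after the mvcc fence.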