Subject: [jira] [Commented] (HBASE-17407) Correct update of maxFlushedSeqId in HRegion
From: "stack (JIRA)"
Date: Thu, 5 Jan 2017 14:40:58 +0000 (UTC)

[ https://issues.apache.org/jira/browse/HBASE-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801513#comment-15801513 ]

stack commented on HBASE-17407:
-------------------------------

bq. (but for some reason the comment was deleted).
I intentionally deleted the comment because I felt it added little benefit to the back-and-forth here.

bq. I think it was important to understand that in the current state there is no danger of data loss.

You mean with the finalizeFlush/updateStore calls in place and NO in-memory compaction, just BASIC mode where we flush everything in the pipeline? If that is what you mean, I think so. That said, the finalizeFlush/updateStore calls are new moving pieces, and these corner cases are hard to manufacture.

bq. Code maintainability is also important.

Yes. This sequenceid accounting is unfortunately involved and tough to test.

bq. I can replace finalizeFlush with a preFlushSeqIDEstimation() which returns a lower bound on the sequence id that is invoked before we start the flush.

Do you think this will restore our sequence id accounting to what it was before finalizeFlush/updateStore? How will we deal with the gap between the new edits that come in and fill lowestUnflushedSequenceIds after we have swapped it out to do the current flush, and the edits in the pipeline that did not get flushed during the current flush session?

bq. You say WAL truncation cannot be triggered during a flush.

Indeed. See how closeBarrier is used in AbstractFSWAL.

bq. Can the map in seq accounting be reported to master during a flush?

See HRegion#setCompleteSequenceId, where we build the sequenceid we send to the master. See how it asks the WAL subsystem for the earliest edit by column family:

long earliest = this.wal.getEarliestMemstoreSeqNum(encodedRegionName, familyName);

Here is the implementation:

{code}
  @Override
  public long getEarliestMemstoreSeqNum(byte[] encodedRegionName, byte[] familyName) {
    // This method is used by tests and for figuring if we should flush or not because our
    // sequenceids are too old. It is also used reporting the master our oldest sequenceid for use
    // figuring what edits can be skipped during log recovery.
    // getEarliestMemStoreSequenceId from this.sequenceIdAccounting is looking first in
    // flushingOldestStoreSequenceIds, the currently flushing sequence ids, and if anything found
    // there, it is returning these. This is the right thing to do for the reporting oldest
    // sequenceids to master; we won't skip edits if we crash during the flush. For figuring what
    // to flush, we might get requeued if our sequence id is old even though we are currently
    // flushing. This may mean we do too much flushing.
    return this.sequenceIdAccounting.getLowestSequenceId(encodedRegionName, familyName);
  }
{code}

The code comment tries to explain how it works. That it returns flushingSequenceIds first, and falls back to lowestUnflushedSequenceIds only when the former has no entry, may be what [~Apache9] is referring to with 'not report the value if a flush ongoing' (I did not see anything that blocks reporting during a flush; maybe I'm looking in the wrong place).

Thanks.

> Correct update of maxFlushedSeqId in HRegion
> --------------------------------------------
>
>                 Key: HBASE-17407
>                 URL: https://issues.apache.org/jira/browse/HBASE-17407
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Eshcar Hillel
>
> The attribute maxFlushedSeqId in HRegion is used to track the max sequence id in the store files and is reported to HMaster. When flushing only part of the memstore content this value might be incorrect and may cause data loss.
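The lookup order described in the getEarliestMemstoreSeqNum comment above (currently flushing ids first, then the lowest unflushed ids) can be modeled in a few lines. This is a hypothetical, single-region simplification: SequenceIdAccountingSketch, onAppend, startFlush, completeFlush and the -1 sentinel are illustrative assumptions, not HBase's SequenceIdAccounting API, and the sketch ignores per-region maps, lock granularity and flush aborts.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of per-family sequence id accounting for one region.
 * NOT the HBase SequenceIdAccounting class; names and behavior are assumptions.
 */
class SequenceIdAccountingSketch {
  /** Assumed sentinel meaning "no sequence id known" for a family. */
  static final long NO_SEQNUM = -1L;

  // Lowest sequence id per family among edits not covered by an ongoing flush.
  private final Map<String, Long> lowestUnflushed = new HashMap<>();
  // Lowest sequence id per family captured when the current flush started.
  private final Map<String, Long> flushing = new HashMap<>();

  /** Record an appended edit; ids only increase, so the first one seen is the lowest. */
  synchronized void onAppend(String family, long seqId) {
    lowestUnflushed.putIfAbsent(family, seqId);
  }

  /** At flush start, move each family's lowest unflushed id into the flushing map. */
  synchronized void startFlush(Set<String> families) {
    for (String family : families) {
      Long lowest = lowestUnflushed.remove(family);
      if (lowest != null) {
        flushing.put(family, lowest);
      }
    }
  }

  /** Assumes the whole snapshot was persisted, so the flushing ids are forgotten. */
  synchronized void completeFlush() {
    flushing.clear();
  }

  /** Flushing ids take precedence; otherwise fall back to the lowest unflushed id. */
  synchronized long getLowestSequenceId(String family) {
    Long id = flushing.get(family);
    if (id != null) {
      return id;
    }
    id = lowestUnflushed.get(family);
    return id != null ? id : NO_SEQNUM;
  }

  public static void main(String[] args) {
    SequenceIdAccountingSketch acc = new SequenceIdAccountingSketch();
    acc.onAppend("cf", 5);
    acc.onAppend("cf", 6);
    acc.startFlush(Set.of("cf"));
    acc.onAppend("cf", 7); // edit arriving while the flush is running
    System.out.println(acc.getLowestSequenceId("cf")); // 5: the flushing id wins
    acc.completeFlush();
    System.out.println(acc.getLowestSequenceId("cf")); // 7: only the post-swap edit remains
  }
}
```

Running main prints 5 and then 7: the edit with id 7 that arrives during the flush does not hide the flushing id 5. Note that completeFlush in this model assumes everything in the snapshot reached disk. If a flush writes only part of the memstore (for example, when in-memory compaction keeps segments in the pipeline) and edit 6 stayed in memory, the model would still report 7 after completeFlush; if such a value were used to skip edits during log recovery, edit 6 could be lost.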