Date: Tue, 21 Apr 2015 05:27:00 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations

    [ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504361#comment-14504361 ]

stack commented on HBASE-13389:
-------------------------------

Thinking on it, during out-of-order distributed log replay (DLR), there are a few ways in which we could lose data if we bring back the optimization that zeros all mvccs, promoting the highest mvcc seen to be the hfile's mvcc kept in the hfile metadata.

During recovery of a region during DLR, we may flush hfiles in such a way that the older edits are in the most recently flushed file, or that hfiles are made of edits that do not have linearly increasing mvccs.
This violates the tenets that hold when flushes always drop files whose mvcc/sequenceid exceeds that of the files currently present in the filesystem (and whose edits have increasing mvccs).

We have to be careful compacting these files dropped during recovery. We need to compact them all up together first -- after the region comes online -- before we can mix them in with zero'd mvcc files. (It has to be after the region comes online and not before, because the region may crash during recovery having dropped one or more out-of-order hfiles.)

Here is an illustration. A region is recovering. It comes under memory pressure, so it flushes the edits it has received so far. It so happens that it mostly received older edits, but a few new ones came in too. It dumps out (let the letters be keys and the numbers mvccs):

  A 2
  B 4
  C 10

Recovery completes and it drops another hfile:

  A 1
  B 5
  C 11

Now, if we compact the first file with a zero'd mvcc file with a sequenceid of 8, the product will be a zero'd mvcc hfile whose seqid is 10. If we then compact this '10' file with the second file flushed, we lose the 'B 5' edit because it is < '10'.

Even if we compacted all three files together -- the zero'd mvcc hfile and the two files dropped during recovery -- we could lose 'B 5' and 'A 2', since both have mvccs < '10'.

> [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
> -------------------------------------------------------------
>
>                 Key: HBASE-13389
>                 URL: https://issues.apache.org/jira/browse/HBASE-13389
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: stack
>         Attachments: 13389.txt
>
>
> HBASE-12600 moved the edit sequenceid from tags to instead exploit the mvcc/sequenceid slot in a key. Now Cells near-always have an associated mvcc/sequenceid, where previously it was rare or the mvcc was kept up at the file level.
> This is sort of how it should be, many of us would argue, but as a side effect of this change, read-time optimizations that helped speed scans were undone.
> In this issue, let's see if we can get the optimizations back -- or just remove the optimizations altogether.
> The parse of mvcc/sequenceid is expensive. It was noticed over in HBASE-13291.
> The optimizations undone by this change are (to quote the optimizer himself, Mr [~lhofhansl]):
> {quote}
> Looks like this undoes all of HBASE-9751, HBASE-8151, and HBASE-8166.
> We're always storing the mvcc readpoints, and we never compare them against the actual smallestReadpoint, and hence we're always performing all the checks, tests, and comparisons that these jiras removed in addition to actually storing the data - which with up to 8 bytes per Cell is not trivial.
> {quote}
> This is the 'breaking' change: https://github.com/apache/hbase/commit/2c280e62530777ee43e6148fd6fcf6dac62881c0#diff-07c7ac0a9179cedff02112489a20157fR96
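
For anyone who wants to poke at the 'B 5' scenario from the comment, here is a rough toy model in Python. It is only a sketch and not HBase code: `Cell`, `HFile`, `flush`, `compact` and `values` are made-up names, a "zero'd" cell is modelled as mvcc 0 that falls back to the file-level seqid, and the contents of the zero'd seqid-8 file (a single key 'D') are an assumption, since the comment does not say what it holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    key: str
    mvcc: int    # 0 means "zeroed": readers fall back to the file-level seqid
    value: str   # label recording which edit this is (illustration only)


@dataclass(frozen=True)
class HFile:
    cells: tuple
    seqid: int   # highest mvcc/sequenceid seen, as kept in file metadata


def flush(edits):
    """Build a toy hfile from (key, mvcc) pairs, keeping the real mvccs."""
    cells = tuple(Cell(k, m, f"{k}@{m}") for k, m in edits)
    return HFile(cells, max(m for _, m in edits))


def compact(files, zero_mvcc):
    """Keep the newest cell per key; optionally zero mvccs in the output."""
    newest = {}
    for f in files:
        for c in f.cells:
            # A zeroed cell is ranked by the seqid of the file it lives in.
            rank = c.mvcc if c.mvcc != 0 else f.seqid
            if c.key not in newest or rank > newest[c.key][0]:
                newest[c.key] = (rank, c)
    cells = tuple(
        Cell(c.key, 0 if zero_mvcc else rank, c.value)
        for rank, c in (newest[k] for k in sorted(newest))
    )
    return HFile(cells, max(f.seqid for f in files))


def values(hfile):
    """Map each key to the label of the edit that survived."""
    return {c.key: c.value for c in hfile.cells}


first = flush([("A", 2), ("B", 4), ("C", 10)])    # dropped under memory pressure
second = flush([("A", 1), ("B", 5), ("C", 11)])   # dropped when recovery completes
zeroed = HFile((Cell("D", 0, "D@old"),), 8)       # assumed contents, seqid 8

# Mixing the first recovery file with the zero'd file too early:
mixed = compact([first, zeroed], zero_mvcc=True)  # zero'd hfile, seqid 10
lossy = compact([mixed, second], zero_mvcc=True)
print(values(lossy))   # B resolves to 'B@4'; the 'B 5' edit is shadowed

# Compacting the recovery files together first, then mixing:
recovered = compact([first, second], zero_mvcc=False)
safe = compact([recovered, zeroed], zero_mvcc=True)
print(values(safe))    # B resolves to 'B@5'
```

In this model the loss comes purely from ranking the zeroed 'B 4' cell at the file seqid of 10, which then beats the real mvcc of 5. Compacting the two recovery files together while they still carry real mvccs avoids that comparison.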