Date: Tue, 7 Apr 2015 00:29:12 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations

    [ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482283#comment-14482283 ]

stack commented on HBASE-13389:
-------------------------------

bq. So with all this I do not see any reason to keep these for more than a few hours.

It's not log rolling, as per Enis. It is when the memstore is flushed. By default, memstores are flushed at least once an hour:

public static final int DEFAULT_CACHE_FLUSH_INTERVAL = 3600000;

So if an old edit comes in during distributed log replay, an edit that has already been flushed to an hfile, we need to be able to put it in the appropriate slot (as you say).
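A minimal Java sketch of the replay situation, under stated assumptions: the class, the `UNKNOWN` marker, and `alreadyFlushed` are hypothetical illustrations, not HBase's replay API; only `DEFAULT_CACHE_FLUSH_INTERVAL` comes from the comment. An edit at or below the region's last flushed sequenceid is already in an hfile and can be skipped. When that sequenceid is unknown, the edit has to be replayed anyway, which is why the hfile cells need their seqids to slot it in correctly.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the replay decision; not HBase's actual replay code.
class ReplaySketch {
    // Value quoted in the comment: by default, memstores are flushed at least this often (ms).
    static final int DEFAULT_CACHE_FLUSH_INTERVAL = 3600000;

    // Hypothetical marker: the Master has no last flushed sequenceid for the region.
    static final long UNKNOWN = -1L;

    // True if the edit is already persisted in an hfile and replay can skip it.
    static boolean alreadyFlushed(long editSeqId, long lastFlushedSeqId) {
        if (lastFlushedSeqId == UNKNOWN) {
            // No flush information: replay conservatively. An already-flushed edit may then
            // come back, and only per-cell seqids in the hfiles let us slot it in correctly.
            return false;
        }
        return editSeqId <= lastFlushedSeqId;
    }

    public static void main(String[] args) {
        System.out.println(TimeUnit.MILLISECONDS.toMinutes(DEFAULT_CACHE_FLUSH_INTERVAL)); // 60
        System.out.println(alreadyFlushed(90, 100));     // true: covered by the last flush
        System.out.println(alreadyFlushed(120, 100));    // false: newer than the last flush
        System.out.println(alreadyFlushed(90, UNKNOWN)); // false: replayed anyway
    }
}
```

The third case is the one the comment worries about: the edit is replayed even though it may already sit in an hfile, so resolving it against that hfile's cells relies on those cells still carrying their seqids.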
An old edit like that can show up when we replay more edits than necessary, which happens when the Master does not have the last flushed sequenceid for a region. If HFiles keep all their seqids, this is easy to handle. But if mvcc has been purged from the hfiles (the optimization) and we get an edit that falls within an hfile's time range, we are going to be confused. Somehow, the mvcc-purging optimization should not run until we are sure that, for all regions, old WALs holding seqids older than those in the hfiles have been let go.

For replication, yeah, it needs a few days. The root cause of the lag may take a few days to fix.

On the put -> delete -> put case, you are not against changing the sort order so that seqid prevails over type, are you [~lhofhansl]? It would be a good change for 2.0.

> [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
> -------------------------------------------------------------
>
>                 Key: HBASE-13389
>                 URL: https://issues.apache.org/jira/browse/HBASE-13389
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: stack
>         Attachments: 13389.txt
>
>
> HBASE-12600 moved the edit sequenceid from tags to instead exploit the mvcc/sequenceid slot in a key. Now Cells near-always have an associated mvcc/sequenceid, where previously it was rare or the mvcc was kept at the file level. Many of us would argue this is how it should be, but as a side effect of this change, read-time optimizations that helped speed up scans were undone.
> In this issue, let's see if we can get the optimizations back, or just remove the optimizations altogether.
> The parse of mvcc/sequenceid is expensive. It was noticed over in HBASE-13291.
> The optimizations undone by this change are (to quote the optimizer himself, Mr [~lhofhansl]):
> {quote}
> Looks like this undoes all of HBASE-9751, HBASE-8151, and HBASE-8166.
> We're always storing the mvcc readpoints, and we never compare them against the actual smallestReadpoint, and hence we're always performing all the checks, tests, and comparisons that these jiras removed, in addition to actually storing the data, which with up to 8 bytes per Cell is not trivial.
> {quote}
> This is the 'breaking' change: https://github.com/apache/hbase/commit/2c280e62530777ee43e6148fd6fcf6dac62881c0#diff-07c7ac0a9179cedff02112489a20157fR96



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
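The skip-mvcc optimizations quoted in the issue can be pictured with a simplified Java model (Java 16+ for records). This is a sketch under stated assumptions: the `Cell` and `FileSketch` records and both methods are hypothetical and do not mirror HBase's actual classes or the patches named above. At write time, any cell whose mvcc is at or below the smallest read point is visible to every open scanner, so its mvcc can be stored as 0. If no cell still needs its mvcc, a file-level flag lets readers skip the per-cell parse and comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the skip-mvcc idea; all names are hypothetical, not HBase classes.
class SkipMvccSketch {
    // A cell reduced to a key and its mvcc/sequenceid.
    record Cell(String key, long mvcc) {}

    // A written file: its cells plus a flag saying whether readers must parse per-cell mvcc.
    record FileSketch(List<Cell> cells, boolean includesMvcc) {}

    // Write side: cells at or below smallestReadPoint are visible to all scanners,
    // so their mvcc is dropped (stored as 0).
    static FileSketch write(List<Cell> cells, long smallestReadPoint) {
        List<Cell> out = new ArrayList<>();
        boolean anyMvccNeeded = false;
        for (Cell c : cells) {
            if (c.mvcc() <= smallestReadPoint) {
                out.add(new Cell(c.key(), 0L));
            } else {
                out.add(c);
                anyMvccNeeded = true;
            }
        }
        return new FileSketch(out, anyMvccNeeded);
    }

    // Read side: when the file carries no mvcc, skip the per-cell comparison entirely.
    static boolean isVisible(FileSketch file, Cell cell, long readPoint) {
        if (!file.includesMvcc()) {
            return true;
        }
        return cell.mvcc() <= readPoint;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(new Cell("a", 5), new Cell("b", 8));

        FileSketch allOld = write(cells, 10);
        System.out.println(allOld.includesMvcc()); // false: readers can skip mvcc entirely

        FileSketch mixed = write(cells, 6);
        System.out.println(mixed.includesMvcc());              // true
        System.out.println(mixed.cells().get(0).mvcc());       // 0
        System.out.println(isVisible(mixed, mixed.cells().get(1), 7)); // false: 8 > 7
    }
}
```

In this model, if every cell keeps a nonzero seqid and nothing compares it against the smallest read point, `includesMvcc` is always true and readers pay for the parse and comparison on every cell, which is the regression the quote describes. Zeroing the mvcc is also what makes a late replayed edit hard to order against those cells.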