Date: Wed, 2 Jul 2014 00:26:25 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-11315) Keeping MVCC for configurable longer time

[ https://issues.apache.org/jira/browse/HBASE-11315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049460#comment-14049460 ]

stack commented on HBASE-11315:
-------------------------------

bq. Should we suggest to keep MVCC forever in dev list?

I suppose these will occupy space in the block cache, but in the scheme of things not much? They occupy space anyway, since we have a placeholder in the KV/Cell class? If so, we could keep them forever. Purging them means we take up less space in the fs but not in the block cache? If so, I suppose if we are looking at the keys at compaction time anyway, there is no harm in letting them go either.
If the cost of keeping them in the fs is small and we KNOW they are permanent, we might be able to simplify in a few places and exploit this facility in others. That'd be the argument for keeping them forever.

> Keeping MVCC for configurable longer time
> ------------------------------------------
>
>                 Key: HBASE-11315
>                 URL: https://issues.apache.org/jira/browse/HBASE-11315
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.99.0
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>         Attachments: hbase-11315.patch
>
>
> After hbase-8763, we need to keep the mvcc number longer in the hfile so that it can be used to order changes during writes. Examples are the known put,delete,put,... scenario, cross region server scans, and out-of-order puts (in the recovery case).
> The current thinking is to make the retention period configurable (below we use 1 day to explain). During major compaction, we check each hfile's creation time: if the hfile is older than 1 day, the mvcc of all KVs in that hfile is removed; if the hfile was created within the last day, the mvccs of all KVs in that hfile are kept.
> In case of clock skew, we can first sort the hfiles by seqId in ascending order and find the first hfile whose creation timestamp is less than 1 day old. Then the mvcc of all hfiles before the found file will be removed during compaction.
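
The selection rule in the issue description (sort by seqId, find the first hfile created within the retention period, strip mvcc from everything before it) can be sketched as below. This is only an illustration of the described rule, not the attached patch: the class `MvccRetentionSketch`, the `HFileInfo` holder and the method `filesToStripMvcc` are hypothetical names, not HBase APIs, and the sketch assumes each hfile exposes a seqId and a creation timestamp in milliseconds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; none of these names are HBase APIs. */
final class MvccRetentionSketch {

    /** Minimal stand-in for the hfile metadata the description relies on. */
    static final class HFileInfo {
        final String name;
        final long seqId;
        final long creationTimeMs;

        HFileInfo(String name, long seqId, long creationTimeMs) {
            this.name = name;
            this.seqId = seqId;
            this.creationTimeMs = creationTimeMs;
        }
    }

    /**
     * Returns the names of the hfiles whose mvcc numbers would be removed
     * during major compaction, in ascending seqId order.
     */
    static List<String> filesToStripMvcc(List<HFileInfo> hfiles,
                                         long nowMs,
                                         long retentionMs) {
        // Sort a copy by seqId so clock skew in creation times cannot
        // reorder the files.
        List<HFileInfo> sorted = new ArrayList<>(hfiles);
        sorted.sort(Comparator.comparingLong((HFileInfo f) -> f.seqId));

        // Find the first hfile created within the retention period.
        int firstRecent = sorted.size();
        for (int i = 0; i < sorted.size(); i++) {
            if (nowMs - sorted.get(i).creationTimeMs < retentionMs) {
                firstRecent = i;
                break;
            }
        }

        // Only hfiles before that one lose their mvcc numbers.
        List<String> toStrip = new ArrayList<>();
        for (int i = 0; i < firstRecent; i++) {
            toStrip.add(sorted.get(i).name);
        }
        return toStrip;
    }
}
```

With a 1 day retention and three hfiles (seqId 1 created 3 days ago, seqId 2 created 2 hours ago, seqId 3 stamped 2 days ago by a skewed clock), only the first file is selected. The third keeps its mvcc because it follows a recent file in seqId order, which is the point of sorting by seqId rather than trusting creation time alone.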