Subject: Re: Problem In Understanding Compaction Process
From: Sergey Shelukhin <sergey@hortonworks.com>
To: user@hbase.apache.org
Date: Mon, 25 Feb 2013 11:16:41 -0800

As for making the compaction file-set update atomic, I don't think it is
currently possible. It would require adding a separate feature; the first
thing that comes to mind is storing the file set in some sort of a
meta-file and updating it atomically (to the extent that HDFS file
replacement is atomic), then using that meta-file to load the store files.
More importantly, the files, as they are, can contain multiple versions of
a record, and can also contain delete markers that invalidate previous
updates. What is your scenario for analyzing them directly?
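Very roughly, such a manifest-based commit could look like the sketch
below. This is only an illustration, not existing HBase code: the class,
the MANIFEST naming scheme, and the one-file-name-per-line format are all
made up for the example; the only real APIs used are the Hadoop FileSystem
calls, and the atomicity relies on HDFS rename being atomic.

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical sketch: commit a compaction by atomically publishing a
 * manifest that lists the live store files, instead of moving the new file
 * and deleting the old ones in two separate, non-atomic steps.
 */
public class ManifestCommitSketch {

  private static final String MANIFEST_PREFIX = "MANIFEST."; // invented name

  /** Write the new file list to a temp file, then publish it with one atomic rename. */
  public static void commitFileSet(FileSystem fs, Path storeDir, long nextSeq,
      List<String> storeFiles) throws IOException {
    Path tmp = new Path(storeDir, ".tmp-" + MANIFEST_PREFIX + nextSeq);
    try (FSDataOutputStream out = fs.create(tmp, true)) {
      for (String f : storeFiles) {
        out.writeBytes(f + "\n"); // toy format: one store file name per line
      }
    }
    // HDFS rename is atomic, so readers see either the previous
    // highest-numbered manifest or this new one, never a partial file set.
    Path finalPath = new Path(storeDir, MANIFEST_PREFIX + nextSeq);
    if (!fs.rename(tmp, finalPath)) {
      throw new IOException("Failed to publish " + finalPath);
    }
  }

  /** Readers trust only the highest-numbered manifest; any store file not
   *  listed in it is leftover garbage that a cleaner can delete later. */
  public static Path latestManifest(FileSystem fs, Path storeDir) throws IOException {
    Path latest = null;
    long best = -1;
    for (FileStatus st : fs.listStatus(storeDir)) {
      String name = st.getPath().getName();
      if (name.startsWith(MANIFEST_PREFIX)) {
        long seq = Long.parseLong(name.substring(MANIFEST_PREFIX.length()));
        if (seq > best) {
          best = seq;
          latest = st.getPath();
        }
      }
    }
    return latest;
  }
}

With something like this, a crash between writing the compacted file and
deleting the replaced ones would be harmless for a reader that goes through
the manifest, because the replaced files are simply no longer listed.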
On Fri, Feb 22, 2013 at 11:16 PM, Anty wrote:

> Thanks Sergey.
> In my use case I want to directly analyze the underlying HFiles, so I
> can't tolerate duplicate data.
>
> Can you give me some pointers on how to make this procedure atomic?
>
> On Thu, Feb 21, 2013 at 6:07 AM, Sergey Shelukhin wrote:
>
> > There should be no duplicate records despite the old file not being
> > deleted: among records with the exact same key/version/etc., the file
> > with the newer logical sequence is chosen. If those happen to be the
> > same, some deterministic choice (by time, or by name) is still made, so
> > only one file wins.
> > Eventually, the leftover file will be compacted again and disappear.
> > Granted, by making the move atomic (via some meta/manifest file) we
> > could avoid some overhead in this case at the cost of some added
> > complexity, but it should be rather rare.
> >
> > On Tue, Feb 19, 2013 at 7:10 PM, Anty wrote:
> >
> > > Hi guys,
> > >
> > > I have some trouble understanding the compaction process; can someone
> > > shed some light on it? Much appreciated. Here is the problem:
> > >
> > > After the Region Server successfully generates the final compacted
> > > file, it goes through two steps:
> > > 1. move the compacted file into the region's directory
> > > 2. delete the replaced files
> > >
> > > These two steps are not atomic. If the Region Server crashes after
> > > step 1 and before step 2, there are duplicate records! Is this problem
> > > handled in the read path, or is there another mechanism to fix it?
> > >
> > > --
> > > Best Regards
> > > Anty Rao
>
> --
> Best Regards
> Anty Rao
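For completeness, the duplicate-resolution behavior described in the quoted
thread above (the newer file wins for records with an identical
key/version) amounts to keeping, per cell, the entry from the file with the
highest sequence id, roughly as in the toy sketch below. CellEntry and its
fields are invented for illustration and are not the actual HBase read-path
classes.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy illustration of "the newer file wins": when two store files contain an
 * entry for the same row/column/timestamp (e.g. an old file that compaction
 * failed to delete plus the new compacted file), only the entry coming from
 * the file with the higher sequence id is kept. CellEntry is an invented type.
 */
public class NewestFileWinsSketch {

  static class CellEntry {
    final String rowColumnTs; // identity of the cell: row + column + timestamp
    final long fileSeqId;     // sequence id of the store file the entry came from
    final byte[] value;

    CellEntry(String rowColumnTs, long fileSeqId, byte[] value) {
      this.rowColumnTs = rowColumnTs;
      this.fileSeqId = fileSeqId;
      this.value = value;
    }
  }

  /** Merge entries from all files, keeping one winner per cell identity. */
  static Map<String, CellEntry> resolve(List<CellEntry> entriesFromAllFiles) {
    Map<String, CellEntry> winners = new HashMap<>();
    for (CellEntry e : entriesFromAllFiles) {
      CellEntry current = winners.get(e.rowColumnTs);
      if (current == null || e.fileSeqId > current.fileSeqId) {
        winners.put(e.rowColumnTs, e);
      }
    }
    return winners;
  }
}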