From: "Ramkrishna.S.Vasudevan"
Reply-To: dev@hbase.apache.org
Subject: RE: A general question on maxVersion handling when we have Secondary index tables
Date: Tue, 28 Aug 2012 14:21:03 +0530

Hi Jesse,

Thanks a lot for your reply.

-> Not maintaining timestamps in the secondary index may cause problems when I issue a delete on the main table and the corresponding entries need to be deleted from the index.

-> In the case I mentioned below, the index table will have

   Val1_row1 (t)
   Val2_row1 (t+2)
   Val3_row1 (t+3)

Now suppose my query says "get me all the values greater than Val1". Ideally only Val3 should be fetched, but a direct scan on the index table will not know that it should return Val3_row1 alone. Unless I know the number of existing entries, I cannot decide which ones should be skipped and which ones should be considered. In any case, on the main table only Val3 will be retrieved for row1.

-> If I have a use case where I want to remove the older versions during compaction of the index table, how can we do it?
Keeping all the older versions may also increase the number of files, which will then be compacted. But if I want to remove such older versions during compaction, what are the ways we can handle it?

These are some problems that come to my mind when we want to implement this. Jesse, am I missing something here?

The prefixTrie stuff matters when we are concerned about storage; yes, using prefixTrie will help with storage. As for the usage of secondary indexes, I cannot comment on that now.

Regards
Ram

> -----Original Message-----
> From: Jesse Yates [mailto:jesse.k.yates@gmail.com]
> Sent: Tuesday, August 28, 2012 1:30 PM
> To: dev@hbase.apache.org
> Subject: Re: A general question on maxVersion handling when we have
> Secondary index tables
>
> Ram,
>
> If I understand correctly, I think you can design your index such that you
> don't actually use the timestamp (e.g. everything gets put with a TS = 10 -
> or some other non-special, relatively small number that's not 0 as I'd
> worry about that in HBase ;) Then when you set maxVersions to 1, everything
> should be good.
>
> You get a couple of wasted bytes from the TS, but with the prefixTrie stuff
> that should be pretty minimal overhead. If you do need to keep track of the
> timestamp you should be able to munge that back up into the column
> qualifier (and just know that the last 64 bits is the timestamp). Again a
> little more CPU cost, but it's really not that big of an overhead. It seems
> like you don't really care about the TS though, in which case this should
> be pretty simple.
>
> Out of curiosity, what are people using for their secondary indexing
> solutions? I know there are a bunch out there, but don't know what people
> have adopted, what they like/dislike, design tradeoffs made and why.
>
> Disclaimer: I recently proposed a secondary indexing solution myself
> (shameless self-plug:
> http://jyates.github.com/2012/07/09/consistent-enough-secondary-indexes.html)
> and it's something I'm working on for Salesforce - open sourced at some
> point, promise!
>
> -Jesse
> -------------------
> Jesse Yates
> @jesse_yates
> jyates.github.com
>
>
> On Tue, Aug 28, 2012 at 12:24 AM, Ramkrishna.S.Vasudevan <
> ramkrishna.vasudevan@huawei.com> wrote:
>
> > Hi All
> >
> > When we try to build any type of secondary index for a given table, how
> > can one handle maxVersions in the secondary index tables?
> >
> > For example, I have inserted
> >
> > Row1 - Val1 => t
> > Row1 - Val2 => t+1
> > Row1 - Val3 => t+2
> >
> > Ideally, if my max versions is only one, then Val3 should be my result
> > if I query the main table for row1.
> >
> > Now in my index I will have all of the above 3 entries. How can we
> > remove the older entries from the index table that do not fit into
> > maxVersions?
> >
> > Currently the scan code that enforces maxVersions does not give any
> > hooks to know which entries were skipped because of versions.
> >
> > So any suggestions on this? I am still reading the code for other
> > options, but suggestions are welcome.
> >
> > Regards
> > Ram
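
The stale-entry problem described in this thread can be illustrated with a small toy model. This is only a sketch under stated assumptions: plain Python dicts and lists stand in for the main and index tables, nothing here is HBase API, and the names put, scan_index_greater_than and live_hits are hypothetical. It shows one possible way to filter stale index hits, namely checking each hit against the current value in the main table. That technique is an assumption for illustration and is not something either poster proposed.

```python
# Toy model only: a dict and a list stand in for the HBase tables.
# All names here are hypothetical and exist only for illustration.

def put(main, index, row, value, ts):
    """Write to the main table, keeping only the newest version
    (mimics maxVersions = 1), and append a value_row index entry."""
    current = main.get(row)
    if current is None or ts >= current[1]:
        main[row] = (value, ts)
    # Older index entries are never removed, which is the problem in the thread.
    index.append((value, row, ts))


def scan_index_greater_than(index, lower):
    """Return every index entry whose value sorts after `lower`."""
    return sorted(entry for entry in index if entry[0] > lower)


def live_hits(main, hits):
    """Keep only index hits that still match the main table's current version."""
    return [(value, row) for value, row, ts in hits
            if main.get(row) == (value, ts)]


main, index = {}, []
put(main, index, "row1", "Val1", 100)  # t
put(main, index, "row1", "Val2", 101)  # t+1
put(main, index, "row1", "Val3", 102)  # t+2

hits = scan_index_greater_than(index, "Val1")
print(hits)                   # [('Val2', 'row1', 101), ('Val3', 'row1', 102)]
print(live_hits(main, hits))  # [('Val3', 'row1')]
```

The raw index scan returns the stale Val2 entry alongside Val3; only the check against the main table removes it. The cost is one extra main-table lookup per index hit, and the check depends on the index entry carrying the original timestamp, which is the reason given above for keeping timestamps in the index.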