From: Jonathan Hsieh
Date: Wed, 29 Aug 2012 10:46:20 -0700
Subject: Re: A general question on maxVersion handling when we have Secondary index tables
To: dev@hbase.apache.org

Let me rephrase to make sure I'm on the same page for Ram's question:

We do three inserts on row 1 at different times to the same column (which
is being indexed in a secondary table). (Are we assuming only a 1-to-1
secondary->primary mapping?)

t1< t2 wrote:

> When we have a many-to-one mapping between the main and secondary index
> tables, we may end up hitting many RS. If there is a one-to-one mapping,
> that may not be a problem.
>
> Basically, my intention with this discussion was to talk about version
> maintenance for any type of secondary index, particularly removing the
> stale data in the index table that would have expired.
>
> Regards
> Ram
>
> > -----Original Message-----
> > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > Sent: Wednesday, August 29, 2012 7:45 PM
> > To: dev@hbase.apache.org
> > Subject: Re: A general question on maxVersion handling when we have
> > Secondary index tables
> >
> > Thanks for the detailed response, Jon.
> >
> > bq.
> > it would mean that a query based on secondary index would potentially
> > have to hit every region server that has a region in the primary table.
> >
> > Can you elaborate on the above a little bit?
> > Is this because the secondary index would point us to more than one
> > region in the data table, because several versions are saved for the
> > same row?
> >
> > My thinking was to ease management of simultaneous (data and index)
> > region splits through region colocation.
> >
> > Cheers
> >
> > On Wed, Aug 29, 2012 at 6:47 AM, Jonathan Hsieh wrote:
> >
> > > I'm more of a fan of having secondary indexes added as an external
> > > feature (coproc or new client library on top of our current client
> > > library) and focusing on only adding the APIs necessary to make
> > > 2ndary indexes possible and correct on/in HBase. There are many
> > > different use patterns and requirements, and one style of secondary
> > > index will not be good for everything. Do we only care about this
> > > working well for highly selective keys? What are possible indexes
> > > (col name, value, value prefix, everything our filters support?) Do
> > > we care more about writes or reads, ACID correctness or speed, etc.?
> > > Also, there are several questions about how we handle other features
> > > in conjunction with 2ndary indexes: replication, bulk load,
> > > snapshots, to name a few.
> > >
> > > Maybe it makes sense to spend some time defining what we want to
> > > index secondarily and what a user API to this external API would be.
> > > Then we could have the different implementations under the covers,
> > > and allow users to swap implementations for the tradeoffs that fit
> > > their use cases. It wouldn't be free to change but hopefully "easy"
> > > from a user point of view.
> > >
> > > Personally, I tend to favor more of a percolator-style
> > > implementation -- it is a client library and built on top of hbase.
> > > This approach seems to be more "HBase-style" with its emphasis on
> > > consistency and atomicity, and seems to require only a few
> > > modifications to HBase core. Sure, it is likely slower than my read
> > > of Jesse's proposal, but it seems always consistent and thus
> > > predictable in cases where there are failures on deletes and updates.
> > > We'd need HBase API primitives like a checkAndMutate call (check with
> > > multiple delete/put on the same row), and possibly an atomic
> > > multitable bulkload. I'm not sure that it is replication compatible,
> > > and there are probably questions we'll need to answer once snapshots
> > > solidifies.
> > >
> > > Ted's idea of colocating regions (like the index table's regions)
> > > definitely feels like a primitive (pluggable, likely per-table region
> > > assignment plans) that we could add to HBase core. This requirement
> > > though for 2ndary indexes seems to imply an approach similar to
> > > cassandra's approach -- having a local index of each region on the
> > > region server and colocating them. Is this right? If so, this is
> > > essentially a filtering optimization -- it would mean that a query
> > > based on secondary index would potentially have to hit every region
> > > server that has a region in the primary table. This is a great
> > > approach if the index lookup has high cardinality, but if the
> > > secondary index is highly selective, you'd have to march through a
> > > bunch of RS's before getting an answer.
> > >
> > > Jon.
> > >
> > > On Tue, Aug 28, 2012 at 9:18 PM, Ramkrishna.S.Vasudevan <
> > > ramkrishna.vasudevan@huawei.com> wrote:
> > >
> > > > Hi
> > > >
> > > > Yes, I was talking about the dead entry in the index table rather
> > > > than the actual data table.
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > > > -----Original Message-----
> > > > > From: Wei Tan [mailto:wtan@us.ibm.com]
> > > > > Sent: Tuesday, August 28, 2012 9:22 PM
> > > > > To: dev@hbase.apache.org
> > > > > Cc: Sandeep Tata
> > > > > Subject: Re: A general question on maxVersion handling when we
> > > > > have Secondary index tables
> > > > >
> > > > > Thanks for sharing a pointer to your implementation.
> > > > > My two cents:
> > > > > timestamp is a way to do MVCC, and setting every KV with the same
> > > > > TS will make concurrency control very tricky and error prone, if
> > > > > not impossible.
> > > > > I think Ram is talking about the dead entry in the index table
> > > > > rather than the data table. Deleting old index entries upfront
> > > > > when there is a new put might be a choice.
> > > > >
> > > > > Best Regards,
> > > > > Wei
> > > > >
> > > > > Wei Tan
> > > > > Research Staff Member
> > > > > IBM T. J. Watson Research Center
> > > > > 19 Skyline Dr, Hawthorne, NY 10532
> > > > > wtan@us.ibm.com; 914-784-6752
> > > > >
> > > > > From: Jesse Yates
> > > > > To: dev@hbase.apache.org,
> > > > > Date: 08/28/2012 04:00 AM
> > > > > Subject: Re: A general question on maxVersion handling when we
> > > > > have Secondary index tables
> > > > >
> > > > > Ram,
> > > > >
> > > > > If I understand correctly, I think you can design your index such
> > > > > that you don't actually use the timestamp (e.g. everything gets
> > > > > put with a TS = 10 - or some other non-special, relatively small
> > > > > number that's not 0 as I'd worry about that in HBase ;) Then when
> > > > > you set maxVersions to 1, everything should be good.
> > > > >
> > > > > You get a couple of wasted bytes from the TS, but with the
> > > > > prefixTrie stuff that should be pretty minimal overhead.
> > > > > If you do need to keep track of the timestamp, you should be able
> > > > > to munge that back up into the column qualifier (and just know
> > > > > that the last 64 bits are the timestamp). Again, a little more
> > > > > CPU cost, but it's really not that big of an overhead. It seems
> > > > > like you don't really care about the TS though, in which case
> > > > > this should be pretty simple.
> > > > >
> > > > > Out of curiosity, what are people using for their secondary
> > > > > indexing solutions? I know there are a bunch out there, but don't
> > > > > know what people have adopted, what they like/dislike, design
> > > > > tradeoffs made and why.
> > > > >
> > > > > Disclaimer: I recently proposed a secondary indexing solution
> > > > > myself (shameless self-plug:
> > > > > http://jyates.github.com/2012/07/09/consistent-enough-secondary-indexes.html
> > > > > ) and it's something I'm working on for Salesforce - open sourced
> > > > > at some point, promise!
> > > > >
> > > > > -Jesse
> > > > > -------------------
> > > > > Jesse Yates
> > > > > @jesse_yates
> > > > > jyates.github.com
> > > > >
> > > > > On Tue, Aug 28, 2012 at 12:24 AM, Ramkrishna.S.Vasudevan <
> > > > > ramkrishna.vasudevan@huawei.com> wrote:
> > > > >
> > > > > > Hi All
> > > > > >
> > > > > > When we try to build any type of secondary index for a given
> > > > > > table, how can one handle maxVersions in the secondary index
> > > > > > tables?
> > > > > >
> > > > > > For example, I have inserted
> > > > > >
> > > > > > Row1 - Val1 => t
> > > > > > Row1 - Val2 => t+1
> > > > > > Row1 - Val3 => t+2
> > > > > >
> > > > > > Ideally, if my max versions is only one, then Val3 should be my
> > > > > > result if I query the main table for row1.
> > > > > >
> > > > > > Now in my index I will have all of the above 3 entries. How can
> > > > > > we remove the older entries from the index table that do not
> > > > > > fit into maxVersions?
> > > > > >
> > > > > > Currently, the scan code that enforces maxVersions does not
> > > > > > give any hooks to learn which entries were skipped because of
> > > > > > versions.
> > > > > >
> > > > > > So, any suggestions on this? I am still looking through the
> > > > > > code for other options, but suggestions are welcome.
> > > > > >
> > > > > > Regards
> > > > > > Ram
> > >
> > > --
> > > // Jonathan Hsieh (shay)
> > > // Software Engineer, Cloudera
> > > // jon@cloudera.com

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com
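
One option raised in the replies to Ram's question, deleting the old index entry upfront whenever a new put arrives, can be sketched with a toy in-memory model. This is only an illustration in plain Python, not HBase code: `ToyIndexedTable` and its methods are hypothetical names, and the sketch assumes maxVersions = 1, a single writer, and no failure between the index delete and the data put (which is the part the thread identifies as hard).

```python
class ToyIndexedTable:
    """Toy model: a main table keeping one version per row, plus a value -> rows index."""

    def __init__(self):
        self.main = {}    # row -> (value, timestamp); only the newest version is kept
        self.index = {}   # value -> set of rows currently holding that value

    def put(self, row, value, ts):
        old = self.main.get(row)
        if old is not None and old[1] > ts:
            # An older write arrived late; with maxVersions = 1 it would never be visible.
            return
        if old is not None:
            # Delete the index entry for the value being replaced, before writing the new one.
            rows = self.index.get(old[0])
            if rows is not None:
                rows.discard(row)
                if not rows:
                    del self.index[old[0]]
        self.main[row] = (value, ts)
        self.index.setdefault(value, set()).add(row)

    def get(self, row):
        """Return the current value for a row, or None if the row is absent."""
        entry = self.main.get(row)
        return None if entry is None else entry[0]

    def lookup(self, value):
        """Return the rows whose current value matches, via the index only."""
        return sorted(self.index.get(value, ()))
```

Replaying the three puts from the original question (Val1 at t, Val2 at t+1, Val3 at t+2) leaves only the Val3 entry in the index, so an index lookup for Val1 or Val2 finds no rows. In a real deployment the index delete and the data put touch different tables, so they would need something like the checkAndMutate-style primitives discussed above to stay consistent.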