From: bharath vissapragada
Date: Mon, 17 Aug 2009 23:16:07 +0530
Message-ID: <73d592f60908171046w645221ccga1503cabcdb81aef@mail.gmail.com>
Subject: Re: Indexed Table in Hbase
To: hbase-user@hadoop.apache.org
References: <73d592f60908170708w35802725q11c4043d8fc05da1@mail.gmail.com>
 <4A89709E.5030900@yahoo.com>
 <73d592f60908170826ub3eac4bxe7372ed72d0334b2@mail.gmail.com>
 <4A897E07.5090604@streamy.com>
 <73d592f60908170957o61f804erdfdc4cb60d4e6657@mail.gmail.com>
 <4A898C90.3040406@streamy.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=0015174bedce30517c047159fbbb

--0015174bedce30517c047159fbbb
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Thanks for your explanation, Gary.

Consider my case, where the same value can repeat. Are you saying I should
edit the IndexKeyGenerator so that, instead of storing (column -> rowkey),
it stores (column -> rowkey1, rowkey2) as different timestamps? If yes, is
that a good way?

On Mon, Aug 17, 2009 at 10:53 PM, Gary Helmling wrote:

> When defining the IndexSpecification for your table, you can pass your
> own implementation of
> org.apache.hadoop.hbase.client.tableindexed.IndexKeyGenerator.
>
> This allows you to control how the row keys are generated for the
> secondary index table. For example, you could append the original
> table's row key to the indexed value to ensure uniqueness in
> referencing the original rows.
>
> When you create an indexed scanner, the secondary index code opens and
> wraps a scanner on the secondary index table, based on the start row
> you specify (the indexed value you're looking up).
> It applies any filter passed to rows on the secondary index table, so
> make sure anything you want to filter on is listed in the "indexed
> columns" in your IndexSpecification.
>
> For any rows returned by the wrapped scanner, the client code then
> does a get for the original table record (the original row key is
> stored in the "__INDEX__" column family, I think).
>
> So in total, when using secondary indexes, you wind up with 1 scan + N
> gets to look at N rows.
>
> At least, this was my understanding of how things worked as of 0.19.
> I'm actually moving indexing into my app layer as I update to 0.20.
>
> Hope this helps.
>
> --gh
>
>
> On Mon, Aug 17, 2009 at 1:00 PM, Jonathan Gray wrote:
> > I'm actually unsure about that. Look at the code or experiment.
> >
> > Seems to me that there would be a uniqueness requirement, otherwise
> > what do you expect the behavior to be? A get can only return a single
> > row, so multiple index hits don't really make sense.
> >
> > Clint? You out there? :)
> >
> > JG
> >
> > bharath vissapragada wrote:
> >>
> >> I got it ... I think this is definitely useful in my app, because I
> >> am performing a full table scan every time to select the rowkeys
> >> based on some column values.
> >>
> >> BUT ..
> >>
> >> we can have more than one rowkey for the same column value. Can you
> >> please tell me how they are stored?
> >>
> >> Thanks in advance
> >>
> >> On Mon, Aug 17, 2009 at 9:27 PM, Jonathan Gray wrote:
> >>
> >>> It's not an actual hash or btree index, but rather secondary indexes
> >>> in HBase are implemented by creating an additional HBase table.
> >>>
> >>> If I have a table "users" (row key is userid) with family "data" and
> >>> column "email", and I want to index the value in that column...
> >>>
> >>> I can create a table "users_email" where the row key is the email
> >>> address (value from the column in "users" table) and a single column
> >>> that contains the userid.
> >>>
> >>> Doing an "index lookup" would mean doing a get on "users_email" and
> >>> then using that userid to do a lookup on the "users" table.
> >>>
> >>> IndexedTable does this transparently, but still does require two
> >>> queries. So it's slower than a single query, but certainly faster
> >>> than a full table scan.
> >>>
> >>> If you need hash-level performance on the index lookup, there are
> >>> lots of solutions outside of HBase that would work... In-memory Java
> >>> HashMap, Tokyo Cabinet on-disk HashMaps, BerkeleyDB, etc... If you
> >>> need full-text indexing, you can use Lucene or the like.
> >>>
> >>> Make sense?
> >>>
> >>> JG
> >>>
> >>>
> >>> bharath vissapragada wrote:
> >>>
> >>>> But I have read somewhere that secondary indexes are somewhat slow
> >>>> compared to normal HBase tables. Does that affect performance?
> >>>>
> >>>> Also, do you know the type of index created on the column (I mean
> >>>> hash type or btree, etc.)?
> >>>>
> >>>> On Mon, Aug 17, 2009 at 8:30 PM, Kirill Shabunov wrote:
> >>>>
> >>>>> Hi!
> >>>>>
> >>>>> As far as I understand you are talking about the secondary
> >>>>> indexes. Yes, they can be used to quickly get the rowkey by a
> >>>>> value in the indexed column.
> >>>>>
> >>>>> --Kirill
> >>>>>
> >>>>>
> >>>>> bharath vissapragada wrote:
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I have gone through the IndexedTableAdmin classes in the HBase
> >>>>>> 0.19.3 API. I have seen some methods used to create an indexed
> >>>>>> table (on some column). I have some doubts regarding the same:
> >>>>>>
> >>>>>> 1) Are these somewhat similar to hash indexes (in an RDBMS),
> >>>>>> where I can easily look up a column value and find its
> >>>>>> corresponding rowkey(s)?
> >>>>>> 2) Can I expect any performance gain when I use IndexedTable to
> >>>>>> search for a particular column value, instead of scanning an
> >>>>>> entire normal HTable?
> >>>>>>
> >>>>>> Kindly clarify my doubts
> >>>>>>
> >>>>>> Thanks in advance

--0015174bedce30517c047159fbbb--
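
For readers following the thread, the scheme discussed above (an index
table keyed by the indexed value, with the original row key appended so
that repeated values stay unique, followed by 1 scan + N gets) can be
sketched roughly as below. This is only an in-memory model in Python, not
HBase client code: the class IndexedTableModel, its methods, and the
"\x00" separator are all assumptions made for illustration and are not
part of any HBase API.

```python
import bisect

SEP = "\x00"  # assumed separator; must never occur inside an indexed value


class IndexedTableModel:
    """Toy in-memory stand-in for a base table plus its secondary index table."""

    def __init__(self):
        self.base = {}        # base table: row key -> indexed column value
        self.index_keys = []  # index table: sorted composite row keys

    @staticmethod
    def index_key(value, row_key):
        # Composite key: indexed value + separator + original row key,
        # so two rows with the same value get two distinct index rows.
        return value + SEP + row_key

    def _remove(self, key):
        pos = bisect.bisect_left(self.index_keys, key)
        if pos < len(self.index_keys) and self.index_keys[pos] == key:
            self.index_keys.pop(pos)

    def put(self, row_key, value):
        old = self.base.get(row_key)
        if old is not None:
            # Drop the stale index row before writing the new one.
            self._remove(self.index_key(old, row_key))
        self.base[row_key] = value
        key = self.index_key(value, row_key)
        self.index_keys.insert(bisect.bisect_left(self.index_keys, key), key)

    def lookup(self, value):
        # One prefix scan over the sorted index, then one get per hit.
        prefix = value + SEP
        pos = bisect.bisect_left(self.index_keys, prefix)
        rows = []
        while pos < len(self.index_keys) and self.index_keys[pos].startswith(prefix):
            row_key = self.index_keys[pos][len(prefix):]
            rows.append((row_key, self.base[row_key]))
            pos += 1
        return rows
```

With rows "u1" and "u3" both holding the value "paris", lookup("paris")
walks two adjacent index rows and fetches both base rows, which is the
repeated-value case asked about at the top of this message. Storing the
extra row keys as timestamped versions of one cell, by contrast, would be
limited by how many versions the column family keeps.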