Mailing-List: contact hbase-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hbase-user@hadoop.apache.org
MIME-Version: 1.0
In-Reply-To: <e6aa01cc0908171023k732a8ab9u376449772381ee13@mail.gmail.com>
References: <73d592f60908170708w35802725q11c4043d8fc05da1@mail.gmail.com>
	<4A89709E.5030900@yahoo.com>
 <73d592f60908170826ub3eac4bxe7372ed72d0334b2@mail.gmail.com>
	<4A897E07.5090604@streamy.com>
 <73d592f60908170957o61f804erdfdc4cb60d4e6657@mail.gmail.com>
	<4A898C90.3040406@streamy.com>
 <e6aa01cc0908171023k732a8ab9u376449772381ee13@mail.gmail.com>
From: bharath vissapragada <bharathvissapragada1990@gmail.com>
Date: Mon, 17 Aug 2009 23:16:07 +0530
Message-ID: <73d592f60908171046w645221ccga1503cabcdb81aef@mail.gmail.com>
Subject: Re: Indexed Table in Hbase
To: hbase-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=0015174bedce30517c047159fbbb

--0015174bedce30517c047159fbbb
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Thanks for your explanation, Gary.

Consider my case, where values can repeat. So you are saying that I should
edit the IndexKeyGenerator so that, instead of storing (column -> rowkey),
it stores (column -> rowkey1, rowkey2) as different timestamps? If so, is
that a good way?
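
To make the question concrete, here is a rough model of the other option
Gary mentions in the quoted reply below: appending the original table's row
key to the indexed value, so that repeated values still produce distinct
index rows. This is only an in-memory sketch in Python, not the HBase
IndexKeyGenerator API. The names make_index_key and IndexTable, and the NUL
separator, are assumptions made for illustration.

```python
import bisect

SEP = "\x00"  # assumed separator; must never occur inside an indexed value


def make_index_key(value, row_key):
    """Build an index row key that is unique per original row."""
    return value + SEP + row_key


class IndexTable:
    """In-memory stand-in for a sorted index table (hypothetical, not HBase)."""

    def __init__(self):
        self._keys = []  # composite keys, kept in sorted order

    def put(self, value, row_key):
        key = make_index_key(value, row_key)
        pos = bisect.bisect_left(self._keys, key)
        # Skip the insert if this exact (value, row key) pair is already stored.
        if pos == len(self._keys) or self._keys[pos] != key:
            self._keys.insert(pos, key)

    def lookup(self, value):
        """Prefix scan: return every original row key stored under `value`."""
        prefix = value + SEP
        pos = bisect.bisect_left(self._keys, prefix)
        rows = []
        while pos < len(self._keys) and self._keys[pos].startswith(prefix):
            rows.append(self._keys[pos][len(prefix):])
            pos += 1
        return rows
```

With keys built this way, finding all rows for one value is a short scan
over adjacent index rows rather than a single get, and no row key is hidden
behind another version of the same cell.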

On Mon, Aug 17, 2009 at 10:53 PM, Gary Helmling <ghelmling@gmail.com> wrote:

> When defining the IndexSpecification for your table, you can pass your
> own implementation of
> org.apache.hadoop.hbase.client.tableindexed.IndexKeyGenerator.
>
> This allows you to control how the row keys are generated for the
> secondary index table.  For example, you could append the original
> table's row key to the indexed value to ensure uniqueness in
> referencing the original rows.
>
> When you create an indexed scanner, the secondary index code opens and
> wraps a scanner on the secondary index table, based on the start row
> you specify (the indexed value you're looking up).  It applies any
> filter passed to rows on the secondary index table, so make sure
> anything you want to filter on is listed in the "indexed columns" in
> your IndexSpecification.
>
> For any rows returned by the wrapped scanner, the client code then
> does a get for the original table record (the original row key is
> stored in the "__INDEX__" column family I think).
>
> So in total, when using secondary indexes, you wind up with 1 scan + N
> gets to look at N rows.
>
> At least, this was my understanding of how things worked as of 0.19.
> I'm actually moving indexing into my app layer as I update to 0.20.
>
> Hope this helps.
>
> --gh
>
>
> On Mon, Aug 17, 2009 at 1:00 PM, Jonathan Gray <jlist@streamy.com> wrote:
> > I'm actually unsure about that.  Look at the code or experiment.
> >
> > Seems to me that there would be a uniqueness requirement; otherwise, what
> > do you expect the behavior to be?  A get can only return a single row, so
> > multiple index hits don't really make sense.
> >
> > Clint?  You out there? :)
> >
> > JG
> >
> > bharath vissapragada wrote:
> >>
> >> I got it. I think this is definitely useful in my app, because I am
> >> performing a full table scan every time to select the row keys based on
> >> some column values.
> >>
> >> But we can have more than one row key for the same column value. Can you
> >> please tell me how they are stored?
> >>
> >> Thanks in advance
> >>
> >> On Mon, Aug 17, 2009 at 9:27 PM, Jonathan Gray <jlist@streamy.com> wrote:
> >>
> >>> It's not an actual hash or btree index, but rather secondary indexes in
> >>> HBase are implemented by creating an additional HBase table.
> >>>
> >>> If I have a table "users" (row key is userid) with family "data" and
> >>> column "email", and I want to index the value in that column...
> >>>
> >>> I can create a table "users_email" where the row key is the email
> >>> address (value from the column in "users" table) and a single column
> >>> that contains the userid.
> >>>
> >>> Doing an "index lookup" would mean doing a get on "users_email" and
> >>> then using that userid to do a lookup on the "users" table.
> >>>
> >>> IndexedTable does this transparently, but still does require two
> >>> queries.  So it's slower than a single query, but certainly faster
> >>> than a full table scan.
> >>>
> >>> If you need hash-level performance on the index lookup, there are lots
> >>> of solutions outside of HBase that would work... In-memory Java
> >>> HashMap, Tokyo Cabinet on-disk HashMaps, BerkeleyDB, etc... If you need
> >>> full-text indexing, you can use Lucene or the like.
> >>>
> >>> Make sense?
> >>>
> >>> JG
> >>>
> >>>
> >>> bharath vissapragada wrote:
> >>>
> >>>> But I have read somewhere that secondary indexes are somewhat slow
> >>>> compared to normal HBase tables. Does that affect the performance?
> >>>>
> >>>> Also, do you know the type of index created on the column (I mean hash
> >>>> type, B-tree, etc.)?
> >>>>
> >>>> On Mon, Aug 17, 2009 at 8:30 PM, Kirill Shabunov <e2k_1@yahoo.com> wrote:
> >>>>
> >>>>> Hi!
> >>>>>
> >>>>> As far as I understand, you are talking about the secondary indexes.
> >>>>> Yes, they can be used to quickly get the rowkey by a value in the
> >>>>> indexed column.
> >>>>>
> >>>>> --Kirill
> >>>>>
> >>>>>
> >>>>> bharath vissapragada wrote:
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I have gone through the IndexedTableAdmin classes in the HBase 0.19.3
> >>>>>> API. I have seen some methods used to create an indexed table (on some
> >>>>>> column). I have some doubts regarding the same:
> >>>>>>
> >>>>>> 1) Are these somewhat similar to hash indexes (in an RDBMS), where I
> >>>>>> can easily look up a column value and find its corresponding
> >>>>>> rowkey(s)?
> >>>>>> 2) Can I expect any performance gain when I use IndexedTable to search
> >>>>>> for a particular column value, instead of scanning an entire normal
> >>>>>> HTable?
> >>>>>>
> >>>>>> Kindly clarify my doubts.
> >>>>>>
> >>>>>> Thanks in advance
> >>>>>>
> >>>>>>
> >>>>>>
> >>
> >
>
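
For reference, the "users" / "users_email" layout Jonathan describes in the
quoted thread above can be modeled in a few lines. This is a minimal
in-memory sketch in Python, assuming each email maps to exactly one userid.
The dictionaries and function names are hypothetical stand-ins, not HBase
tables or the IndexedTable API.

```python
# Hypothetical in-memory stand-in for the main table; not HBase API.
users = {
    "u1": {"data:email": "ann@example.com"},
    "u2": {"data:email": "bob@example.com"},
}


def build_email_index(table):
    """Build the 'users_email' table: row key is the email, value the userid."""
    return {row["data:email"]: userid for userid, row in table.items()}


def index_lookup(index, table, email):
    """Two queries: a get on the index table, then a get on the main table."""
    userid = index.get(email)
    if userid is None:
        return None
    return userid, table[userid]


def full_scan_lookup(table, email):
    """What the index avoids: visiting every row of the main table."""
    for userid, row in table.items():
        if row["data:email"] == email:
            return userid, row
    return None


users_email = build_email_index(users)
```

Calling index_lookup(users_email, users, "bob@example.com") touches two
entries regardless of table size, while full_scan_lookup may visit every
row. Because the index here holds one userid per email, a second user with
the same email would overwrite the first, which is the uniqueness concern
raised in the thread.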

--0015174bedce30517c047159fbbb--