hbase-user mailing list archives

From "Hegner, Travis" <THeg...@trilliumit.com>
Subject RE: Indexed Table in Hbase
Date Mon, 17 Aug 2009 18:03:17 GMT
I'm not familiar with tableindexed at all, but my manually indexed tables use the indexed value
as the row key, with one column for each row of the original table that has that value.

The key user@domain.com would have columns rows:user1, rows:user7, rows:user12, etc.

Then just do a get on user@domain.com and you'll have a whole list of users with that email
address. The added benefit is that you can put some useful piece of info into any of the rows:user1
cells like whether the address is primary, or whatever fits your design.
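
As a rough illustration of the layout described above, here is a minimal in-memory sketch. It is not HBase client code: plain Python dicts stand in for the two tables, and the names users, users_email, put_user and users_by_email are hypothetical, chosen only for this example.

```python
from collections import defaultdict

# In-memory stand-ins for two HBase tables, shaped as {row_key: {column: value}}.
# These dicts are an assumption for illustration, not the HBase client API.
users = {}                       # primary table, keyed by user id
users_email = defaultdict(dict)  # manual index table, keyed by email address


def put_user(user_id, email, primary=True):
    """Write the user row, then add one 'rows:<user_id>' cell to the index row."""
    users[user_id] = {"data:email": email}
    # The cell value can carry extra info, e.g. whether the address is primary.
    users_email[email]["rows:" + user_id] = "primary" if primary else "secondary"


def users_by_email(email):
    """A single get on the index row yields every user id with this email."""
    row = users_email.get(email, {})
    return sorted(column.split(":", 1)[1] for column in row)


put_user("user1", "user@domain.com")
put_user("user7", "user@domain.com", primary=False)
put_user("user12", "user@domain.com", primary=False)
put_user("user3", "other@domain.com")

print(users_by_email("user@domain.com"))
```

Because every matching user is a column in one index row, repeated values need no special handling. The sketch does not remove stale index cells when a user's email changes; a real design would have to handle that.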

Just a thought, perhaps you could implement that method with the tableindexed.IndexKeyGenerator
that Gary mentioned.

Thanks,

Travis Hegner
http://www.travishegner.com/


-----Original Message-----
From: bharath vissapragada [mailto:bharathvissapragada1990@gmail.com]
Sent: Monday, August 17, 2009 1:46 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Indexed Table in Hbase

Thanks for your explanation, Gary.

Consider my case, where values can repeat. Are you saying that I should
edit the IndexKeyGenerator so that, instead of storing (column -> rowkey),
it stores (column -> rowkey1, rowkey2) as different timestamps? If yes, is
that a good way?

On Mon, Aug 17, 2009 at 10:53 PM, Gary Helmling <ghelmling@gmail.com> wrote:

> When defining the IndexSpecification for your table, you can pass your
> own implementation of
> org.apache.hadoop.hbase.client.tableindexed.IndexKeyGenerator.
>
> This allows you to control how the row keys are generated for the
> secondary index table.  For example, you could append the original
> table's row key to the indexed value to ensure uniqueness in
> referencing the original rows.
>
> When you create an indexed scanner, the secondary index code opens and
> wraps a scanner on the secondary index table, based on the start row
> you specify (the indexed value you're looking up).  It applies any
> filter passed to rows on the secondary index table, so make sure
> anything you want to filter on is listed in the "indexed columns" in
> your IndexSpecification.
>
> For any rows returned by the wrapped scanner, the client code then
> does a get for the original table record (the original row key is
> stored in the "__INDEX__" column family I think).
>
> So in total, when using secondary indexes, you wind up with 1 scan + N
> gets to look at N rows.
>
> At least, this was my understanding of how things worked as of 0.19.
> I'm actually moving indexing into my app layer as I update to 0.20.
>
> Hope this helps.
>
> --gh
>
>
> On Mon, Aug 17, 2009 at 1:00 PM, Jonathan Gray<jlist@streamy.com> wrote:
> > I'm actually unsure about that.  Look at the code or experiment.
> >
> > Seems to me that there would be a uniqueness requirement, otherwise what do
> > you expect the behavior to be?  A get can only return a single row, so
> > multiple index hits don't really make sense.
> >
> > Clint?  You out there? :)
> >
> > JG
> >
> > bharath vissapragada wrote:
> >>
> >> I got it ... I think this is definitely useful in my app because I am
> >> performing a full table scan every time to select the rowkeys based on
> >> some column values.
> >>
> >> BUT ...
> >>
> >> we can have more than one rowkey for the same column value. Can you please
> >> tell me how they are stored?
> >>
> >> Thanks in advance
> >>
> >> On Mon, Aug 17, 2009 at 9:27 PM, Jonathan Gray <jlist@streamy.com> wrote:
> >>
> >>> It's not an actual hash or btree index, but rather secondary indexes in
> >>> HBase are implemented by creating an additional HBase table.
> >>>
> >>> If I have a table "users" (row key is userid) with family "data" and column
> >>> "email", and I want to index the value in that column...
> >>>
> >>> I can create a table "users_email" where the row key is the email address
> >>> (value from the column in "users" table) and a single column that contains
> >>> the userid.
> >>>
> >>> Doing an "index lookup" would mean doing a get on "users_email" and then
> >>> using that userid to do a lookup on the "users" table.
> >>>
> >>> IndexedTable does this transparently, but still does require two queries.
> >>> So it's slower than a single query, but certainly faster than a full table
> >>> scan.
> >>>
> >>> If you need hash-level performance on the index lookup, there are lots of
> >>> solutions outside of HBase that would work... In-memory Java HashMap, Tokyo
> >>> Cabinet on-disk HashMaps, BerkeleyDB, etc... If you need full-text indexing,
> >>> you can use Lucene or the like.
> >>>
> >>> Make sense?
> >>>
> >>> JG
> >>>
> >>>
> >>> bharath vissapragada wrote:
> >>>
> >>>> But I have read somewhere that secondary indexes are somewhat slow compared
> >>>> to normal HBase tables. Does that affect the performance?
> >>>>
> >>>> Also, do you know the type of index created on the column (I mean hash type
> >>>> or B-tree, etc.)?
> >>>>
> >>>> On Mon, Aug 17, 2009 at 8:30 PM, Kirill Shabunov <e2k_1@yahoo.com> wrote:
> >>>>
> >>>>> Hi!
> >>>>>
> >>>>> As far as I understand you are talking about the secondary indexes. Yes,
> >>>>> they can be used to quickly get the rowkey by a value in the indexed
> >>>>> column.
> >>>>>
> >>>>> --Kirill
> >>>>>
> >>>>>
> >>>>> bharath vissapragada wrote:
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I have gone through the IndexedTableAdmin classes in the HBase 0.19.3 API.
> >>>>>> I have seen some methods used to create an indexed table (on some
> >>>>>> column). I have some doubts regarding the same:
> >>>>>>
> >>>>>> 1) Are these somewhat similar to hash indexes (in an RDBMS), where I can
> >>>>>> easily look up a column value and find its corresponding rowkey(s)?
> >>>>>> 2) Is there any performance gain when I use IndexedTable to search for
> >>>>>> a particular column value, instead of scanning an entire normal HTable?
> >>>>>>
> >>>>>> Kindly clarify my doubts
> >>>>>>
> >>>>>> Thanks in advance
> >>>>>>
> >>>>>>
> >>>>>>
> >>
> >
>
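
Gary's suggestion quoted above (append the original row key to the indexed value, then do 1 scan + N gets) can be sketched roughly as follows. This is an in-memory model only, not the tableindexed API: a sorted Python list stands in for the ordered index table, and SEP, index_key, put_user and lookup are hypothetical names for this example. The separator is assumed never to occur in an indexed value.

```python
import bisect

SEP = "\x00"  # hypothetical separator, assumed absent from indexed values

users = {}       # primary table: {row_key: {column: value}}
index_keys = []  # sorted index row keys, standing in for an ordered index table


def index_key(value, row_key):
    """Composite index key: indexed value + separator + original row key."""
    return value + SEP + row_key


def put_user(user_id, email):
    """Write the primary row and insert its composite key into the index."""
    users[user_id] = {"data:email": email}
    key = index_key(email, user_id)
    pos = bisect.bisect_left(index_keys, key)
    if pos == len(index_keys) or index_keys[pos] != key:
        index_keys.insert(pos, key)


def lookup(email):
    """One prefix scan over the index, then one get per matching row."""
    prefix = email + SEP
    start = bisect.bisect_left(index_keys, prefix)
    results = []
    for key in index_keys[start:]:        # the "scan", from the start row
        if not key.startswith(prefix):
            break                         # left the range for this value
        row_key = key[len(prefix):]
        results.append((row_key, users[row_key]))  # the "get" on the primary
    return results


put_user("user1", "a@x.com")
put_user("user7", "a@x.com")
put_user("user12", "a@x.com")
put_user("user2", "a@x.comm")  # shares a string prefix but is a different value
put_user("user3", "b@x.com")

print([row_key for row_key, _ in lookup("a@x.com")])
```

Including the separator in the scan prefix is what keeps "a@x.comm" out of the results for "a@x.com". Each index key is unique even when values repeat, which is the point of appending the original row key.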
