hbase-user mailing list archives

From bharath vissapragada <bharathvissapragada1...@gmail.com>
Subject Re: Indexed Table in Hbase
Date Mon, 17 Aug 2009 17:46:07 GMT
Thanks for your explanation, Gary.

Consider my case, where values can repeat. Are you suggesting that I
modify the IndexKeyGenerator so that, instead of storing a single
(column -> rowkey) mapping, it stores (column -> rowkey1, rowkey2)
as versions with different timestamps? If so, is that a good approach?
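
For comparison, here is a minimal in-memory sketch of the alternative Gary
describes below: appending the original row key to the indexed value so that
every index row is unique. It does not use the HBase API. The class name, the
separator character, and the TreeMap standing in for the sorted index table
are all assumptions made for illustration. Storing several row keys as
versions of one cell would also depend on the column family's max-versions
setting, which is one reason a composite key may be simpler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// In-memory stand-in for a secondary index table. A TreeMap keeps keys
// sorted, loosely mimicking how HBase orders rows. This is NOT the HBase API.
final class CompositeIndexSketch {
    // Hypothetical separator; assumes it never occurs inside an indexed value.
    private static final char SEP = '\u0000';

    // index row key = indexedValue + SEP + baseRowKey, value = baseRowKey
    private final TreeMap<String, String> indexTable = new TreeMap<>();

    public void put(String indexedValue, String baseRowKey) {
        indexTable.put(indexedValue + SEP + baseRowKey, baseRowKey);
    }

    // Prefix scan: returns the base row key of every index row whose key
    // starts with indexedValue + SEP.
    public List<String> lookup(String indexedValue) {
        String start = indexedValue + SEP;
        String stop = indexedValue + (char) (SEP + 1);
        return new ArrayList<>(indexTable.subMap(start, true, stop, false).values());
    }

    public static void main(String[] args) {
        CompositeIndexSketch idx = new CompositeIndexSketch();
        idx.put("red", "row7");
        idx.put("red", "row2");
        idx.put("blue", "row5");
        System.out.println(idx.lookup("red")); // both rows indexed under "red"
    }
}
```

With this layout, repeated values need no special handling: each
(value, rowkey) pair gets its own index row, and a lookup becomes a short
range scan over the value's prefix.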

On Mon, Aug 17, 2009 at 10:53 PM, Gary Helmling <ghelmling@gmail.com> wrote:

> When defining the IndexSpecification for your table, you can pass your
> own implementation of
> org.apache.hadoop.hbase.client.tableindexed.IndexKeyGenerator.
>
> This allows you to control how the row keys are generated for the
> secondary index table.  For example, you could append the original
> table's row key to the indexed value to ensure uniqueness in
> referencing the original rows.
>
> When you create an indexed scanner, the secondary index code opens and
> wraps a scanner on the secondary index table, based on the start row
> you specify (the indexed value you're looking up).  It applies any
> filter passed to rows on the secondary index table, so make sure
> anything you want to filter on is listed in the "indexed columns" in
> your IndexSpecification.
>
> For any rows returned by the wrapped scanner, the client code then
> does a get for the original table record (the original row key is
> stored in the "__INDEX__" column family I think).
>
> So in total, when using secondary indexes, you wind up with 1 scan + N
> gets to look at N rows.
>
> At least, this was my understanding of how things worked as of 0.19.
> I'm actually moving indexing into my app layer as I update to 0.20.
>
> Hope this helps.
>
> --gh
>
>
> On Mon, Aug 17, 2009 at 1:00 PM, Jonathan Gray<jlist@streamy.com> wrote:
> > I'm actually unsure about that.  Look at the code or experiment.
> >
> > Seems to me that there would be a uniqueness requirement, otherwise what
> > do you expect the behavior to be?  A get can only return a single row, so
> > multiple index hits don't really make sense.
> >
> > Clint?  You out there? :)
> >
> > JG
> >
> > bharath vissapragada wrote:
> >>
> >> I got it ... I think this is definitely useful in my app because I am
> >> performing a full table scan every time to select the rowkeys based on
> >> some column values.
> >>
> >> BUT ..
> >>
> >> we can have more than one rowkey for the same column value. Can you
> >> please tell me how they are stored?
> >>
> >> Thanks in advance
> >>
> >> On Mon, Aug 17, 2009 at 9:27 PM, Jonathan Gray <jlist@streamy.com> wrote:
> >>
> >>> It's not an actual hash or btree index, but rather secondary indexes in
> >>> HBase are implemented by creating an additional HBase table.
> >>>
> >>> If I have a table "users" (row key is userid) with family "data" and
> >>> column "email", and I want to index the value in that column...
> >>>
> >>> I can create a table "users_email" where the row key is the email
> >>> address (value from the column in "users" table) and a single column
> >>> that contains the userid.
> >>>
> >>> Doing an "index lookup" would mean doing a get on "users_email" and
> >>> then using that userid to do a lookup on the "users" table.
> >>>
> >>> IndexedTable does this transparently, but still does require two
> >>> queries.  So it's slower than a single query, but certainly faster
> >>> than a full table scan.
> >>>
> >>> If you need hash-level performance on the index lookup, there are lots
> >>> of solutions outside of HBase that would work... In-memory Java HashMap,
> >>> Tokyo Cabinet on-disk HashMaps, BerkeleyDB, etc... If you need full-text
> >>> indexing, you can use Lucene or the like.
> >>>
> >>> Make sense?
> >>>
> >>> JG
> >>>
> >>>
> >>> bharath vissapragada wrote:
> >>>
> >>>> But I have read somewhere that secondary indexes are somewhat slow
> >>>> compared to normal HBase tables. Does that affect the performance?
> >>>>
> >>>> Also, do you know the type of index created on the column (I mean hash
> >>>> type, B-tree, etc.)?
> >>>>
> >>>> On Mon, Aug 17, 2009 at 8:30 PM, Kirill Shabunov <e2k_1@yahoo.com> wrote:
> >>>>
> >>>>> Hi!
> >>>>>
> >>>>> As far as I understand you are talking about the secondary indexes.
> >>>>> Yes, they can be used to quickly get the rowkey by a value in the
> >>>>> indexed column.
> >>>>>
> >>>>> --Kirill
> >>>>>
> >>>>>
> >>>>> bharath vissapragada wrote:
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I have gone through the IndexedTableAdmin classes in the HBase 0.19.3
> >>>>>> API. I have seen some methods used to create an indexed table (on some
> >>>>>> column). I have some doubts regarding the same ...
> >>>>>>
> >>>>>> 1) Are these somewhat similar to hash indexes (in an RDBMS), where I
> >>>>>> can easily look up a column value and find its corresponding rowkey(s)?
> >>>>>> 2) Can I expect any performance gain when I use IndexedTable to search
> >>>>>> for a particular column value, instead of scanning an entire normal
> >>>>>> HTable?
> >>>>>>
> >>>>>> Kindly clarify my doubts
> >>>>>>
> >>>>>> Thanks in advance
> >>>>>>
> >>>>>>
> >>>>>>
> >>
> >
>
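
The lookup path discussed in this thread (one scan on the index table, then
one get on the original table per hit) can be sketched with plain Java
collections. This is only an in-memory illustration, not the IndexedTable
implementation: the class, method, and field names are hypothetical, the
TreeMaps stand in for the "users_email" and "users" tables from Jonathan's
example, and the composite index key follows Gary's suggestion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simulates "1 scan + N gets": scan the index table for matching entries,
// then fetch each original record by its row key. NOT the HBase API.
final class IndexedScanSketch {
    // Stand-in for "users_email": (email + '\0' + userid) -> userid
    private final TreeMap<String, String> usersEmail = new TreeMap<>();
    // Stand-in for "users": userid -> record
    private final TreeMap<String, String> users = new TreeMap<>();

    int scans = 0; // scans opened on the index table
    int gets = 0;  // point lookups issued against the original table

    public void insert(String userId, String email, String record) {
        users.put(userId, record);
        usersEmail.put(email + '\u0000' + userId, userId);
    }

    public List<String> lookupByEmail(String email) {
        List<String> records = new ArrayList<>();
        scans++; // a single range scan starting at the indexed value
        for (String userId : usersEmail.subMap(email + '\u0000', email + '\u0001').values()) {
            gets++; // one get on the original table per index hit
            records.add(users.get(userId));
        }
        return records;
    }

    public static void main(String[] args) {
        IndexedScanSketch t = new IndexedScanSketch();
        t.insert("u1", "a@example.com", "Alice");
        t.insert("u2", "a@example.com", "Alia");
        t.insert("u3", "b@example.com", "Bob");
        System.out.println(t.lookupByEmail("a@example.com"));
        System.out.println("scans=" + t.scans + " gets=" + t.gets);
    }
}
```

The counters make the cost model visible: the number of gets grows with the
number of index hits, which is why an indexed lookup is slower than a single
query yet avoids touching every row the way a full table scan does.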
