hbase-user mailing list archives

From anil gupta <anilgupt...@gmail.com>
Subject Re: Best technique for doing lookup with Secondary Index
Date Fri, 26 Oct 2012 06:46:59 GMT
>
> Now your main question is lookups right
> Now there are some more hooks in the scan flow called pre/postScannerOpen,
> pre/postScannerNext.
> May be you can try using them to do a look up on the secondary table and
> then use those values and pass it to the main table next().
>

With a secondary index it is hard to avoid at least two RPC calls (one from
the client to table B, and then one from table B to table A), whether you
use a coprocessor or not. But I believe using a coprocessor is better than
making both RPC calls from the client, since the client might be outside the
subnet/network of the cluster. In that case, the RPC should be faster when
it is issued from a coprocessor. In my case the client is certainly not in
the same subnet or network zone. I need to return query results in around
100 milliseconds or less, so I need to be really frugal. Let me know your
views on this.

Have you implemented queries with secondary indexes using coprocessors yet?
At present I have tried the client-side query, and I can get the query
results in around 100 ms. I am tempted to try out the coprocessor
implementation.
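
For reference, here is a minimal sketch of the client-side lookup described
further down in this thread: scan index table B by key prefix, then fetch the
matching rows from table A. It is an in-memory stand-in built on TreeMap, not
HBase client code. The class name, the method names, and the '|' separator in
the row keys are hypothetical and used only for illustration. For simplicity
the row key of A is kept as the map value in B, whereas the thread stores it
as the column qualifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for tables A and B (assumption: sorted maps mimic
// HBase's lexicographically ordered row keys).
class SecondaryIndexSketch {
    // Table A: row key <customer_Id><event_ID> -> event payload.
    private final TreeMap<String, String> tableA = new TreeMap<>();
    // Table B: row key <event_timestamp><customer_ID> -> row key of A.
    private final TreeMap<String, String> tableB = new TreeMap<>();

    // Hypothetical key layout: the two fields joined with '|'.
    static String keyA(String customerId, String eventId) {
        return customerId + "|" + eventId;
    }

    static String keyB(String timestamp, String customerId) {
        return timestamp + "|" + customerId;
    }

    // Writes the event to A and its index entry to B, similar in spirit
    // to what the prePut hook does in the thread.
    void put(String customerId, String eventId, String timestamp, String payload) {
        String rowKeyA = keyA(customerId, eventId);
        tableA.put(rowKeyA, payload);
        tableB.put(keyB(timestamp, customerId), rowKeyA);
    }

    // Step 1: range scan on B starting at the prefix (the startRow).
    // Step 2: batch get on A with the collected row keys.
    List<String> lookup(String prefix) {
        List<String> rowKeysA = new ArrayList<>();
        for (Map.Entry<String, String> e : tableB.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so the prefix range has ended
            }
            rowKeysA.add(e.getValue());
        }
        List<String> results = new ArrayList<>();
        for (String rowKeyA : rowKeysA) {
            String payload = tableA.get(rowKeyA);
            if (payload != null) {
                results.add(payload);
            }
        }
        return results;
    }
}
```

In this sketch the two loops in lookup() correspond to the two round trips
(first to B, then to A) that the discussion above is trying to make cheaper.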

> But this may involve more RPC calls as your regions of "A" and "B" may be in
> different RS.
>
AFAIK, an RPC cannot be avoided even if Region A and Region B are on the same
RS, since these two regions belong to different tables. Am I right?


Thanks,
Anil Gupta

On Thu, Oct 25, 2012 at 9:20 PM, Ramkrishna.S.Vasudevan <
ramkrishna.vasudevan@huawei.com> wrote:

> > Is it a good idea to create Htable instance on "B" and do put in my
> > mapper? I might try this idea.
> Yes, you can do this. Maybe in the same mapper you can do a put for table
> "B". This was how we tried loading data to another table by using the
> main table "A" Puts.
>
> Now your main question is lookups right
> Now there are some more hooks in the scan flow called pre/postScannerOpen,
> pre/postScannerNext.
> May be you can try using them to do a look up on the secondary table and
> then use those values and pass it to the main table next().
> But this may involve more RPC calls as your regions of "A" and "B" may be
> in different RS.
>
> If something is wrong in my understanding of what you said, kindly spare
> me.
> :)
>
> Regards
> Ram
>
>
> > -----Original Message-----
> > From: anil gupta [mailto:anilgupta84@gmail.com]
> > Sent: Friday, October 26, 2012 3:40 AM
> > To: user@hbase.apache.org
> > Subject: Re: Best technique for doing lookup with Secondary Index
> >
> > Anoop:  In prePut hook u call HTable#put()?
> > Anil: Yes, I call HTable#put() in prePut. Is there a better way of doing
> > it?
> >
> > Anoop: Why use the network calls from server side here then?
> > Anil: I thought this was a cleaner approach since I am using BulkLoader.
> > I decided not to run two jobs since I am generating a UniqueIdentifier
> > at runtime in the bulkloader.
> >
> > Anoop: can not handle it from client alone?
> > Anil: I cannot handle it from the client since I am using BulkLoader. Is
> > it a good idea to create an HTable instance on "B" and do the put in my
> > mapper? I might try this idea.
> >
> > Anoop: You can have a look at Lily project.
> > Anil: It's a little late for us to evaluate Lily now, and at present we
> > don't need a complex secondary index since our data is immutable.
> >
> > Ram: what is rowkey B here?
> > Anil: Suppose I am storing customer events in table A. I have two
> > requirements for data queries:
> > 1. Query customer events on the basis of customer_Id and event_ID.
> > 2. Query customer events on the basis of event_timestamp and customer_ID.
> >
> > 70% of querying is done by query#1, so I will create
> > <customer_Id><event_ID> as the row key of Table A.
> > Now, in order to support fast results for query#2, I need to create a
> > secondary index on A. I store that secondary index in B; the rowkey of B
> > is <event_timestamp><customer_ID>. Every row stores the corresponding
> > rowkey of A.
> >
> > Ram: How is the startRow determined for every query?
> > Anil: It's determined by very simple application logic.
> >
> > Thanks,
> > Anil Gupta
> >
> > On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <
> > ramkrishna.vasudevan@huawei.com> wrote:
> >
> > > Just out of curiosity,
> > > > The secondary index is stored in table "B" as rowkey B -->
> > > > family:<rowkey A>
> > > what is rowkey B here?
> > > > 1. Scan the secondary table by using prefix filter and startRow.
> > > How is the startRow determined for every query ?
> > >
> > > Regards
> > > Ram
> > >
> > > > -----Original Message-----
> > > > From: Anoop Sam John [mailto:anoopsj@huawei.com]
> > > > Sent: Thursday, October 25, 2012 10:15 AM
> > > > To: user@hbase.apache.org
> > > > Subject: RE: Best technique for doing lookup with Secondary Index
> > > >
> > > > >I build the secondary table "B" using a prePut RegionObserver.
> > > >
> > > > Anil,
> > > >        In prePut hook u call HTable#put()?  Why use the network calls
> > > > from server side here then? can not handle it from client alone? You
> > > > can have a look at Lily project.   Thoughts after seeing ur idea on put
> > > > and scan..
> > > >
> > > > -Anoop-
> > > > ________________________________________
> > > > From: anil gupta [anilgupta84@gmail.com]
> > > > Sent: Thursday, October 25, 2012 3:10 AM
> > > > To: user@hbase.apache.org
> > > > Subject: Best technique for doing lookup with Secondary Index
> > > >
> > > > Hi All,
> > > >
> > > > I am using HBase 0.92.1. I have created a secondary index on table "A".
> > > > Table A stores immutable data. I build the secondary table "B" using a
> > > > prePut RegionObserver.
> > > >
> > > > The secondary index is stored in table "B" as rowkey B -->
> > > > family:<rowkey A>. "<rowkey A>" is the column qualifier. Every row in B
> > > > will have only one column, and the name of that column is the rowkey of
> > > > A, so the value is blank. As per my understanding, accessing the column
> > > > qualifier is faster than accessing the value. Please correct me if I am
> > > > wrong.
> > > >
> > > >
> > > > HBase querying approach:
> > > > 1. Scan the secondary table by using a prefix filter and startRow.
> > > > 2. Do a batch get on the primary table by using the
> > > > HTable.get(List<Get>) method.
> > > >
> > > > The above approach for retrieval works fine, but I was wondering if
> > > > there is a better approach. I was planning to try out doing the
> > > > retrieval using coprocessors.
> > > > Has anyone tried using coprocessors? I would appreciate it if others
> > > > can share their experience with secondary indexes for HBase queries.
> > > >
> > > > --
> > > > Thanks & Regards,
> > > > Anil Gupta
> > >
> > >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
>
>


-- 
Thanks & Regards,
Anil Gupta
