hbase-user mailing list archives

From Jerry Lam <chiling...@gmail.com>
Subject Re: Best technique for doing lookup with Secondary Index
Date Fri, 26 Oct 2012 14:29:07 GMT
Can we force two regions to be collocated as a logical group?

On Fri, Oct 26, 2012 at 6:14 AM, fding hbase <fding.hbase@gmail.com> wrote:

> https://github.com/danix800/hbase-indexed
>
> On Fri, Oct 26, 2012 at 4:13 PM, Ramkrishna.S.Vasudevan <
> ramkrishna.vasudevan@huawei.com> wrote:
>
> > > AFAIK, RPC cannot be avoided even if Region A and Region B are on the
> > > same RS, since these two regions are from different tables. Am I right?
> >
> > No. Suppose Region A and Region B, from different tables, are collocated
> > on the same RS. From the coprocessor environment you can then get access
> > to the RS. From the RS you can get the online regions, and on that region
> > object you can call puts or gets. This will not involve any RPC within
> > that RS because we only deal with Region objects.
> >
> > Regards
> > Ram
> >
> > > -----Original Message-----
> > > From: anil gupta [mailto:anilgupta84@gmail.com]
> > > Sent: Friday, October 26, 2012 12:17 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: Best technique for doing lookup with Secondary Index
> > >
> > > >
> > > > Now, your main question is about lookups, right?
> > > > There are some more hooks in the scan flow called pre/postScannerOpen
> > > > and pre/postScannerNext.
> > > > Maybe you can try using them to do a lookup on the secondary table,
> > > > then use those values and pass them to the main table's next().
> > > >
> > >
> > > With a secondary index it's hard to avoid at least two RPC calls (one
> > > from the client to table B, and then one from table B to table A)
> > > whether you use a coproc or not. But I believe using a coproc is better
> > > than doing RPC calls from the client, since the client might be outside
> > > the subnet/network of the cluster. In that case, the RPC will be faster
> > > when we use coprocs. In my case the client is certainly not in the same
> > > subnet or network zone. I need to provide query results in around 100
> > > milliseconds or less, so I need to be really frugal. Let me know your
> > > views on this.
> > >
> > > Have you implemented queries with secondary indexes using a coproc yet?
> > > At present I have tried the client-side query and I can get the query
> > > results in around 100 ms. I am tempted to try out the coproc
> > > implementation.
> > >
> > > > But this may involve more RPC calls as your regions of "A" and "B" may
> > > > be in different RS.
> > > >
> > > AFAIK, RPC cannot be avoided even if Region A and Region B are on the
> > > same RS, since these two regions are from different tables. Am I right?
> > >
> > >
> > > Thanks,
> > > Anil Gupta
> > >
> > > On Thu, Oct 25, 2012 at 9:20 PM, Ramkrishna.S.Vasudevan <
> > > ramkrishna.vasudevan@huawei.com> wrote:
> > >
> > > > > Is it a good idea to create an HTable instance on "B" and do the
> > > > > put in my mapper? I might try this idea.
> > > > Yes, you can do this. Maybe in the same mapper you can do a put for
> > > > table "B". This is how we tried loading data into another table using
> > > > the main table "A" puts.
> > > >
> > > > Now, your main question is about lookups, right?
> > > > There are some more hooks in the scan flow called pre/postScannerOpen
> > > > and pre/postScannerNext.
> > > > Maybe you can try using them to do a lookup on the secondary table,
> > > > then use those values and pass them to the main table's next().
> > > > But this may involve more RPC calls, as your regions of "A" and "B"
> > > > may be in different RS.
> > > >
> > > > If something is wrong in my understanding of what you said, kindly
> > > > spare me. :)
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: anil gupta [mailto:anilgupta84@gmail.com]
> > > > > Sent: Friday, October 26, 2012 3:40 AM
> > > > > To: user@hbase.apache.org
> > > > > Subject: Re: Best technique for doing lookup with Secondary Index
> > > > >
> > > > > Anoop: In prePut hook u call HTable#put()?
> > > > > Anil: Yes, I call HTable#put() in prePut. Is there a better way of
> > > > > doing it?
> > > > >
> > > > > Anoop: Why use the network calls from server side here then?
> > > > > Anil: I thought this was a cleaner approach since I am using
> > > > > BulkLoader. I decided not to run two jobs since I am generating a
> > > > > UniqueIdentifier at runtime in the bulk loader.
> > > > >
> > > > > Anoop: can not handle it from client alone?
> > > > > Anil: I cannot handle it from the client since I am using BulkLoader.
> > > > > Is it a good idea to create an HTable instance on "B" and do the put
> > > > > in my mapper? I might try this idea.
> > > > >
> > > > > Anoop: You can have a look at Lily project.
> > > > > Anil: It's a little late for us to evaluate Lily now, and at present
> > > > > we don't need a complex secondary index since our data is immutable.
> > > > >
> > > > > Ram: what is rowkey B here?
> > > > > Anil: Suppose I am storing customer events in table A. I have two
> > > > > requirements for data queries:
> > > > > 1. Query customer events on the basis of customer_Id and event_ID.
> > > > > 2. Query customer events on the basis of event_timestamp and
> > > > > customer_ID.
> > > > >
> > > > > 70% of querying is done by query #1, so I will use
> > > > > <customer_Id><event_ID> as the row key of table A.
> > > > > Now, in order to support fast results for query #2, I need to create
> > > > > a secondary index on A. I store that secondary index in B; the rowkey
> > > > > of B is <event_timestamp><customer_ID>. Every row stores the
> > > > > corresponding rowkey of A.
> > > > >
> > > > > Ram: How is the startRow determined for every query?
> > > > > Anil: It's determined by very simple application logic.
> > > > >
> > > > > Thanks,
> > > > > Anil Gupta
> > > > >
> > > > > On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <
> > > > > ramkrishna.vasudevan@huawei.com> wrote:
> > > > >
> > > > > > Just out of curiosity,
> > > > > > > The secondary index is stored in table "B" as rowkey B -->
> > > > > > > family:<rowkey A>
> > > > > > what is rowkey B here?
> > > > > > > 1. Scan the secondary table by using prefix filter and startRow.
> > > > > > How is the startRow determined for every query?
> > > > > >
> > > > > > Regards
> > > > > > Ram
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Anoop Sam John [mailto:anoopsj@huawei.com]
> > > > > > > Sent: Thursday, October 25, 2012 10:15 AM
> > > > > > > To: user@hbase.apache.org
> > > > > > > Subject: RE: Best technique for doing lookup with Secondary Index
> > > > > > >
> > > > > > > >I build the secondary table "B" using a prePut RegionObserver.
> > > > > > >
> > > > > > > Anil,
> > > > > > >        In the prePut hook you call HTable#put()? Why use network
> > > > > > > calls from the server side here, then? Can you not handle it from
> > > > > > > the client alone? You can have a look at the Lily project. These
> > > > > > > are my thoughts after seeing your idea on put and scan.
> > > > > > >
> > > > > > > -Anoop-
> > > > > > > ________________________________________
> > > > > > > From: anil gupta [anilgupta84@gmail.com]
> > > > > > > Sent: Thursday, October 25, 2012 3:10 AM
> > > > > > > To: user@hbase.apache.org
> > > > > > > > Subject: Best technique for doing lookup with Secondary Index
> > > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > I am using HBase 0.92.1. I have created a secondary index on
> > > > > > > > table "A". Table A stores immutable data. I build the secondary
> > > > > > > > table "B" using a prePut RegionObserver.
> > > > > > > >
> > > > > > > > The secondary index is stored in table "B" as rowkey B -->
> > > > > > > > family:<rowkey A>. "<rowkey A>" is the column qualifier. Every
> > > > > > > > row in B will only have one column, and the name of that column
> > > > > > > > is the rowkey of A, so the value is blank. As per my
> > > > > > > > understanding, accessing a column qualifier is faster than
> > > > > > > > accessing a value. Please correct me if I am wrong.
> > > > > > > >
> > > > > > > > HBase querying approach:
> > > > > > > > 1. Scan the secondary table by using a prefix filter and startRow.
> > > > > > > > 2. Do a batch get on the primary table by using the
> > > > > > > > HTable.get(List<Get>) method.
> > > > > > > >
> > > > > > > > The above approach for retrieval works fine, but I was wondering
> > > > > > > > if there is a better approach. I was planning to try out doing
> > > > > > > > the retrieval using coprocessors.
> > > > > > > > Has anyone tried using coprocessors? I would appreciate it if
> > > > > > > > others can share their experience with secondary indexes for
> > > > > > > > HBase queries.
> > > > > > > >
> > > > > > > > --
> > > > > > > > Thanks & Regards,
> > > > > > > > Anil Gupta
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Thanks & Regards,
> > > > > Anil Gupta
> > > >
> > > >
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Anil Gupta
> >
> >
>
>
> --
>
> Best Regards!
>
> Fei Ding
> fding.church@gmail.com
>
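
For readers following the thread, the two-step lookup described in the original question (a prefix scan on index table "B", then a batch get on primary table "A") can be illustrated without HBase at all. What follows is only a minimal in-memory sketch: sorted maps stand in for the two tables, and the class name, the "|" key separator, and the string-encoded timestamps are hypothetical choices that do not come from the thread. No HBase API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for tables "A" and "B" from the thread. This is NOT HBase code.
final class SecondaryIndexSketch {
    // Table A: row key <customer_id><event_id> -> event payload.
    private final NavigableMap<String, String> tableA = new TreeMap<>();
    // Table B: row key <event_timestamp><customer_id> -> row key of A
    // (the thread stores this as the column qualifier with a blank value).
    private final NavigableMap<String, String> tableB = new TreeMap<>();

    // Hypothetical key encoding: parts joined with "|".
    static String rowKeyA(String customerId, String eventId) {
        return customerId + "|" + eventId;
    }

    static String rowKeyB(String eventTimestamp, String customerId) {
        return eventTimestamp + "|" + customerId;
    }

    // Write path: store the event in A and its index entry in B.
    void put(String customerId, String eventId, String eventTimestamp, String payload) {
        String keyA = rowKeyA(customerId, eventId);
        tableA.put(keyA, payload);
        tableB.put(rowKeyB(eventTimestamp, customerId), keyA);
    }

    // Read path: step 1 scans B starting at the prefix, step 2 fetches the rows of A.
    List<String> queryByTimestampPrefix(String prefix) {
        List<String> keysOfA = new ArrayList<>();
        for (Map.Entry<String, String> entry : tableB.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // keys are sorted, so the first non-match ends the scan
            }
            keysOfA.add(entry.getValue());
        }
        List<String> payloads = new ArrayList<>();
        for (String keyA : keysOfA) {
            String payload = tableA.get(keyA);
            if (payload != null) {
                payloads.add(payload);
            }
        }
        return payloads;
    }
}
```

In the setup described in the thread, step 1 would correspond to a scan on "B" with a prefix filter and startRow, and step 2 to HTable.get(List<Get>) on "A". Whether those two steps run on the client or inside a coprocessor is the open question of the discussion, and the sketch takes no position on it.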
