hbase-user mailing list archives

From "Ramkrishna.S.Vasudevan" <ramkrishna.vasude...@huawei.com>
Subject RE: Best technique for doing lookup with Secondary Index
Date Fri, 26 Oct 2012 14:33:55 GMT
Yes, we can do this, but to make it happen you would probably need a custom
load balancer that gives you the collocation.

Regards
Ram

> -----Original Message-----
> From: Jerry Lam [mailto:chilinglam@gmail.com]
> Sent: Friday, October 26, 2012 7:59 PM
> To: user@hbase.apache.org
> Subject: Re: Best technique for doing lookup with Secondary Index
> 
> Can we enforce 2 regions to collocate together as a logical group?
> 
> On Fri, Oct 26, 2012 at 6:14 AM, fding hbase <fding.hbase@gmail.com>
> wrote:
> 
> > https://github.com/danix800/hbase-indexed
> >
> > On Fri, Oct 26, 2012 at 4:13 PM, Ramkrishna.S.Vasudevan <
> > ramkrishna.vasudevan@huawei.com> wrote:
> >
> > > > > AFAIK, RPC cannot be avoided even if Region A and Region B are on
> > > > > the same RS, since these two regions are from different tables.
> > > > > Am I right?
> > >
> > > > No. Suppose Region A and Region B, from different tables, are
> > > > collocated on the same RS. Then from the coprocessor environment
> > > > variable you can get access to the RS. From the RS you can get the
> > > > online regions, and on those region objects you can call puts or
> > > > gets. This will not involve any RPC within that RS because we only
> > > > deal with Region objects.
> > >
> > > Regards
> > > Ram
> > >
> > > > -----Original Message-----
> > > > From: anil gupta [mailto:anilgupta84@gmail.com]
> > > > Sent: Friday, October 26, 2012 12:17 PM
> > > > To: user@hbase.apache.org
> > > > Subject: Re: Best technique for doing lookup with Secondary Index
> > > >
> > > > >
> > > > > > Now, your main question is lookups, right?
> > > > > > There are some more hooks in the scan flow called
> > > > > > pre/postScannerOpen and pre/postScannerNext.
> > > > > > Maybe you can try using them to do a lookup on the secondary
> > > > > > table, then take those values and pass them to the main table's
> > > > > > next().
> > > > >
> > > >
> > > > > With a secondary index it is hard to avoid at least two RPC calls
> > > > > (one from the client to table B, and then one from table B to
> > > > > table A) whether you use a coprocessor or not. But I believe using
> > > > > a coprocessor is better than making RPC calls from the client,
> > > > > since the client might be outside the subnet/network of the
> > > > > cluster. In that case, the RPC will be faster when we use
> > > > > coprocessors. In my case the client is certainly not in the same
> > > > > subnet or network zone. I need to return query results in around
> > > > > 100 milliseconds or less, so I need to be really frugal. Let me
> > > > > know your views on this.
> > > >
> > > > > Have you implemented queries with secondary indexes using
> > > > > coprocessors yet? At present I have tried the client-side query
> > > > > and I can get the query results in around 100 ms. I am tempted to
> > > > > try out the coprocessor implementation.
> > > >
> > > > > > But this may involve more RPC calls as your regions of "A" and
> > > > > > "B" may be in different RS.
> > > > > >
> > > > > AFAIK, RPC cannot be avoided even if Region A and Region B are on
> > > > > the same RS, since these two regions are from different tables.
> > > > > Am I right?
> > > >
> > > >
> > > > Thanks,
> > > > Anil Gupta
> > > >
> > > > On Thu, Oct 25, 2012 at 9:20 PM, Ramkrishna.S.Vasudevan <
> > > > ramkrishna.vasudevan@huawei.com> wrote:
> > > >
> > > > > > > Is it a good idea to create an HTable instance on "B" and do
> > > > > > > the put in my mapper? I might try this idea.
> > > > > > Yes, you can do this. Maybe in the same mapper you can do a put
> > > > > > for table "B". This is how we tried loading data into another
> > > > > > table by using the Puts of the main table "A".
> > > > >
> > > > > > Now, your main question is lookups, right?
> > > > > > There are some more hooks in the scan flow called
> > > > > > pre/postScannerOpen and pre/postScannerNext.
> > > > > > Maybe you can try using them to do a lookup on the secondary
> > > > > > table, then take those values and pass them to the main table's
> > > > > > next().
> > > > > > But this may involve more RPC calls as your regions of "A" and
> > > > > > "B" may be in different RS.
> > > > >
> > > > > > If I have misunderstood what you said, kindly excuse me. :)
> > > > >
> > > > > Regards
> > > > > Ram
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: anil gupta [mailto:anilgupta84@gmail.com]
> > > > > > Sent: Friday, October 26, 2012 3:40 AM
> > > > > > To: user@hbase.apache.org
> > > > > > > Subject: Re: Best technique for doing lookup with Secondary Index
> > > > > >
> > > > > > > Anoop: In the prePut hook you call HTable#put()?
> > > > > > > Anil: Yes, I call HTable#put() in prePut. Is there a better way
> > > > > > > of doing it?
> > > > > > >
> > > > > > > Anoop: Why use network calls from the server side here, then?
> > > > > > > Anil: I thought this was a cleaner approach since I am using
> > > > > > > BulkLoader. I decided not to run two jobs since I am generating a
> > > > > > > UniqueIdentifier at runtime in the BulkLoader.
> > > > > > >
> > > > > > > Anoop: Can you not handle it from the client alone?
> > > > > > > Anil: I cannot handle it from the client since I am using
> > > > > > > BulkLoader. Is it a good idea to create an HTable instance on "B"
> > > > > > > and do the put in my mapper? I might try this idea.
> > > > > > >
> > > > > > > Anoop: You can have a look at the Lily project.
> > > > > > > Anil: It's a little late for us to evaluate Lily now, and at
> > > > > > > present we don't need a complex secondary index since our data is
> > > > > > > immutable.
> > > > > >
> > > > > > > Ram: What is rowkey B here?
> > > > > > > Anil: Suppose I am storing customer events in table A. I have two
> > > > > > > requirements for data queries:
> > > > > > > 1. Query customer events on the basis of customer_Id and event_ID.
> > > > > > > 2. Query customer events on the basis of event_timestamp and
> > > > > > > customer_ID.
> > > > > > >
> > > > > > > 70% of querying is done by query #1, so I will create
> > > > > > > <customer_Id><event_ID> as the row key of table A.
> > > > > > > Now, in order to support fast results for query #2, I need to
> > > > > > > create a secondary index on A. I store that secondary index in B;
> > > > > > > the rowkey of B is <event_timestamp><customer_ID>. Every row
> > > > > > > stores the corresponding rowkey of A.
> > > > > > >
> > > > > > > Ram: How is the startRow determined for every query?
> > > > > > > Anil: It's determined by very simple application logic.
> > > > > >
> > > > > > Thanks,
> > > > > > Anil Gupta
> > > > > >
> > > > > > On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <
> > > > > > ramkrishna.vasudevan@huawei.com> wrote:
> > > > > >
> > > > > > > > Just out of curiosity,
> > > > > > > > > The secondary index is stored in table "B" as rowkey B -->
> > > > > > > > > family:<rowkey A>
> > > > > > > > what is rowkey B here?
> > > > > > > > > 1. Scan the secondary table by using prefix filter and startRow.
> > > > > > > > How is the startRow determined for every query?
> > > > > > >
> > > > > > > Regards
> > > > > > > Ram
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Anoop Sam John [mailto:anoopsj@huawei.com]
> > > > > > > > Sent: Thursday, October 25, 2012 10:15 AM
> > > > > > > > To: user@hbase.apache.org
> > > > > > > > > Subject: RE: Best technique for doing lookup with Secondary Index
> > > > > > > >
> > > > > > > > > >I build the secondary table "B" using a prePut RegionObserver.
> > > > > > > > >
> > > > > > > > > Anil,
> > > > > > > > >        In the prePut hook you call HTable#put()? Why use network
> > > > > > > > > calls from the server side here, then? Can you not handle it
> > > > > > > > > from the client alone? You can have a look at the Lily project.
> > > > > > > > > These are my thoughts after seeing your idea on put and scan.
> > > > > > > >
> > > > > > > > -Anoop-
> > > > > > > > ________________________________________
> > > > > > > > From: anil gupta [anilgupta84@gmail.com]
> > > > > > > > Sent: Thursday, October 25, 2012 3:10 AM
> > > > > > > > To: user@hbase.apache.org
> > > > > > > > > Subject: Best technique for doing lookup with Secondary Index
> > > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > > I am using HBase 0.92.1. I have created a secondary index on
> > > > > > > > > table "A". Table A stores immutable data. I build the secondary
> > > > > > > > > table "B" using a prePut RegionObserver.
> > > > > > > > >
> > > > > > > > > The secondary index is stored in table "B" as rowkey B -->
> > > > > > > > > family:<rowkey A>. "<rowkey A>" is the column qualifier. Every
> > > > > > > > > row in B will have only one column, and the name of that column
> > > > > > > > > is the rowkey of A, so the value is blank. As per my
> > > > > > > > > understanding, accessing the column qualifier is faster than
> > > > > > > > > accessing the value. Please correct me if I am wrong.
> > > > > > > >
> > > > > > > >
> > > > > > > > > HBase querying approach:
> > > > > > > > > 1. Scan the secondary table by using a prefix filter and startRow.
> > > > > > > > > 2. Do a batch get on the primary table by using the
> > > > > > > > > HTable.get(List<Get>) method.
> > > > > > > > >
> > > > > > > > > The above approach for retrieval works fine, but I was wondering
> > > > > > > > > if there is a better approach. I was planning to try doing the
> > > > > > > > > retrieval using coprocessors.
> > > > > > > > > Has anyone tried using coprocessors? I would appreciate it if
> > > > > > > > > others could share their experience with secondary indexes for
> > > > > > > > > HBase queries.
> > > > > > > >
> > > > > > > > --
> > > > > > > > Thanks & Regards,
> > > > > > > > > Anil Gupta
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Thanks & Regards,
> > > > > > Anil Gupta
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Thanks & Regards,
> > > > Anil Gupta
> > >
> > >
> >
> >
> > --
> >
> > Best Regards!
> >
> > Fei Ding
> > fding.church@gmail.com
> >

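For readers following the thread, the layout and the two-step lookup from the original post (index rows keyed by <event_timestamp><customer_ID> in table B, a prefix scan on B, then a batch get on table A) can be sketched roughly as below. This is only an in-memory illustration and not HBase API code: the class name, the method names, the "|" separator, and the 13-digit zero padding are all assumptions made for the example, and a sorted TreeMap stands in for each table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the two tables (hypothetical names, not HBase classes).
// TreeMap keeps row keys sorted, which is the property the prefix scan relies on.
class SecondaryIndexSketch {
    // Table A: <customer_id>|<event_id> -> event payload.
    private final TreeMap<String, String> tableA = new TreeMap<>();
    // Table B: <event_timestamp>|<customer_id> -> row key of A.
    private final TreeMap<String, String> tableB = new TreeMap<>();

    // Zero-pad the timestamp so lexicographic order matches numeric order.
    static String timestampPrefix(long eventTimestamp) {
        return String.format(Locale.ROOT, "%013d", eventTimestamp);
    }

    // Write path: store the event in A and its index entry in B
    // (the role played by the prePut hook or the mapper in the thread).
    void put(String customerId, String eventId, long eventTimestamp, String payload) {
        String rowKeyA = customerId + "|" + eventId;
        tableA.put(rowKeyA, payload);
        tableB.put(timestampPrefix(eventTimestamp) + "|" + customerId, rowKeyA);
    }

    // Step 1: scan B starting at the prefix (the startRow) while keys still match it.
    List<String> scanIndex(String prefix) {
        List<String> rowKeysOfA = new ArrayList<>();
        for (Map.Entry<String, String> e : tableB.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so nothing further can match
            }
            rowKeysOfA.add(e.getValue());
        }
        return rowKeysOfA;
    }

    // Step 2: batch get on A for the row keys found in the index.
    List<String> batchGet(List<String> rowKeysOfA) {
        List<String> results = new ArrayList<>();
        for (String rowKey : rowKeysOfA) {
            String value = tableA.get(rowKey);
            if (value != null) {
                results.add(value);
            }
        }
        return results;
    }

    List<String> lookup(String prefix) {
        return batchGet(scanIndex(prefix));
    }

    public static void main(String[] args) {
        SecondaryIndexSketch tables = new SecondaryIndexSketch();
        tables.put("c1", "e1", 1000L, "login");
        tables.put("c2", "e7", 1000L, "purchase");
        tables.put("c1", "e2", 2000L, "logout");
        // All events recorded at timestamp 1000, found via the index.
        System.out.println(tables.lookup(timestampPrefix(1000L)));
    }
}
```

Note that in this sketch two events from the same customer at the same timestamp would overwrite each other's index entry, which mirrors the one-column-per-row layout described in the original post. In a real cluster each of the two steps is a separate call, which is the RPC cost discussed above.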
