hbase-user mailing list archives

From anil gupta <anilgupt...@gmail.com>
Subject Re: Best technique for doing lookup with Secondary Index
Date Fri, 26 Oct 2012 04:44:11 GMT
Hi Anoop,

Yes, I use bulk loading to load table A. I wrote my own mapper because
ImportTsv doesn't meet my requirements. :) No, I don't call HTable#put()
from my mapper. I was thinking about trying HTable#put() from my mapper
to see the outcome.

I meant to say that when we use an MR job (e.g. importtsv), the WAL is not
used. Sorry if I misunderstood someone.

Thanks,
Anil

On Thu, Oct 25, 2012 at 9:06 PM, Anoop Sam John <anoopsj@huawei.com> wrote:

> Hi Anil,
>               Some confusion after seeing your reply.
> You use bulk loading?  You created your own mapper?  You call HTable#put()
> from mappers?
>
> I think there is confusion in another thread also..  I was referring to the
> HFileOutputReducer.. There is a TableOutputFormat also... With
> TableOutputFormat it will try to put to the HTable...  Here the write to
> the WAL is applicable...
>
>
> [HFileOutputReducer] : As we discussed in another thread, in case of bulk
> loading the approach is that the MR job creates KVs and writes them to
> files, and each file is written as an HFile. Yes, this will contain all
> meta information, trailer etc... Finally, the HBase cluster only needs to
> be contacted to load the HFile(s) into the cluster, under the corresponding
> regions.  This will be the fastest way to bulk load huge data...
>
>
> -Anoop-
> ________________________________________
> From: anil gupta [anilgupta84@gmail.com]
> Sent: Friday, October 26, 2012 3:40 AM
> To: user@hbase.apache.org
> Subject: Re: Best technique for doing lookup with Secondary Index
>
> Anoop:  In prePut hook u call HTable#put()?
> Anil: Yes i call HTable#put() in prePut. Is there better way of doing it?
>
> Anoop: Why use the network calls from server side here then?
> Anil: I thought this is a cleaner approach since i am using BulkLoader. I
> decided not to run two jobs since i am generating a UniqueIdentifier at
> runtime in bulkloader.
>
> Anoop: can not handle it from client alone?
> Anil: I cannot handle it from client since i am using BulkLoader. Is it a
> good idea to create Htable instance on "B" and do put in my mapper? I might
> try this idea.
>
> Anoop: You can have a look at Lily project.
> Anil: It's little late for us to evaluate Lily now and at present we dont
> need complex secondary index since our data is immutable.
>
> Ram: what is rowkey B here?
> Anil: Suppose i am storing customer events in table A. I have two
> requirements for data query:
> 1. Query customer events on basis of customer_Id and event_ID.
> 2. Query customer events on basis of event_timestamp and customer_ID.
>
> 70% of querying is done by query#1, so i will create
> <customer_Id><event_ID> as row key of Table A.
> Now, in order to support fast results for query#2, i need to create a
> secondary index on A. I store that secondary index in B, rowkey of B is
> <event_timestamp><customer_ID>  .Every row stores the corresponding rowkey
> of A.
>
> Ram:How is the startRow determined for every query?
> Anil: Its determined by a very simple application logic.
>
> Thanks,
> Anil Gupta
>
> On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <
> ramkrishna.vasudevan@huawei.com> wrote:
>
> > Just out of curiosity,
> > > The secondary index is stored in table "B" as rowkey B -->
> > > family:<rowkey
> > > A>
> > what is rowkey B here?
> > > 1. Scan the secondary table by using prefix filter and startRow.
> > How is the startRow determined for every query ?
> >
> > Regards
> > Ram
> >
> > > -----Original Message-----
> > > From: Anoop Sam John [mailto:anoopsj@huawei.com]
> > > Sent: Thursday, October 25, 2012 10:15 AM
> > > To: user@hbase.apache.org
> > > Subject: RE: Best technique for doing lookup with Secondary Index
> > >
> > > >I build the secondary table "B" using a prePut RegionObserver.
> > >
> > > Anil,
> > >        In prePut hook u call HTable#put()?  Why use the network calls
> > > from server side here then? can not handle it from client alone? You
> > > can have a look at Lily project.   Thoughts after seeing ur idea on put
> > > and scan..
> > >
> > > -Anoop-
> > > ________________________________________
> > > From: anil gupta [anilgupta84@gmail.com]
> > > Sent: Thursday, October 25, 2012 3:10 AM
> > > To: user@hbase.apache.org
> > > Subject: Best technique for doing lookup with Secondary Index
> > >
> > > Hi All,
> > >
> > > I am using HBase 0.92.1. I have created a secondary index on table "A".
> > > Table A stores immutable data. I build the secondary table "B" using a
> > > prePut RegionObserver.
> > >
> > > > The secondary index is stored in table "B" as rowkey B -->
> > > > family:<rowkey A>. "<rowkey A>" is the column qualifier. Every row in
> > > > B will only have one column and the name of that column is the rowkey
> > > > of A. So the value is blank. As per my understanding, accessing column
> > > > qualifier is faster than accessing value. Please correct me if i am
> > > > wrong.
> > >
> > >
> > > HBase Querying approach:
> > > 1. Scan the secondary table by using prefix filter and startRow.
> > > 2. Do a batch get on primary table by using HTable.get(List<Get>)
> > > method.
> > >
> > > > The above approach for retrieval works fine but i was wondering if
> > > > there is a better approach. I was planning to try out doing the
> > > > retrieval using coprocessors.
> > > > Has anyone tried using coprocessors? I would appreciate if others can
> > > > share their experience with secondary index for HBase queries.
> > >
> > > --
> > > Thanks & Regards,
> > > > Anil Gupta
> >
> >
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
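
For anyone reading the archive: the layout and two-step lookup described in
the quoted thread (table A keyed by <customer_Id><event_ID>, index table B
keyed by <event_timestamp><customer_ID>, a prefix scan on B followed by a
batch get on A) can be sketched roughly as below. This is only a minimal
sketch and it does not use the HBase API. The two TreeMaps are stand-ins
for the sorted tables, the class and method names are made up for
illustration, and the zero-padded 13-digit timestamp and fixed-width
customer id are assumptions so that string order matches scan order. In
the thread the row key of A is stored as the column qualifier with a blank
value; here it is simply the map value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SecondaryIndexSketch {
    // Stand-in for table A: <customer_Id><event_ID> -> event payload.
    static final TreeMap<String, String> tableA = new TreeMap<>();
    // Stand-in for table B: <event_timestamp><customer_ID> -> row key of A.
    static final TreeMap<String, String> tableB = new TreeMap<>();

    static String rowKeyA(String customerId, String eventId) {
        return customerId + eventId;
    }

    // Assumption: millisecond timestamps, zero-padded to 13 digits so that
    // lexicographic order of keys equals chronological order.
    static String rowKeyB(long eventTimestamp, String customerId) {
        return String.format("%013d", eventTimestamp) + customerId;
    }

    // Writes the event to A and the matching index entry to B.
    // Note: two events of one customer at the same timestamp would share
    // a key in B and overwrite each other in this simplified model.
    static void putEvent(String customerId, String eventId,
                         long eventTimestamp, String payload) {
        String keyA = rowKeyA(customerId, eventId);
        tableA.put(keyA, payload);
        tableB.put(rowKeyB(eventTimestamp, customerId), keyA);
    }

    // Step 1: start at the prefix (the "startRow") and walk the sorted
    // index until a key no longer begins with the prefix.
    static List<String> scanIndex(String prefix) {
        List<String> keysOfA = new ArrayList<>();
        for (Map.Entry<String, String> e : tableB.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            keysOfA.add(e.getValue());
        }
        return keysOfA;
    }

    // Step 2: fetch all matching rows of A in one batch.
    static List<String> batchGet(List<String> keysOfA) {
        List<String> rows = new ArrayList<>();
        for (String key : keysOfA) {
            String row = tableA.get(key);
            if (row != null) {
                rows.add(row);
            }
        }
        return rows;
    }

    static List<String> eventsByTimestampPrefix(String prefix) {
        return batchGet(scanIndex(prefix));
    }

    public static void main(String[] args) {
        putEvent("C001", "E01", 1351226651000L, "login");
        putEvent("C002", "E01", 1351226651000L, "purchase");
        putEvent("C001", "E02", 1351300000000L, "logout");
        System.out.println(eventsByTimestampPrefix("1351226651000"));
    }
}
```

Against a real cluster, step 1 would be a Scan on B with a start row and
prefix filter, and step 2 the HTable.get(List<Get>) call mentioned in the
thread; the sketch only shows why the key layout makes the prefix scan
possible.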



-- 
Thanks & Regards,
Anil Gupta
