hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: loading data in HBase table using APIs
Date Fri, 05 Aug 2011 09:19:29 GMT

It's not obvious to a lot of newer folks that an MR job can exist minus
the R.





On 8/4/11 5:52 PM, "Michael Segel" <michael_segel@hotmail.com> wrote:

>
>Uhm Silly question...
>
>Why would you ever need a reduce step when you're writing to an HBase
>table?
>
>Now I'm sure that there may be some fringe case, but in the past two
>years, I've never come across a case where you would need to do a reducer
>when you're writing to HBase.
>
>So what am I missing?
>
>
>
>> From: doug.meil@explorysmedical.com
>> To: user@hbase.apache.org
>> Date: Thu, 4 Aug 2011 11:18:57 -0400
>> Subject: Re: loading data in HBase table using APIs
>> 
>> 
>> David, thanks for the tip on this.  I just checked in a reorg to the
>> performance chapter and included this tip.
>> 
>> Stack does the website updating so it's not visible yet, but this tip is
>> in there.
>> 
>> Thanks!
>> 
>> 
>> 
>> 
>> On 7/18/11 6:18 PM, "Buttler, David" <buttler1@llnl.gov> wrote:
>> 
>> >After a quick scan of the performance section, I didn't see what I
>> >consider to be a huge performance consideration:
>> >If at all possible, don't do a reduce on your puts.  The shuffle/sort
>> >part of the map/reduce paradigm is often useless if all you are trying
>> >to do is insert/update data in HBase.  From the OP's description it sounds
>> >like he doesn't need to have any kind of reduce phase [and may be a
>> >great candidate for bulk loading and the pre-creation of regions].  In any
>> >case, don't reduce if you can avoid it.
>> >
>> >Dave
>> >
>> >-----Original Message-----
>> >From: Doug Meil [mailto:doug.meil@explorysmedical.com]
>> >Sent: Sunday, July 17, 2011 4:40 PM
>> >To: user@hbase.apache.org
>> >Subject: Re: loading data in HBase table using APIs
>> >
>> >
>> >Hi there-
>> >
>> >Take a look at this for starters:
>> >http://hbase.apache.org/book.html#schema
>> >
>> >1)  double-check your row-keys (sanity check), that's in the Schema
>> >Design chapter.
>> >
>> >http://hbase.apache.org/book.html#performance
>> >
>> >
>> >2)  if not using bulk-load - pre-create regions, do this regardless of
>> >using MR or non-MR.
>> >
>> >3)  if not using MR job and are using multiple threads with the Java
>> >API, take a look at HTableUtil.  It's on trunk, but that utility can help
>> >you.
>> >
>> >
>> >
>> >
>> >
>> >
>> >On 7/17/11 4:08 PM, "abhay ratnaparkhi" <abhay.ratnaparkhi@gmail.com>
>> >wrote:
>> >
>> >>Hello,
>> >>
>> >>I am loading lots of data through API in HBase table.
>> >>I am using HBase Java API to do this.
>> >>If I convert this code to map-reduce task and use *TableOutputFormat*
>> >>class then will I get any performance improvement?
>> >>
>> >>As I am not getting input data from existing HBase table or HDFS files
>> >>there will not be any input to map task.
>> >>The only advantage is multiple map tasks running simultaneously might
>> >>make processing faster.
>> >>
>> >>Thanks!
>> >>Regards,
>> >>Abhay
>> >
>> 
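On the pre-creation of regions mentioned in the thread: the table needs a set of split keys chosen before the load starts. The following is only a minimal sketch of one way to compute them, assuming row keys begin with an 8-digit lowercase hex prefix that is spread evenly over the key space (for example a hash of the natural key). The class name `SplitKeys` and the method `hexSplits` are hypothetical and are not part of HBase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an HBase class.
class SplitKeys {

    // Returns numRegions - 1 boundaries that divide the 8-hex-digit
    // key space [00000000, ffffffff] into numRegions equal ranges.
    // Assumption: row keys start with an evenly distributed hex prefix.
    static List<String> hexSplits(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        long keySpace = 1L << 32; // number of distinct 8-hex-digit prefixes
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            long boundary = keySpace * i / numRegions;
            splits.add(String.format("%08x", boundary));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Print the three boundaries for a table pre-split into four regions.
        for (String split : hexSplits(4)) {
            System.out.println(split);
        }
    }
}
```

Each string would then be converted to bytes and handed to the table-creation call that accepts split keys (for example `HBaseAdmin.createTable(desc, splitKeys)`). If the real row keys are not evenly distributed, boundaries computed this way will leave some regions hot and others idle, so sample the actual keys first.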

