hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: loading data in HBase table using APIs
Date Mon, 08 Aug 2011 23:21:02 GMT
The doc here suggests avoiding reduce:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink
St.Ack

On Fri, Aug 5, 2011 at 2:19 AM, Doug Meil <doug.meil@explorysmedical.com> wrote:
>
> It's not obvious to a lot of newer folks that an MR job can exist minus
> the R.
>
>
>
>
>
> On 8/4/11 5:52 PM, "Michael Segel" <michael_segel@hotmail.com> wrote:
>
>>
>>Uhm Silly question...
>>
>>Why would you ever need a reduce step when you're writing to an HBase
>>table?
>>
>>Now I'm sure that there may be some fringe case, but in the past two
>>years, I've never come across a case where you would need to do a reducer
>>when you're writing to HBase.
>>
>>So what am I missing?
>>
>>
>>
>>> From: doug.meil@explorysmedical.com
>>> To: user@hbase.apache.org
>>> Date: Thu, 4 Aug 2011 11:18:57 -0400
>>> Subject: Re: loading data in HBase table using APIs
>>>
>>>
>>> David, thanks for the tip on this.  I just checked in a reorg to the
>>> performance chapter and included this tip.
>>>
>>> Stack does the website updating so it's not visible yet, but this tip is
>>> in there.
>>>
>>> Thanks!
>>>
>>>
>>>
>>>
>>> On 7/18/11 6:18 PM, "Buttler, David" <buttler1@llnl.gov> wrote:
>>>
>>> >After a quick scan of the performance section, I didn't see what I
>>> >consider to be a huge performance consideration:
>>> >If at all possible, don't do a reduce on your puts.  The shuffle/sort
>>> >part of the map/reduce paradigm is often useless if all you are trying
>>> >to do is insert/update data in HBase.  From the OP's description it sounds
>>> >like he doesn't need to have any kind of reduce phase [and may be a
>>> >great candidate for bulk loading and the pre-creation of regions].  In any
>>> >case, don't reduce if you can avoid it.
>>> >
>>> >Dave
>>> >
>>> >-----Original Message-----
>>> >From: Doug Meil [mailto:doug.meil@explorysmedical.com]
>>> >Sent: Sunday, July 17, 2011 4:40 PM
>>> >To: user@hbase.apache.org
>>> >Subject: Re: loading data in HBase table using APIs
>>> >
>>> >
>>> >Hi there-
>>> >
>>> >Take a look at this for starters:
>>> >http://hbase.apache.org/book.html#schema
>>> >
>>> >1)  double-check your row-keys (sanity check), that's in the Schema
>>> >Design chapter.
>>> >
>>> >http://hbase.apache.org/book.html#performance
>>> >
>>> >
>>> >2)  if not using bulk-load - pre-create regions, do this regardless of
>>> >using MR or non-MR.
>>> >
>>> >3)  if not using MR job and are using multiple threads with the Java
>>> >API, take a look at HTableUtil.  It's on trunk, but that utility can help
>>> >you.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >On 7/17/11 4:08 PM, "abhay ratnaparkhi" <abhay.ratnaparkhi@gmail.com>
>>> >wrote:
>>> >
>>> >>Hello,
>>> >>
>>> >>I am loading lots of data into an HBase table through the API.
>>> >>I am using HBase Java API to do this.
>>> >>If I convert this code to a map-reduce task and use the
>>> >>*TableOutputFormat* class, will I get any performance improvement?
>>> >>
>>> >>As I am not getting input data from an existing HBase table or HDFS
>>> >>files, there will not be any input to the map task.
>>> >>The only advantage is that multiple map tasks running simultaneously
>>> >>might make processing faster.
>>> >>
>>> >>Thanks!
>>> >>Regards,
>>> >>Abhay
>>> >
>>>
>>
>
>
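
The pre-creation of regions mentioned in the thread comes down to picking split keys before the load starts. Below is a minimal sketch of one way to compute them. It assumes row keys start with a uniformly distributed two-hex-digit prefix ("00" through "ff"), which the thread does not state. SplitKeys and hexSplits are hypothetical names used only for illustration. The resulting array is the kind of value one would hand to a table-creation call that accepts split keys, such as HBaseAdmin.createTable(desc, splitKeys); that call is not made here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HBase: computes evenly spaced split keys
// for row keys that are assumed to begin with a two-hex-digit prefix.
class SplitKeys {

    // Returns numRegions - 1 boundary keys dividing the 00..ff prefix space.
    static byte[][] hexSplits(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in [2, 256]");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Integer boundary within 0..255 for the i-th region start.
            int boundary = i * 256 / numRegions;
            splits[i - 1] = String.format("%02x", boundary)
                    .getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Print the boundaries for a table pre-split into four regions.
        for (byte[] key : hexSplits(4)) {
            System.out.println(new String(key, StandardCharsets.US_ASCII));
        }
    }
}
```

If the real row keys are not uniformly distributed over that prefix space, evenly spaced boundaries will still leave some regions hot, so the split points should follow the actual key distribution.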
