hbase-user mailing list archives

From Jonathan Gray <jl...@streamy.com>
Subject Re: Doubt in HBase
Date Thu, 20 Aug 2009 20:24:33 GMT
What Amandeep said.

Also, one clarification for you.  You mentioned the reduce task moving 
map output across regionservers.  Remember, HBase is just a MapReduce 
input source or output sink.  The sort/shuffle/reduce phases are part of 
Hadoop MapReduce and have nothing to do with HBase directly.  They run 
on the JobTracker/TaskTrackers, not the RegionServers.

Like AK said, you can increase the number of reducers, or reduce the 
amount of data you output from the maps.
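
To make the reducer-count advice concrete, here is a minimal sketch of how 
map output keys get assigned to reducers. The partition formula mirrors 
Hadoop's default HashPartitioner; the class name, method names and the 
"row-N" keys are hypothetical and only for illustration. It uses plain Java, 
no Hadoop or HBase on the classpath, so it only models the record counts, 
not the actual shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class, not part of Hadoop or HBase.
class ReducerLoadSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // clear the sign bit, then take the modulus by the reducer count.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Count how many map output records each reducer would receive.
    static int[] recordsPerReducer(List<String> keys, int numReducers) {
        int[] counts = new int[numReducers];
        for (String key : keys) {
            counts[partitionFor(key, numReducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up map output keys, one record per key.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("row-" + i);
        }
        // With more reducers, each one has fewer records to fetch and sort.
        for (int n : new int[] {1, 4, 16}) {
            System.out.println(n + " reducers -> "
                    + Arrays.toString(recordsPerReducer(keys, n)));
        }
    }
}
```

All records with the same key always land on the same reducer, so adding 
reducers spreads the keys out but does not help if most of the output 
belongs to one key.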


Amandeep Khurana wrote:
> On Thu, Aug 20, 2009 at 9:42 AM, john smith <js1987.smith@gmail.com> wrote:
>> Hi all,
>> I have one small doubt. Kindly answer it even if it sounds silly.
> No questions are silly. Don't worry.
>> I am using MapReduce with HBase in distributed mode. I have a table which
>> spans 5 region servers. I am using TableInputFormat to read the data
>> from the table in the map. When I run the program, how many map tasks
>> are created by default? Is it one per region server or more?
> If you set the number of map tasks to a high number, it automatically spawns
> one map task for each region (not region server). Otherwise, it'll spawn the
> number you have explicitly specified in the job.
>> Also, after the map tasks are over, the reduce task takes a bit more time.
>> Is it due to moving the map output across the regionservers, i.e. moving
>> the values of the same key to a particular reducer before it can start? Is
>> there any way I can optimize the code (e.g. by storing the data for the
>> same reducer nearby)?
> Increase the number of reducers. Each reducer will have less data to move.
>> Thanks :)
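
The map-task rule Amandeep describes in the quoted text can be written down 
as a one-line calculation. This is only a sketch of the behaviour as stated 
in this thread, not code taken from TableInputFormat; the class name, method 
name and the region count of 12 are hypothetical.

```java
// Hypothetical illustration class, not part of Hadoop or HBase.
class MapTaskCountSketch {

    // Per the thread: a high requested count is capped at one map task
    // per region; a lower explicit count is used as given.
    static int mapTasks(int requestedMapTasks, int regionCount) {
        return Math.min(requestedMapTasks, regionCount);
    }

    public static void main(String[] args) {
        int regionCount = 12; // assumed number of regions in the table

        // Asking for far more map tasks than regions: one task per region.
        System.out.println(mapTasks(1000, regionCount));

        // Asking for fewer map tasks than regions: the explicit number.
        System.out.println(mapTasks(3, regionCount));
    }
}
```

Note that the count depends on the number of regions in the table, not on 
how many region servers host them.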
