hbase-user mailing list archives

From akhil1988 <akhilan...@gmail.com>
Subject Re: Why only few map tasks are running at a time inspite of plenty of scope for remaining?
Date Fri, 24 Jul 2009 19:24:48 GMT

Hi,

The 10 regions of the table are spread over just 6 region servers (out of a
total of 13 region servers). Even after setting
mapred.tasktracker.map.tasks.maximum = 4, only 2 map tasks in total run at a
time.

Another observation is that the map tasks are not running on local data,
i.e. the input region for a map task lies on some other node.

Name                            Region Server                     Encoded Name  Start Key  End Key
WikiPages,,1248388318240        cn84.cloud.cs.illinois.edu:60020  1479806923               117236
WikiPages,117236,1248388318240  cn84.cloud.cs.illinois.edu:60020  1753302296    117236     13813
WikiPages,13813,1248388323072   cn77.cloud.cs.illinois.edu:60020  200507463     13813      184272
WikiPages,184272,1248388323072  cn77.cloud.cs.illinois.edu:60020  1543767328    184272     22998
WikiPages,22998,1248388310452   cn71.cloud.cs.illinois.edu:60020  1972228055    22998      29193
WikiPages,29193,1248388310452   cn71.cloud.cs.illinois.edu:60020  1630029649    29193      37870
WikiPages,37870,1248388306711   cn73.cloud.cs.illinois.edu:60020  1028558084    37870      56
WikiPages,56,1248388313083      cn82.cloud.cs.illinois.edu:60020  332484191     56         73976
WikiPages,73976,1248388316165   cn83.cloud.cs.illinois.edu:60020  231296585     73976      85491
WikiPages,85491,1248388316165   cn83.cloud.cs.illinois.edu:60020  1935329066    85491
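
As a rough cross-check of the distribution described above, the region-to-server
assignment from the listing can be tallied with a few lines of plain Java. This
is only an illustrative sketch: the class and method names (RegionsPerServer,
countByServer) are made up for this message and are not part of HBase or of my
job code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RegionsPerServer {
    // {region name, hosting region server}, copied from the listing above.
    static final String[][] REGIONS = {
        {"WikiPages,,1248388318240",       "cn84.cloud.cs.illinois.edu:60020"},
        {"WikiPages,117236,1248388318240", "cn84.cloud.cs.illinois.edu:60020"},
        {"WikiPages,13813,1248388323072",  "cn77.cloud.cs.illinois.edu:60020"},
        {"WikiPages,184272,1248388323072", "cn77.cloud.cs.illinois.edu:60020"},
        {"WikiPages,22998,1248388310452",  "cn71.cloud.cs.illinois.edu:60020"},
        {"WikiPages,29193,1248388310452",  "cn71.cloud.cs.illinois.edu:60020"},
        {"WikiPages,37870,1248388306711",  "cn73.cloud.cs.illinois.edu:60020"},
        {"WikiPages,56,1248388313083",     "cn82.cloud.cs.illinois.edu:60020"},
        {"WikiPages,73976,1248388316165",  "cn83.cloud.cs.illinois.edu:60020"},
        {"WikiPages,85491,1248388316165",  "cn83.cloud.cs.illinois.edu:60020"},
    };

    // Count how many regions each server hosts, keeping first-seen order.
    static Map<String, Integer> countByServer(String[][] regions) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] region : regions) {
            counts.merge(region[1], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countByServer(REGIONS);
        counts.forEach((server, n) ->
            System.out.println(server + " hosts " + n + " region(s)"));
        System.out.println(REGIONS.length + " regions on "
            + counts.size() + " region servers");
    }
}
```

No server in the listing hosts more than two regions, so even a perfectly
data-local schedule would place at most two map tasks on any one node.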


Each region is approximately 90 MB, and I have set the region max size to
128 MB. Since each region is already smaller than the maximum size, how
should I split it?

Hadoop does show 10 map tasks to be run. How would writing a custom input
split help? Moreover, if I write a custom input split to divide the rows, I
will end up giving rows lying in different regions to a single map task (as
the rows are not in sorted order).
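
On the sort-order point, one thing that can be checked directly: HBase orders
row keys lexicographically by their bytes rather than numerically. The sketch
below tests the start keys from the region listing under both orderings. It is
a hedged illustration only: it uses Java's String.compareTo as a stand-in for
byte-wise comparison (the two agree for plain ASCII digit keys like these), and
the class and method names are hypothetical.

```java
public class KeyOrderCheck {
    // Non-empty start keys, in the order they appear in the region listing.
    static final String[] START_KEYS = {
        "117236", "13813", "184272", "22998", "29193",
        "37870", "56", "73976", "85491"
    };

    // True if each key is strictly greater than the previous one as a string.
    static boolean isLexicographicallySorted(String[] keys) {
        for (int i = 1; i < keys.length; i++) {
            if (keys[i - 1].compareTo(keys[i]) >= 0) {
                return false;
            }
        }
        return true;
    }

    // True if each key is strictly greater than the previous one as a number.
    static boolean isNumericallySorted(String[] keys) {
        for (int i = 1; i < keys.length; i++) {
            if (Long.parseLong(keys[i - 1]) >= Long.parseLong(keys[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("lexicographic order: "
            + isLexicographicallySorted(START_KEYS));
        System.out.println("numeric order:       "
            + isNumericallySorted(START_KEYS));
    }
}
```

If the keys are sorted in this byte-wise sense, a split that subdivides a
single region's [start key, end key) range would still cover rows from that one
region only.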

Thanks,
--Akhil

Ninad Raut-2 wrote:
> 
> If your data is stored on just one regionserver, you will have only one
> map in spite of setting
>   conf.set("mapred.tasktracker.map.tasks.maximum", "2");
> There are two approaches:
>
> 1) Do a manual table split
>
> 2) Write a custom input split which will divide the table's rows amongst
> the maps.
> 
> 
>> On Fri, Jul 24, 2009 at 4:54 AM, akhil1988 <akhilanger@gmail.com> wrote:
>>
>>>
>>> Hi all,
>>>
>>> I am using an HTable as input to my map jobs, and my reducer outputs to
>>> another HTable. There are 10 regions in my input HTable. And I have set
>>>        conf.set("mapred.tasktracker.map.tasks.maximum", "2");
>>>       c.setNumReduceTasks(26);
>>> My cluster contains 15 nodes (2 of which are masters). When I run my
>>> job, only 2 map tasks run at a time and the remaining 8 are shown as
>>> pending. 24 reduce tasks (out of 26) also get started initially and the
>>> remaining 2 are shown as pending. I am confused why only 2 map tasks are
>>> running at a time, though there are a total of 26 slots for map tasks.
>>>
>>> However, this does not happen when I run jobs that take files as
>>> input (i.e. simple MapReduce jobs not involving HBase at all). Only
>>> when an HTable is taken as input do far fewer map tasks run concurrently
>>> than expected.
>>>
>>> Can anyone suggest why this is happening?
>>>
>>> What I have observed in simple MapReduce jobs is that all map tasks are
>>> instantiated first and then the reduce tasks. But this does not seem to
>>> be happening in the HTable case.
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Why-only-few-map-tasks-are-running-at-a-time-inspite-of-plenty-of-scope-for-remaining--tp24636315p24636315.html
>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>
>>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/Why-only-few-map-tasks-are-running-at-a-time-inspite-of-plenty-of-scope-for-remaining--tp24636315p24650457.html
Sent from the HBase User mailing list archive at Nabble.com.

