mahout-user mailing list archives

From Paritosh Ranjan <>
Subject Re: Not all Mapper/Reducer slots are taken when running K-Means cluster
Date Sun, 11 Mar 2012 07:48:52 GMT
Can you try reducing or increasing your block size and see the impact?
I suspect the block size is the problem.

I faced the same problem once (for a different Hadoop job, and it
was very hard to debug). In that case, CompositeInputFormat was
being used as input, which fixed the block size at 64 MB, and
hence only a few reducers were activated. So trying different block
sizes might give some clue.
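
For anyone trying the block-size experiment, here is a rough sketch of how the task count can move with block size. It assumes the simplest case, where each input split is about one block and each split gets one task; real input formats apply extra rules (minimum split size, slop factors), so treat the numbers as an approximation only. The function name and the 1 GB input size are hypothetical, chosen for illustration.

```python
import math

MB = 1024 * 1024


def estimate_splits(file_bytes: int, block_bytes: int) -> int:
    """Approximate split count, assuming one split per block (a simplification)."""
    if file_bytes <= 0:
        return 0
    return math.ceil(file_bytes / block_bytes)


if __name__ == "__main__":
    input_size = 1024 * MB  # hypothetical 1 GB input, not a figure from this thread
    for block_mb in (32, 64, 128):
        splits = estimate_splits(input_size, block_mb * MB)
        print(f"block size {block_mb} MB -> about {splits} splits")
```

If the number of running tasks changes along with the block size, that points toward the input layout rather than the slot configuration.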

On 11-03-2012 11:04, WangRamon wrote:
> Here is the configuration:
>     <property>
>         <name></name>
>         <value>14</value>
>     </property>
>     <property>
>         <name>mapred.tasktracker.reduce.tasks.maximum</name>
>         <value>14</value>
>     </property>
>     <property>
>         <name>mapred.reduce.tasks</name>
>         <value>73</value>
>     </property>
>   Each node has 32 GB of RAM, so I think the above configuration should be fine.
>> Date: Sat, 10 Mar 2012 22:31:44 -0700
>> From:
>> To:
>> Subject: Re: Not all Mapper/Reducer slots are taken when running K-Means cluster
>> What's your Hadoop config in terms of the maximum number of reducers?
>> It's a function of the available RAM on each node and the number of nodes.
>> On 3/10/12 8:55 PM, WangRamon wrote:
>>> Hi Paritosh,
>>> I did the tests with 1 job and with 5 jobs, and they all have the same
>>> problem. The job I'm running is the buildClusters one. I can see in the
>>> monitor GUI that 73 reduce tasks are created, but only 12 of them are
>>> running at any time (the rest are in the pending state). The tasks finish
>>> very quickly, no more than about 18 seconds per reduce task, so maybe
>>> that's the cause?
>>> Thanks
>>> Cheers, Ramon
>>>> Date: Sun, 11 Mar 2012 09:14:15 +0530
>>>> From:
>>>> To:
>>>> Subject: Re: Not all Mapper/Reducer slots are taken when running K-Means
>>>> And to answer the question about the K-Means configuration:
>>>> K-Means has two jobs:
>>>> 1) buildClusters: has a reducer, with no limit on the number of
>>>> reducer tasks
>>>> 2) clusterData: executes if runClustering = true, has no reducer tasks
>>>> On 11-03-2012 09:10, Paritosh Ranjan wrote:
>>>>> Can you run the K-Means jobs again (all with the same block size) and give
>>>>> the same statistics for:
>>>>> a) only 1 job running
>>>>> b) 2 jobs running simultaneously
>>>>> c) 5 jobs running simultaneously
>>>>> On 10-03-2012 21:08, WangRamon wrote:
>>>>>> Hi All,
>>>>>> I submitted 5 K-Means jobs simultaneously. My Hadoop cluster has 42 map
>>>>>> and 42 reduce slots configured, and I set the default number of reduce
>>>>>> tasks per job to 73 (42 * 1.75). I find that only about 12 reduce tasks
>>>>>> are running at any time, although 73 reduce tasks are created for each
>>>>>> K-Means job and I do have 42 reduce slots, which means about 30 reduce
>>>>>> slots are free at any time. So I tried RecommenderJob from Mahout again,
>>>>>> because I remembered that it used all my slots in a previous test, and
>>>>>> yes, this time "RowSimilarityJob-CooccurrencesMapper-Reducer" does use
>>>>>> all the slots, 42 reduce and 42 map. So I'm wondering: is something
>>>>>> configured in Mahout causing this strange behavior? Any suggestions?
>>>>>> Thanks in advance.
>>>>>> Btw, I'm using the mahout-0.6 release.
>>>>>> Cheers, Ramon
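
The slot arithmetic in the original message can be checked with a small sketch. It uses only figures quoted in the thread (42 reduce slots, the 1.75 factor, 73 reduce tasks, about 12 running at once); the function names are hypothetical, and the "waves" figure is a simple ceiling estimate that ignores scheduling delays.

```python
import math


def reduce_task_target(total_slots: int, factor: float = 1.75) -> int:
    """Reduce-task count as slots * factor, truncated to a whole number."""
    return int(total_slots * factor)


def waves(tasks: int, concurrent: int) -> int:
    """Number of rounds needed if only `concurrent` tasks can run at once."""
    return math.ceil(tasks / concurrent)


if __name__ == "__main__":
    slots = 42        # reduce slots reported in the thread
    observed = 12     # reduce tasks observed running at any time
    tasks = reduce_task_target(slots)
    print("reduce tasks per job:", tasks)
    print("waves with all slots busy:", waves(tasks, slots))
    print("waves at observed concurrency:", waves(tasks, observed))
    print("idle reduce slots:", slots - observed)
```

Under these assumptions a job that could finish its reducers in 2 waves needs 7 at the observed concurrency, which is why the idle slots matter even when each reduce task takes only about 18 seconds.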
