mahout-user mailing list archives

From Paritosh Ranjan <pran...@xebia.com>
Subject Re: Clustering : Number of Reducers
Date Mon, 19 Sep 2011 05:02:50 GMT
So, does this mean that Mahout cannot support clustering for large data sets?

Even in DirichletDriver, the number of reducers is hard-coded to 1, and we
need canopies in order to run KMeansDriver.

Paritosh

On 19-09-2011 01:47, Konstantin Shmakov wrote:
> For most tasks, you can force the number of reducers with
> mapred.reduce.tasks=<N>
> where <N> is the desired number of reducers.
>
> It will not necessarily increase performance, though: with kmeans and
> fuzzykmeans the combiners do the reducers' job, so increasing the number of
> reducers won't usually affect performance.
>
> With canopy, the distributed
> algorithm <http://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java?revision=1134456&view=markup> has
> no combiners and has 1 reducer hard-coded;
> trying to increase the number of reducers won't have any effect, because the
> algorithm doesn't work with more than 1 reducer. My experience is that canopy
> won't scale to large data and needs improvement.
>
> -- Konstantin
>
>
>
> On Sun, Sep 18, 2011 at 10:50 AM, Paritosh Ranjan <pranjan@xebia.com> wrote:
>
>> Hi,
>>
>> I have been trying to cluster some hundreds of millions of records using
>> Mahout Clustering techniques.
>>
>> The number of reducers is always one, and I am not able to change it. This is
>> affecting performance. I am using Mahout 0.5.
>>
>> In 0.6-SNAPSHOT, I see that the MeanShiftCanopyDriver has been changed to
>> use any number of reducers. Will other ClusterDrivers also get changed to
>> use any number of reducers in 0.6?
>>
>> Thanks and Regards,
>> Paritosh Ranjan
>>
>>
>>
>
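
To make the mapred.reduce.tasks suggestion a little more concrete: that property sets how many reduce tasks a job gets, and Hadoop's default HashPartitioner routes each map-output key to reducer (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The Java sketch below reproduces only that rule, with no Hadoop dependency; the class name and the "cluster-N" keys are hypothetical stand-ins. It hints at why more reducers may not help a k-means iteration: if the map output is keyed by cluster id, at most k reducers can receive any data.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of how map-output keys are spread over reducers.
class ReducerPartitionSketch {

    // Mirrors the rule used by Hadoop's default HashPartitioner:
    // clear the sign bit of the hash, then take it modulo the reducer count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many distinct keys would be routed to each reducer.
    // The "cluster-N" keys are a hypothetical stand-in for cluster ids.
    static Map<Integer, Integer> keysPerReducer(int numKeys, int numReduceTasks) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int id = 0; id < numKeys; id++) {
            String key = "cluster-" + id;
            counts.merge(partitionFor(key, numReduceTasks), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One reducer: every key lands on partition 0.
        System.out.println(keysPerReducer(20, 1));
        // Fifty reducers but only twenty keys: at most twenty reducers get work.
        System.out.println(keysPerReducer(20, 50).size());
    }
}
```

With twenty hypothetical cluster keys, main prints {0=20} for one reducer and a count no larger than 20 for fifty reducers, so the extra reduce tasks would sit idle.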

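The remark that combiners do the reducers' job for k-means can be illustrated with the usual partial-sum formulation of one iteration, assumed here rather than taken from Mahout's source: each map task assigns its points to the nearest centroid and combines them into one (sum, count) pair per cluster, so the reducer only adds a handful of small records and divides. All names below are hypothetical, and points are plain double[] vectors.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a k-means iteration with a combiner; not Mahout's classes.
class KMeansCombinerSketch {

    // Running vector sum and point count for one cluster.
    static final class PartialSum {
        final double[] sum;
        long count;

        PartialSum(int dim) { this.sum = new double[dim]; }

        void add(double[] point) {
            for (int i = 0; i < sum.length; i++) sum[i] += point[i];
            count++;
        }

        void merge(PartialSum other) {
            for (int i = 0; i < sum.length; i++) sum[i] += other.sum[i];
            count += other.count;
        }

        double[] mean() {
            double[] m = new double[sum.length];
            for (int i = 0; i < m.length; i++) m[i] = sum[i] / count;
            return m;
        }
    }

    // Index of the centroid with the smallest squared Euclidean distance.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0;
            for (int i = 0; i < point.length; i++) {
                double diff = point[i] - centroids[c][i];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }

    // Map + combine for one input split: the output has at most k entries.
    static Map<Integer, PartialSum> mapAndCombine(List<double[]> split, double[][] centroids) {
        Map<Integer, PartialSum> partials = new HashMap<>();
        for (double[] p : split) {
            partials.computeIfAbsent(nearest(p, centroids), c -> new PartialSum(p.length)).add(p);
        }
        return partials;
    }

    // Reduce: merge the small per-split outputs and compute the new centroids.
    static double[][] reduce(List<Map<Integer, PartialSum>> splitOutputs, double[][] oldCentroids) {
        int k = oldCentroids.length;
        int dim = oldCentroids[0].length;
        PartialSum[] totals = new PartialSum[k];
        for (int c = 0; c < k; c++) totals[c] = new PartialSum(dim);
        for (Map<Integer, PartialSum> out : splitOutputs) {
            out.forEach((c, ps) -> totals[c].merge(ps));
        }
        double[][] next = new double[k][];
        for (int c = 0; c < k; c++) {
            // Keep the old centroid if no point was assigned to this cluster.
            next[c] = totals[c].count == 0 ? oldCentroids[c].clone() : totals[c].mean();
        }
        return next;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0, 0}, {10, 10}};
        List<double[]> split1 = Arrays.asList(new double[] {0, 0}, new double[] {0, 2});
        List<double[]> split2 = Arrays.asList(new double[] {10, 10}, new double[] {10, 12});
        List<Map<Integer, PartialSum>> outputs =
                Arrays.asList(mapAndCombine(split1, centroids), mapAndCombine(split2, centroids));
        System.out.println(Arrays.deepToString(reduce(outputs, centroids)));
    }
}
```

Because the combined output is at most k records per split, the single reduce step stays cheap no matter how many input points there are, which is consistent with the observation that adding reducers rarely changes k-means performance.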

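For canopy, a simplified two-phase sketch suggests why a single reducer is natural. It assumes a scheme in which every mapper runs canopy clustering on its own split and one reducer re-runs it over all of the mapper centroids; this is a simplification, not Mahout's CanopyDriver code. It uses one-dimensional points for brevity and McCallum-style thresholds t1 > t2: points within t1 of a center contribute to its centroid, and points within t2 can no longer start a new canopy. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical two-phase canopy sketch; one-dimensional points for brevity.
class CanopySketch {

    // Sequential canopy clustering (requires t1 > t2).
    // Returns the centroid of each canopy.
    static List<Double> canopyCentroids(List<Double> points, double t1, double t2) {
        List<Double> remaining = new ArrayList<>(points);
        List<Double> centroids = new ArrayList<>();
        while (!remaining.isEmpty()) {
            double center = remaining.remove(0);
            double sum = center;
            int count = 1;
            List<Double> stillFree = new ArrayList<>();
            for (double p : remaining) {
                double d = Math.abs(p - center);
                if (d < t1) { sum += p; count++; }  // loose threshold: joins this canopy
                if (d >= t2) { stillFree.add(p); }  // outside tight threshold: may start a new canopy
            }
            remaining = stillFree;
            centroids.add(sum / count);
        }
        return centroids;
    }

    // "Map" phase runs per split; the "reduce" phase re-clusters all mapper
    // centroids, so it has to see every one of them at once.
    static List<Double> twoPhase(List<List<Double>> splits, double t1, double t2) {
        List<Double> mapperCentroids = new ArrayList<>();
        for (List<Double> split : splits) {
            mapperCentroids.addAll(canopyCentroids(split, t1, t2));
        }
        return canopyCentroids(mapperCentroids, t1, t2);
    }

    public static void main(String[] args) {
        List<List<Double>> splits = Arrays.asList(
                Arrays.asList(0.0, 1.0, 10.0),
                Arrays.asList(0.5, 10.5));
        System.out.println(twoPhase(splits, 3.0, 2.0));
    }
}
```

On the two sample splits, main prints [0.5, 10.25]. If the mapper centroids were instead spread over two reducers, each would see only part of them and could emit its own overlapping canopies around 0.5 and 10, which is one way to see why this kind of job is limited to one reducer.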