mahout-user mailing list archives

From Paritosh Ranjan <pran...@xebia.com>
Subject Re: Difference in results : Clustering : sequential and MapReduce
Date Mon, 03 Oct 2011 03:59:27 GMT
The sequential algorithm finds more/better clusters than the MapReduce one.
There's not a huge difference, but the standalone one is definitely better.

Thanks and Regards,
Paritosh

On 03-10-2011 01:47, Konstantin Shmakov wrote:
> I'd assume that the distributed and sequential algorithms shouldn't produce
> identical results. To start with, they differ in their initial setup:
> -- In the distributed algorithm, each mapper deals with a subset of the data
> and starts by picking a random point, so N random points are picked by N
> mappers to start with.
> -- In the sequential algorithm, 1 mapper deals with all of the data and
> starts by picking 1 random point.
> But for data with real clusters, both algorithms should produce similar
> results. How different are the results in your case?
>
> Thanks
> --Konstantin
>
> On Sun, Oct 2, 2011 at 1:36 AM, Paritosh Ranjan <pranjan@xebia.com> wrote:
>
>> Even run() of CanopyDriver, which takes only T1 and T2, is giving different
>> results for sequential and MapReduce.
>> This is preventing me from scaling up, as I need to run MapReduce on Hadoop
>> to scale.
>>
>> Does anyone have any idea about this problem?
>>
>> On 02-10-2011 00:27, Paritosh Ranjan wrote:
>>
>>> Hi,
>>>
>>> I am able to cluster correctly in sequential mode, using CanopyDriver.
>>>
>>> However, the same dataset, when processed as a MapReduce job (with t1 =
>>> t3, t2 = t4 and t1 > t2), does not work. I am getting errors like
>>> "Canopies are empty".
>>>
>>> I also tried reducing the values of t3 and t4, but that either has
>>> no effect or gives meaningless results.
>>>
>>> Am I doing something wrong, or is there a bug somewhere?
>>>
>>> I feel that sequential and MapReduce should both give similar results,
>>> but that is not happening.
>>>
>>> Thanks and Regards,
>>> Paritosh
>>>
>>
>
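Konstantin's point about the different initial setup can be illustrated with a small simulation. The following is a minimal Java sketch, not Mahout's CanopyDriver code: it runs a simple streaming canopy pass over 1-D points once over all the data ("sequential"), and once per input split, followed by a second pass that re-clusters the split centroids with t3/t4 ("MapReduce"). That two-stage structure is an assumption modeled on the t1..t4 parameters discussed in this thread. The class and method names, the two-split layout and the data are hypothetical, and points are taken in input order rather than at random so the output is repeatable (Java 9+ for `List.of`).

```java
import java.util.ArrayList;
import java.util.List;

public class CanopySketch {

    /** A canopy: the point that created it plus a running sum for the centroid. */
    static final class Canopy {
        final double center; // fixed center used for distance checks
        double sum;
        int count;
        Canopy(double p) { center = p; sum = p; count = 1; }
        double centroid() { return sum / count; }
    }

    /**
     * Streaming canopy pass over 1-D points (assumes t1 > t2): a point joins
     * every canopy whose center is within t1, and starts a new canopy only
     * if no existing center is within t2.
     */
    static List<Double> canopyCentroids(List<Double> points, double t1, double t2) {
        List<Canopy> canopies = new ArrayList<>();
        for (double p : points) {
            boolean stronglyBound = false;
            for (Canopy c : canopies) {
                double d = Math.abs(c.center - p);
                if (d < t1) { c.sum += p; c.count++; }
                stronglyBound |= d < t2;
            }
            if (!stronglyBound) canopies.add(new Canopy(p));
        }
        List<Double> centroids = new ArrayList<>();
        for (Canopy c : canopies) centroids.add(c.centroid());
        return centroids;
    }

    /** Sequential run: one pass over all points. */
    static List<Double> sequential(List<Double> points, double t1, double t2) {
        return canopyCentroids(points, t1, t2);
    }

    /**
     * Simulated MapReduce run (hypothetical): each "mapper" clusters only its
     * own split with t1/t2, then one "reducer" clusters the mapper centroids
     * with t3/t4.
     */
    static List<Double> mapReduce(List<List<Double>> splits,
                                  double t1, double t2, double t3, double t4) {
        List<Double> mapperCentroids = new ArrayList<>();
        for (List<Double> split : splits) {
            mapperCentroids.addAll(canopyCentroids(split, t1, t2));
        }
        return canopyCentroids(mapperCentroids, t3, t4);
    }

    public static void main(String[] args) {
        List<Double> all = List.of(0.0, 0.5, 1.0, 10.0, 10.5, 11.0);
        // Made-up input splits: the same six points divided between two mappers.
        List<List<Double>> splits = List.of(
                List.of(0.0, 10.0, 0.5),
                List.of(1.0, 10.5, 11.0));
        System.out.println("sequential: " + sequential(all, 3.0, 1.5));
        System.out.println("mapreduce:  " + mapReduce(splits, 3.0, 1.5, 3.0, 1.5));
    }
}
```

With this made-up data, `main` prints `sequential: [0.5, 10.5]` and `mapreduce:  [0.625, 10.375]`: both runs find the same two clusters, but the centroids differ because each mapper only sees part of each cluster, which is the "similar but not identical" behavior Konstantin describes.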

