mahout-dev mailing list archives

From Paritosh Ranjan <pran...@xebia.com>
Subject CanopyDriver : run : clusterFilter : bug
Date Sun, 02 Oct 2011 18:36:06 GMT
The new parameter, clusterFilter, in CanopyDriver's run method, is not 
working properly.

This is because the if condition in ClusterMapper's findClosestCanopy
method

protected Canopy findClosestCanopy(Vector point, Iterable<Canopy> canopies) {
    ...
    // find closest canopy
    for (Canopy canopy : canopies) {
        double dist = measure.distance(canopy.getCenter().getLengthSquared(), canopy.getCenter(), point);

        // condition at issue
        if (dist < minDist) {
            ...
        }
    }
}


should be replaced with:

if (dist < minDist && dist <= t1)

Otherwise, all records get the same canopy.

This fix also needs some null pointer checks. I have fixed it and have 
it working. I will try to provide a patch with a test case that 
reproduces the issue.
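
For illustration only, here is a minimal standalone sketch of the proposed condition. It is not the Mahout code or the patch: plain double[] points and a Euclidean distance stand in for Mahout's Vector, Canopy and DistanceMeasure, and the class name, the index-based return value and the sample values are all assumptions. It shows that once the t1 bound is added, a point may match no canopy at all, which is presumably the kind of case the extra checks have to handle.

```java
import java.util.Arrays;
import java.util.List;

// Standalone sketch (assumption): double[] and Euclidean distance replace
// Mahout's Vector, Canopy and DistanceMeasure. Not the actual Mahout classes.
class ClosestCanopySketch {

  // Euclidean distance between two points of equal dimension.
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  // Returns the index of the closest canopy center that lies within t1,
  // or -1 when no center qualifies. Callers must handle the -1 case,
  // just as the real code would need to handle a missing canopy.
  static int findClosestCanopy(double[] point, List<double[]> centers, double t1) {
    int closest = -1;
    double minDist = Double.MAX_VALUE;
    for (int i = 0; i < centers.size(); i++) {
      double dist = distance(centers.get(i), point);
      // the proposed condition: closest so far AND within t1
      if (dist < minDist && dist <= t1) {
        minDist = dist;
        closest = i;
      }
    }
    return closest;
  }

  public static void main(String[] args) {
    List<double[]> centers =
        Arrays.asList(new double[] {0, 0}, new double[] {10, 10});
    double t1 = 3.0;
    System.out.println(findClosestCanopy(new double[] {1, 1}, centers, t1));  // prints 0
    System.out.println(findClosestCanopy(new double[] {9, 10}, centers, t1)); // prints 1
    System.out.println(findClosestCanopy(new double[] {5, 5}, centers, t1));  // prints -1
  }
}
```

The third point is farther than t1 from both centers, so it is assigned to neither.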

Thanks and Regards,
Paritosh Ranjan

On 02-10-2011 14:06, Paritosh Ranjan wrote:
> Even the run() method of CanopyDriver that takes only T1 and T2 gives 
> different results for sequential and mapreduce execution.
> This is preventing me from scaling up, as I need to run mapreduce on 
> hadoop to scale.
>
> Does anyone have any idea about this problem?
>
> On 02-10-2011 00:27, Paritosh Ranjan wrote:
>> Hi,
>>
>> I am able to cluster correctly sequentially, using CanopyDriver.
>>
>> However, the same dataset, when processed as a MapReduce job (with 
>> t1 = t3, t2 = t4 and t1 > t2), does not work. I am getting errors 
>> like Canopies are empty.
>>
>> I also tried to reduce the values of t3 and t4. But reducing them 
>> either has no effect or gives meaningless results.
>>
>> Am I doing something wrong? Or is there a bug somewhere?
>>
>> I feel that both sequential and MapReduce should give similar 
>> results, but that is not happening.
>>
>> Thanks and Regards,
>> Paritosh