mahout-user mailing list archives

From Grant Ingersoll <>
Subject Re: Clustering from DB
Date Mon, 27 Jul 2009 16:41:31 GMT

On Jul 27, 2009, at 12:03 PM, Ted Dunning wrote:

> Yes.
> That explains why Jeff didn't see the slow down with dense vectors.

Not following.  The distance calculation is independent of the Vector
type.  I was referring to the centroid length-squared optimization (I
think you called it the triangle inequality) that Shashikant added in
MAHOUT-121.  We use it for testing convergence, but not for the other
distance calculations.  I haven't checked yet whether it is applicable
there, but it seems like it should be.
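
To make the idea concrete, here is a rough sketch of what a centroid length-squared shortcut could look like. It is not the MAHOUT-121 code: the class, the method names, and the map-based sparse point are all made up for illustration. It relies only on the identity ||c - x||^2 = ||c||^2 - 2 c.x + ||x||^2, with ||c||^2 computed once per centroid and reused for every point.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Mahout.
public class CentroidDistanceSketch {

    // Squared Euclidean length of a dense centroid. Intended to be computed
    // once per centroid per iteration and then reused for every point.
    static double lengthSquared(double[] centroid) {
        double sum = 0.0;
        for (double c : centroid) {
            sum += c * c;
        }
        return sum;
    }

    // Squared distance via ||c||^2 - 2 c.x + ||x||^2. Only the nonzero
    // entries of the sparse point (index -> value) are visited.
    static double squaredDistance(double centroidLengthSquared,
                                  double[] centroid,
                                  Map<Integer, Double> point) {
        double dot = 0.0;
        double pointLengthSquared = 0.0;
        for (Map.Entry<Integer, Double> e : point.entrySet()) {
            double v = e.getValue();
            dot += v * centroid[e.getKey()];
            pointLengthSquared += v * v;
        }
        return centroidLengthSquared - 2.0 * dot + pointLengthSquared;
    }

    // Index of the closest centroid, using the cached squared lengths.
    static int nearest(double[][] centroids,
                       double[] centroidLengthsSquared,
                       Map<Integer, Double> point) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = squaredDistance(centroidLengthsSquared[i], centroids[i], point);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = {{1.0, 2.0, 0.0}, {0.0, 0.0, 2.0}};
        double[] cached = {lengthSquared(centroids[0]), lengthSquared(centroids[1])};
        Map<Integer, Double> point = new HashMap<>();
        point.put(0, 1.0);
        point.put(2, 2.0);
        System.out.println(nearest(centroids, cached, point));
    }
}
```

Since the squared distance orders centroids the same way the plain Euclidean distance does, a nearest-cluster step would not need the square root under this approach.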

> On Mon, Jul 27, 2009 at 8:03 AM, Grant Ingersoll <> wrote:
>> Hmm, some profiling shows the pain is in the distance calculation for
>> emitPointToNearestCluster.  Seems that we only use the optimized distance
>> calculations for testing convergence, but shouldn't we also use it for
>> calculating the distances to the cluster, too?
> -- 
> Ted Dunning, CTO
> DeepDyve

Grant Ingersoll

