On Jul 27, 2009, at 12:03 PM, Ted Dunning wrote:
> Yes.
>
> That explains why Jeff didn't see the slow down with dense vectors.
I'm not following. The distance calculation is independent of the
type of Vector. I was referring to the centroid length squared (I
think you called it the triangle inequality) optimization that
Shashikant added in MAHOUT-121. We use it for testing convergence, but
not for the other distance calculations. I haven't looked yet to see
whether it is applicable there, but it seems like it should be.
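
To make the idea concrete, here is a minimal sketch. It assumes the
optimization amounts to expanding the squared Euclidean distance as
|x - c|^2 = |c|^2 - 2(x . c) + |x|^2, so that |c|^2 is computed once
per centroid rather than once per point. Plain double[] arrays stand in
for Mahout's Vector, and the class and method names are hypothetical,
not Mahout API.

```java
// Sketch only: plain arrays stand in for Mahout's Vector type.
// Class and method names are hypothetical, not part of Mahout.
class CentroidDistanceSketch {

    /** Dot product of two equal-length vectors. */
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Squared Euclidean distance between point and centroid, using a
     * precomputed squared length of the centroid:
     * |x - c|^2 = |c|^2 - 2 (x . c) + |x|^2
     */
    static double squaredDistance(double centroidLengthSquared,
                                  double[] centroid,
                                  double[] point) {
        return centroidLengthSquared
                - 2.0 * dot(point, centroid)
                + dot(point, point);
    }

    /** Index of the centroid nearest to point, or -1 if there are none. */
    static int nearestCluster(double[][] centroids,
                              double[] centroidLengthsSquared,
                              double[] point) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double d = squaredDistance(centroidLengthsSquared[k], centroids[k], point);
            if (d < bestDistance) {
                bestDistance = d;
                best = k;
            }
        }
        return best;
    }
}
```

The squared lengths in centroidLengthsSquared would be computed once
per iteration, when the centroids are updated, and then reused for
every point assigned in that pass.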
>
> On Mon, Jul 27, 2009 at 8:03 AM, Grant Ingersoll
> <gsingers@apache.org> wrote:
>
>> Hmm, some profiling shows the pain is in the distance calculation for
>> emitPointToNearestCluster. Seems that we only use the optimized
>> distance calculations for testing convergence, but shouldn't we also
>> use it for calculating the distances to the cluster, too?
>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search