mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Can anybody explain the distance method in SquaredEuclideanDistanceMeasure?
Date Fri, 04 Nov 2011 16:35:38 GMT
2011/11/4 Grant Ingersoll <gsingers@apache.org>

> > Hi All  I'm reading the code of SquaredEuclideanDistanceMeasure, the
> "distance(double centroidLengthSquare, Vector centroid, Vector v)" method
> confused me a lot, i don't know why we choose this expression
> "centroidLengthSquare - 2 * v.dot(centroid) + v.getLengthSquared()" to
> calculate the distance?
>
> Math is as Sebastian said, the purpose is that it can save some steps in
> the computation, for instance, see the KMeans implementation where you are
> often calculating the distance between some point and a centroid.  We have
> the centroidLengthSquare handy, so it saves a fair amount of work.


Grant is right, but he is soft-pedaling the savings.

c^2 is known ahead.  v is very, very sparse.  Thus c v is very sparse and
has the same sparsity pattern as v^2 (which can even be cached).   On the
other hand c-v is much less sparse.

The difference looks minor when you first write it down, but it can easily
be a factor of 1000 or more.

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message