2011/11/4 Grant Ingersoll <gsingers@apache.org>
> > Hi All I'm reading the code of SquaredEuclideanDistanceMeasure, the
> "distance(double centroidLengthSquare, Vector centroid, Vector v)" method
> confused me a lot, i don't know why we choose this expression
> "centroidLengthSquare  2 * v.dot(centroid) + v.getLengthSquared()" to
> calculate the distance?
>
> Math is as Sebastian said, the purpose is that it can save some steps in
> the computation, for instance, see the KMeans implementation where you are
> often calculating the distance between some point and a centroid. We have
> the centroidLengthSquare handy, so it saves a fair amount of work.
Grant is right, but he is softpedaling the savings.
c^2 is known ahead. v is very, very sparse. Thus c v is very sparse and
has the same sparsity pattern as v^2 (which can even be cached). On the
other hand cv is much less sparse.
The difference looks minor when you first write it down, but it can easily
be a factor of 1000 or more.
