On Mon, Nov 5, 2012 at 4:44 AM, Dan Filimon <dangeorge.filimon@gmail.com> wrote:
>
> Ted told me that Mahout Centroids [1] are Weighted vectors that
> additionally perform a Welford-style update of a vector.
>
I think that there may be an older Centroid definition that is different
from this.
> So, in the code, for an existing Centroid c, with weight w_c, updating it
> with a new Vector v whose weight is w_v, the result of an "update" is:
>
> (w_c * c[i] + w_v * v[i]) / (w_c + w_v), for all elements (i is the index)
>
Correct.
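
For what it's worth, here is a rough sketch of that update on plain arrays. The class and method names are made up for illustration and this is not Mahout's actual Centroid code. It uses the incremental form c[i] += w_v / (w_c + w_v) * (v[i] - c[i]), which is algebraically the same as the weighted average quoted above.

```java
final class CentroidUpdateSketch {
    // Hypothetical helper, not Mahout's Centroid class.
    // Updates c in place and returns the new total weight w_c + w_v.
    public static double update(double[] c, double wC, double[] v, double wV) {
        if (c.length != v.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double total = wC + wV;
        for (int i = 0; i < c.length; i++) {
            // Equivalent to (wC * c[i] + wV * v[i]) / (wC + wV),
            // but written as a correction to the current value.
            c[i] += (wV / total) * (v[i] - c[i]);
        }
        return total;
    }

    public static void main(String[] args) {
        double[] c = {1.0, 2.0};
        double w = update(c, 3.0, new double[] {5.0, 6.0}, 1.0);
        System.out.println(w + " " + c[0] + " " + c[1]);
    }
}
```

Merging two clusters is the same call with v being the other centroid and w_v its weight.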
> Since weights actually mean the number of elements in a certain cluster,
> merging two clusters is exactly the operation described above.
>
> Why is this called a Welford update?
>
See the online algorithm for the mean and variance here:
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm
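
In case the link moves, a minimal sketch of the scalar version of Welford's online algorithm follows. The class name is hypothetical and nothing here is taken from Mahout. Note that the mean step is the unweighted special case of the centroid update, with w_c equal to the old count and w_v = 1.

```java
final class WelfordSketch {
    private long n;       // number of values seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    public void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;          // same shape as the centroid update
        m2 += delta * (x - mean);   // uses the old and the new mean
    }

    public double mean() {
        return mean;
    }

    // Sample variance; undefined (NaN) for fewer than two values.
    public double sampleVariance() {
        return n > 1 ? m2 / (n - 1) : Double.NaN;
    }
}
```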
> Also, why is the original Vector's function named assign?
This is just one of several assign functions. The name assign comes from
the original name used in the Colt library and is intended to indicate that
it is a destructive operation rather than a copying operation like times().
> It's really an implementation of the higher order function zipwith [2].
>
I don't know that zipwith is a more common name. Haskell has historically
had a very small community.
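
To make the destructive versus copying distinction concrete, here is a small sketch on plain arrays. Both helpers are hypothetical and only mimic the idea; they are not the Mahout or Colt signatures.

```java
import java.util.function.DoubleBinaryOperator;

final class AssignSketch {
    // Destructive: overwrites a[i] with f(a[i], b[i]) and returns a itself.
    public static double[] assign(double[] a, double[] b, DoubleBinaryOperator f) {
        checkLengths(a, b);
        for (int i = 0; i < a.length; i++) {
            a[i] = f.applyAsDouble(a[i], b[i]);
        }
        return a;
    }

    // Copying: leaves both inputs untouched and returns a fresh array.
    public static double[] zipWith(double[] a, double[] b, DoubleBinaryOperator f) {
        checkLengths(a, b);
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = f.applyAsDouble(a[i], b[i]);
        }
        return out;
    }

    private static void checkLengths(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
    }
}
```

The element-wise work is identical in both; the only difference is whether the first argument is overwritten.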
