commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Artem Barger <ar...@bargr.net>
Subject Re: [Math] kmeans++: decouple EM LLoyd's iterations and initial seeding of clustering centers.
Date Wed, 01 Jun 2016 14:24:47 GMT
​
On Tue, May 31, 2016 at 4:04 PM, Artem Barger <artem@bargr.net> wrote:

> Hi,
>
> Current implementation of kmeans within CM framework, inherently uses
> algorithm published by  Arthur, David, and Sergei Vassilvitskii.
> "k-means++: The advantages of careful seeding." *Proceedings of the
> eighteenth annual ACM-SIAM symposium on Discrete algorithms*. Society for
> Industrial and Applied Mathematics, 2007. While there other alternative
> algorithms for initial seeding is available, for instance:
>
> 1. Random initialization (each center picked uniformly at random).
> 2. Canopy https://en.wikipedia.org/wiki/Canopy_clustering_algorithm
> 3. Bicriteria  Feldman, Dan, et al. "Bi-criteria linear-time
> approximations for generalized k-mean/median/center." *Proceedings of the
> twenty-third annual symposium on Computational geometry*. ACM, 2007.
>
> While I understand that kmeans++ is preferable option, others could be
> also used for testing, trials and evaluations as well.
>
> I'd like to propose to separate logic of seeding and clustering to
> increase flexibility for kmeans clustering. Would be glad to hear your
> comments, pros/cons or rejections...
>
>
I've found "Scalable KMeans" or kmeans|| as referred in the
http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf, which provides
parallelizable seeding procedure.
​I guess this might serve as additional +1 vote for doing separation
between seeding and LLoyd's iterations in current implementations of kmeans.

​Best,
    Artem Barger.​

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message