commons-dev mailing list archives

From Artem Barger <>
Subject Re: [Math] kmeans++: decouple EM LLoyd's iterations and initial seeding of clustering centers.
Date Wed, 01 Jun 2016 14:24:47 GMT
On Tue, May 31, 2016 at 4:04 PM, Artem Barger <> wrote:

> Hi,
> The current implementation of kmeans within the CM framework inherently uses
> the algorithm published by Arthur, David, and Sergei Vassilvitskii.
> "k-means++: The advantages of careful seeding." *Proceedings of the
> eighteenth annual ACM-SIAM symposium on Discrete algorithms*. Society for
> Industrial and Applied Mathematics, 2007. However, other algorithms for
> initial seeding are available, for instance:
> 1. Random initialization (each center picked uniformly at random).
> 2. Canopy
> 3. Bicriteria: Feldman, Dan, et al. "Bi-criteria linear-time
> approximations for generalized k-mean/median/center." *Proceedings of the
> twenty-third annual symposium on Computational geometry*. ACM, 2007.
> While I understand that kmeans++ is the preferable option, the others could
> also be used for testing, trials and evaluation.
> I'd like to propose separating the seeding logic from the clustering logic
> to increase the flexibility of kmeans clustering. I would be glad to hear
> your comments, pros/cons or objections.
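A minimal sketch of the proposed separation, assuming points are plain `double[]` arrays: a hypothetical `SeedingStrategy` interface supplies the initial centers, and the Lloyd's loop in `cluster` works the same no matter how they were chosen. `DecoupledKMeans`, `SeedingStrategy`, `RANDOM` and `KMEANS_PLUS_PLUS` are illustrative names only, not the existing Commons Math API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: seeding is pluggable, Lloyd's iterations are separate. */
final class DecoupledKMeans {

    /** Hypothetical strategy interface: picks k initial centers from the data. */
    interface SeedingStrategy {
        double[][] seed(double[][] points, int k, Random rng);
    }

    /** Random initialization: k distinct points chosen uniformly at random. */
    static final SeedingStrategy RANDOM = (points, k, rng) -> {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < points.length; i++) idx.add(i);
        Collections.shuffle(idx, rng);
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = points[idx.get(c)].clone();
        return centers;
    };

    /** k-means++ seeding: each new center is sampled with probability proportional to D(x)^2. */
    static final SeedingStrategy KMEANS_PLUS_PLUS = (points, k, rng) -> {
        double[][] centers = new double[k][];
        centers[0] = points[rng.nextInt(points.length)].clone();
        double[] d2 = new double[points.length];
        for (int i = 0; i < points.length; i++) d2[i] = dist2(points[i], centers[0]);
        for (int c = 1; c < k; c++) {
            double sum = 0;
            for (double v : d2) sum += v;
            double r = rng.nextDouble() * sum;
            int chosen = points.length - 1;
            for (int i = 0; i < points.length; i++) {
                r -= d2[i];
                if (r < 0) { chosen = i; break; }
            }
            centers[c] = points[chosen].clone();
            // Keep, for every point, the squared distance to its closest center so far.
            for (int i = 0; i < points.length; i++) d2[i] = Math.min(d2[i], dist2(points[i], centers[c]));
        }
        return centers;
    };

    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) { double d = a[j] - b[j]; s += d * d; }
        return s;
    }

    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++)
            if (dist2(p, centers[c]) < dist2(p, centers[best])) best = c;
        return best;
    }

    /** Lloyd's iterations; the seeding strategy is just a parameter. */
    static double[][] cluster(double[][] points, int k, SeedingStrategy seeding, int maxIterations, Random rng) {
        double[][] centers = seeding.seed(points, k, rng);
        int dim = points[0].length;
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {            // assignment step
                int c = nearest(p, centers);
                counts[c]++;
                for (int j = 0; j < dim; j++) sums[c][j] += p[j];
            }
            boolean changed = false;
            for (int c = 0; c < k; c++) {          // update step
                if (counts[c] == 0) continue;      // an empty cluster keeps its old center
                for (int j = 0; j < dim; j++) sums[c][j] /= counts[c];
                if (!Arrays.equals(sums[c], centers[c])) { centers[c] = sums[c]; changed = true; }
            }
            if (!changed) break;                   // converged
        }
        return centers;
    }
}
```

With this shape, Canopy or bicriteria seeding would be one more `SeedingStrategy` implementation, leaving the Lloyd's loop untouched.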
I've found "Scalable KMeans" (kmeans||), which provides a parallelizable
seeding procedure.
I guess this might serve as an additional +1 vote for separating seeding
from Lloyd's iterations in the current kmeans implementation.
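For illustration, a rough sketch of kmeans|| seeding as described by Bahmani et al. in "Scalable K-Means++": starting from one random center, each round samples every point independently with probability min(1, l * d^2(x, C) / cost), where cost is the current total squared distance; the candidates are then weighted by how many points are closest to them and reduced to k centers with weighted k-means++. The class and method names, the fixed `rounds` parameter (the paper uses O(log psi) rounds), and the sequential loops are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of kmeans|| seeding; not an existing Commons Math class. */
final class KMeansParallelSeeding {

    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) { double d = a[j] - b[j]; s += d * d; }
        return s;
    }

    static int nearest(double[] p, List<double[]> centers) {
        int best = 0;
        for (int c = 1; c < centers.size(); c++)
            if (dist2(p, centers.get(c)) < dist2(p, centers.get(best))) best = c;
        return best;
    }

    /**
     * @param l      oversampling factor: expected number of candidates added per round
     * @param rounds fixed number of sampling rounds (an assumption of this sketch)
     */
    static double[][] seed(double[][] points, int k, double l, int rounds, Random rng) {
        List<double[]> candidates = new ArrayList<>();
        candidates.add(points[rng.nextInt(points.length)]);
        for (int r = 0; r < rounds; r++) {
            double[] d2 = new double[points.length];
            double cost = 0;
            for (int i = 0; i < points.length; i++) {
                d2[i] = dist2(points[i], candidates.get(nearest(points[i], candidates)));
                cost += d2[i];
            }
            if (cost == 0) break; // every point already coincides with a candidate
            // Each point is kept independently of the others, so this loop could be split across workers.
            List<double[]> picked = new ArrayList<>();
            for (int i = 0; i < points.length; i++)
                if (rng.nextDouble() < Math.min(1.0, l * d2[i] / cost)) picked.add(points[i]);
            candidates.addAll(picked);
        }
        if (candidates.size() < k)
            throw new IllegalStateException("fewer than k candidates; increase l or rounds");
        // Weight each candidate by the number of points closest to it.
        double[] w = new double[candidates.size()];
        for (double[] p : points) w[nearest(p, candidates)]++;
        return weightedKMeansPlusPlus(candidates, w, k, rng);
    }

    /** Reduce weighted candidates to k centers, sampling with probability proportional to w * D^2. */
    static double[][] weightedKMeansPlusPlus(List<double[]> cand, double[] w, int k, Random rng) {
        double[][] centers = new double[k][];
        double[] d2 = new double[cand.size()];
        Arrays.fill(d2, 1.0); // the first pick is proportional to weight alone
        for (int c = 0; c < k; c++) {
            double total = 0;
            for (int i = 0; i < d2.length; i++) total += w[i] * d2[i];
            double u = rng.nextDouble() * total;
            int chosen = d2.length - 1;
            for (int i = 0; i < d2.length; i++) {
                u -= w[i] * d2[i];
                if (u < 0) { chosen = i; break; }
            }
            centers[c] = cand.get(chosen).clone();
            for (int i = 0; i < d2.length; i++) {
                double d = dist2(cand.get(i), centers[c]);
                d2[i] = (c == 0) ? d : Math.min(d2[i], d);
            }
        }
        return centers;
    }
}
```

Since each round's sampling decision for a point only needs the current candidate set and the total cost, that loop is the part that could be distributed; the returned centers would then feed the usual Lloyd's iterations unchanged.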

    Artem Barger.
