spark-issues mailing list archives

From "zhengruifeng (JIRA)" <>
Subject [jira] [Updated] (SPARK-14174) Implement the Mini-Batch KMeans
Date Thu, 22 Jun 2017 08:21:00 GMT


zhengruifeng updated SPARK-14174:
    Attachment: MBKM.xlsx

> Implement the Mini-Batch KMeans
> -------------------------------
>                 Key: SPARK-14174
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: zhengruifeng
>         Attachments: MBKM.xlsx
> The MiniBatchKMeans is a variant of the KMeans algorithm which uses mini-batches to reduce
> the computation time, while still attempting to optimise the same objective function. Mini-batches
> are subsets of the input data, randomly sampled in each training iteration. These mini-batches
> drastically reduce the amount of computation required to converge to a local solution. In
> contrast to other algorithms that reduce the convergence time of k-means, mini-batch k-means
> produces results that are generally only slightly worse than the standard algorithm.
> Comparison of K-Means and MiniBatchKMeans in sklearn:
> Because MiniBatchKMeans with fraction=1.0 is not equivalent to KMeans, I implemented it as a new estimator.
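
For readers unfamiliar with the technique described above, here is a minimal pure-Python sketch of mini-batch k-means. It is not the code proposed in this issue. The function names (mini_batch_kmeans, cost), the per-center 1/count step size, and the random initialisation are assumptions chosen for illustration; only the fraction parameter is taken from the description.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mini_batch_kmeans(points, k, fraction=0.1, iterations=100, seed=0):
    """Hypothetical helper: update k centers from one random mini-batch per iteration."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct input points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points assigned to each center so far
    batch_size = max(1, int(fraction * len(points)))
    for _ in range(iterations):
        # Sample a fresh subset of the input for this iteration.
        batch = rng.sample(points, batch_size)
        # Assign the whole batch against the centers as they stand now.
        assignments = [nearest(p, centers) for p in batch]
        for p, j in zip(batch, assignments):
            counts[j] += 1
            rate = 1.0 / counts[j]  # assumed step size; shrinks as a center sees more points
            centers[j] = [(1 - rate) * c + rate * x
                          for c, x in zip(centers[j], p)]
    return centers


def cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(squared_distance(p, centers[nearest(p, centers)])
               for p in points)
```

In this sketch, fraction=1.0 still applies incremental per-point updates instead of recomputing each center as the mean of its assigned points, which is one way to see why the result need not match standard KMeans.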
