Date: Mon, 30 May 2016 23:46:12 +0000 (UTC)
From: "Artem Barger (JIRA)"
To: issues@commons.apache.org
Reply-To: issues@commons.apache.org
Subject: [jira] [Commented] (MATH-1371) Provide accelerated kmeans++ implementation

    [ https://issues.apache.org/jira/browse/MATH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307013#comment-15307013 ]

Artem Barger commented on MATH-1371:
------------------------------------

One could decide to change the seeding procedure, but I can reduce its visibility to private. As for the instance variables, I used them solely for simplicity and can easily make them local.

> Provide accelerated kmeans++ implementation
> -------------------------------------------
>
>                 Key: MATH-1371
>                 URL: https://issues.apache.org/jira/browse/MATH-1371
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Artem Barger
>            Assignee: Artem Barger
>         Attachments: ElkanKmeansPlusPlusClusterer.java
>
>
> There is an updated version of the kmeans++ algorithm, published in: Elkan, Charles. "Using the triangle inequality to accelerate k-means." ICML. Vol. 3. 2003.
> The main idea is to speed up the k-means iterations by avoiding distance computations between centers and points when they are not needed. For example, if a cluster center has not moved far from a point after the update, the point's assignment does not change. The accelerated algorithm avoids unnecessary distance calculations by applying the triangle inequality in two different ways, and by keeping track of lower and upper bounds for distances between points and centers.
> The algorithm description is available in the paper.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
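To make the triangle-inequality idea from the issue description concrete, here is a minimal sketch of only the first of the two uses: if d(c, c') >= 2 d(x, c), then d(x, c') >= d(x, c), so the distance from x to c' need not be computed. This is not the attached ElkanKmeansPlusPlusClusterer.java and not a Commons Math API. The class name, method names, and the `skipped` counter are hypothetical, chosen for illustration, and the per-point upper and lower bound bookkeeping of the full algorithm is left out.

```java
/** Illustrative sketch only; all names here are hypothetical, not Commons Math API. */
final class TrianglePruningSketch {

    private TrianglePruningSketch() { }

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Pairwise center-to-center distances, computed once per iteration. */
    static double[][] centerDistances(double[][] centers) {
        int k = centers.length;
        double[][] d = new double[k][k];
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                d[i][j] = distance(centers[i], centers[j]);
                d[j][i] = d[i][j];
            }
        }
        return d;
    }

    /**
     * Returns the index of the nearest center to the given point.
     * skipped[0] is incremented for every point-to-center distance
     * that the triangle inequality allowed us to avoid.
     */
    static int nearestCenter(double[] point, double[][] centers,
                             double[][] centerDist, int[] skipped) {
        int best = 0;
        double bestDist = distance(point, centers[0]);
        for (int j = 1; j < centers.length; j++) {
            // d(c_best, c_j) <= d(x, c_best) + d(x, c_j), hence
            // d(c_best, c_j) >= 2 d(x, c_best) implies d(x, c_j) >= d(x, c_best).
            if (centerDist[best][j] >= 2.0 * bestDist) {
                skipped[0]++;
                continue;
            }
            double d = distance(point, centers[j]);
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }
}
```

A caller would compute `centerDistances(centers)` once per iteration and then call `nearestCenter` for each point. With well-separated centers, most point-to-center distances are skipped, which is where the speed-up in the assignment step comes from.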