From: Grant Ingersoll
To: mahout-user@lucene.apache.org
Subject: Re: k-Means questions
Date: Thu, 25 Jun 2009 22:11:19 -0400
Message-Id: <2A36DF0C-ED03-4067-9EFA-1312C72CCD55@apache.org>
References: <4E4BE67D-7CD2-4449-B3D4-CAA4E2CC1707@apache.org>

On Jun 25, 2009, at 7:00 PM, Ted Dunning wrote:

> On Thu, Jun 25, 2009 at 3:49 PM, Grant Ingersoll wrote:
>
>> Do people have recommendations for start clusters (seeds) for k-Means. The
>> synthetic control example uses Canopy and I often see Random selection
>> mentioned, but I'm wondering what's considered to be best practices for
>> obtaining good overall results.
>
> Just picking a random data element for each centroid should work well.
> Random assignment works much less well because all of the centroids get put
> very close to the mean of the entire data set.

I'm confused by these two sentences. They seem contradictory, but I'm sure the error is on my end.

-Grant
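
One possible reading of the two quoted sentences, offered as a sketch and not as what Ted necessarily meant: "picking a random data element for each centroid" uses k actual data points as the seeds, while "random assignment" gives every point a random cluster label and uses the mean of each cluster as its seed. Under that assumption the two are different procedures, and the second places every seed near the overall mean. The function names, the synthetic three-blob data, and the spread measure below are all hypothetical and are not taken from Mahout.

```python
import math
import random

def random_point_seeds(points, k, rng):
    """Seed with k distinct data points chosen at random."""
    return rng.sample(points, k)

def random_assignment_seeds(points, k, rng):
    """Give each point a random cluster label; seed with the cluster means."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        c = rng.randrange(k)
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    seeds = []
    for c in range(k):
        if counts[c] == 0:
            # Empty cluster (unlikely here): fall back to a random data point.
            seeds.append(rng.choice(points))
        else:
            seeds.append(tuple(s / counts[c] for s in sums[c]))
    return seeds

def mean_distance(seeds, center):
    """Average Euclidean distance from the seeds to a reference point."""
    total = 0.0
    for s in seeds:
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(s, center)))
    return total / len(seeds)

rng = random.Random(42)

# Synthetic data (an assumption for illustration): three well-separated 2-D blobs.
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
    for cx, cy in blob_centers
    for _ in range(300)
]

# Mean of the entire data set.
overall_mean = tuple(sum(p[d] for p in points) / len(points) for d in range(2))

k = 3
point_seeds = random_point_seeds(points, k, rng)
assignment_seeds = random_assignment_seeds(points, k, rng)

point_spread = mean_distance(point_seeds, overall_mean)
assignment_spread = mean_distance(assignment_seeds, overall_mean)

print("random data points: mean distance from overall mean =", round(point_spread, 2))
print("random assignment:  mean distance from overall mean =", round(assignment_spread, 2))
```

In this sketch the seeds taken from data points sit out among the blobs, while the seeds from random assignment are each an average over roughly a third of all points and so collapse toward the overall mean, which matches the reason given in the quoted reply.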