From: Robin Anil
Date: Tue, 16 Feb 2010 22:28:34 +0530
Subject: Re: Fuzzy K Means
To: mahout-dev@lucene.apache.org

On Tue, Feb 16, 2010 at 10:25 PM, Jeff Eastman wrote:

> Looks to me like the unit tests are the only calls to recomputeCenter,
> which is where the center is set. The clusterer seems to be calling
> computeCentroid, which sets the centroid, instead. I'm not sure why it needs
> both instance variables, as the pointProbSum and weightedPointTotal
> variables take the place of the single pointTotal in ClusterBase. I think
> perhaps center and centroid need to be merged?
>
> In k-means and canopy, the center is the (read-only) current centroid which
> is used for the distance calculations during an iteration, and it is
> recomputed by computeCentroid (using pointTotal and numPoints) at the end of
> the iteration.

So just writing computeCentroid should do, right? Which is what it's doing:

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(clusterId);
    out.writeBoolean(converged);
    // Serialize the freshly computed centroid rather than the stored center.
    Vector vector = computeCentroid();
    VectorWritable.writeVector(out, vector);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    clusterId = in.readInt();
    converged = in.readBoolean();
    VectorWritable temp = new VectorWritable();
    temp.readFields(in);
    // The deserialized vector becomes the center; the accumulators are reset.
    setCenter(temp.get());
    this.pointProbSum = 0;
    this.weightedPointTotal = getCenter().like();
  }

> Jeff
>
> Robin Anil wrote:
>
>> I have been trying to convert FuzzyKMeans SoftCluster (which should
>> ideally be named FuzzyKmeansCluster) to use the ClusterBase.
>>
>> I am getting *the same center* for all the clusters. To aid the conversion,
>> all I did was remove the center vector from the SoftCluster class and
>> reuse the one from ClusterBase. This makes essentially no change in the
>> tests, which pass correctly.
>>
>> So I am questioning whether the implementation keeps the average center at
>> all. Is anyone who has used FuzzyKMeans experiencing this?
>>
>> Robin
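
To make the center/centroid distinction discussed above concrete, here is a minimal, self-contained sketch. It is not the Mahout code: FuzzyClusterSketch is a hypothetical stand-in for SoftCluster, plain double[] arrays replace Mahout's Vector, and the serialized layout (id, length, values) is an assumption made only for illustration. It keeps the center fixed while weighted points accumulate, derives the centroid as weightedPointTotal / pointProbSum, and writes that centroid out so that reading it back installs it as the next center.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical stand-in for SoftCluster; double[] replaces Mahout's Vector.
final class FuzzyClusterSketch {
  private int clusterId;
  private double[] center;             // fixed during an iteration, used for distances
  private double[] weightedPointTotal; // running sum of weight * point
  private double pointProbSum;         // running sum of weights

  FuzzyClusterSketch(int clusterId, double[] center) {
    this.clusterId = clusterId;
    this.center = center.clone();
    this.weightedPointTotal = new double[center.length];
  }

  double[] getCenter() {
    return center.clone();
  }

  // Accumulate one point with its membership weight; the center is not touched.
  void observe(double[] point, double weight) {
    for (int i = 0; i < point.length; i++) {
      weightedPointTotal[i] += weight * point[i];
    }
    pointProbSum += weight;
  }

  // Weighted mean of the observed points; falls back to the center if none were seen.
  double[] computeCentroid() {
    if (pointProbSum == 0) {
      return center.clone();
    }
    double[] centroid = new double[center.length];
    for (int i = 0; i < centroid.length; i++) {
      centroid[i] = weightedPointTotal[i] / pointProbSum;
    }
    return centroid;
  }

  // Serialize the computed centroid (assumed layout: id, length, values).
  void write(DataOutput out) throws IOException {
    out.writeInt(clusterId);
    double[] centroid = computeCentroid();
    out.writeInt(centroid.length);
    for (double v : centroid) {
      out.writeDouble(v);
    }
  }

  // The stored centroid becomes the new center; the accumulators start from zero.
  void readFields(DataInput in) throws IOException {
    clusterId = in.readInt();
    center = new double[in.readInt()];
    for (int i = 0; i < center.length; i++) {
      center[i] = in.readDouble();
    }
    pointProbSum = 0;
    weightedPointTotal = new double[center.length];
  }

  public static void main(String[] args) throws IOException {
    FuzzyClusterSketch cluster = new FuzzyClusterSketch(7, new double[] {0.0, 0.0});
    cluster.observe(new double[] {2.0, 4.0}, 0.5);
    cluster.observe(new double[] {4.0, 8.0}, 1.5);

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    cluster.write(new DataOutputStream(bytes));

    FuzzyClusterSketch restored = new FuzzyClusterSketch(0, new double[] {0.0, 0.0});
    restored.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    System.out.println(Arrays.toString(restored.getCenter())); // prints [3.5, 7.0]
  }
}
```

In this sketch the center only changes at the write/readFields boundary. If a cluster were serialized before any points had been observed, computeCentroid would simply return the old center. Whether this has anything to do with the identical centers reported in this thread is not established here.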