From: Veronica Joh
Reply-To: user@mahout.apache.org
Subject: Incremental clustering - Kmeans + Canopy
Date: Thu, 20 Jan 2011 15:24:28 +0000
Hi,

I have a large number of articles clustered by k-means.
For new articles that come in, the Mahout in Action book says I need to "use canopy clustering to assign it to the cluster whose centroid is closest based on a very small distance threshold".
I'm not sure how to add the new article canopies to the existing clusters.

I'm saving the batch articles in a list of Cluster like this:

    List<Cluster> clusters = new ArrayList<Cluster>();

For the new article canopies, I'm trying the following to measure the distance, but I get an error like this: "Required cardinality 11981 but got 77372"

    Text key = new Text();
    Canopy value = new Canopy();
    DistanceMeasure measure = new EuclideanDistanceMeasure();
    while (reader.next(key, value)){
    for (int i=0; i
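
The assignment rule quoted from the book (pick the closest centroid, and accept it only if it lies within a small distance threshold) can be sketched as follows. This is a minimal illustration and not Mahout code: plain double[] arrays stand in for Mahout vectors, and the class name NearestCentroidSketch and the methods euclidean and assign are hypothetical. The sketch also raises an error when two vectors differ in length. That is one plausible reading of the "Required cardinality" message above, for example if the new articles were vectorized against a different dictionary than the original batch, but the message does not show this.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: plain double[] vectors stand in for Mahout's vector and cluster types. */
final class NearestCentroidSketch {

    /** Euclidean distance; rejects vectors of different length (cardinality). */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "Required cardinality " + a.length + " but got " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the index of the closest centroid, or -1 when even the closest
     * centroid is farther away than the threshold.
     */
    static int assign(List<double[]> centroids, double[] point, double threshold) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.size(); i++) {
            double d = euclidean(centroids.get(i), point);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return bestDistance <= threshold ? best : -1;
    }

    public static void main(String[] args) {
        List<double[]> centroids = new ArrayList<>();
        centroids.add(new double[] {0.0, 0.0});
        centroids.add(new double[] {10.0, 10.0});

        // A point close to the second centroid, and a point far from both.
        System.out.println(assign(centroids, new double[] {9.5, 10.0}, 1.0));
        System.out.println(assign(centroids, new double[] {5.0, 5.0}, 1.0));
    }
}
```

A return value of -1 marks an article that is not close enough to any existing cluster. What to do with such an article (start a new cluster, or hold it for the next batch run) is a design choice the sketch leaves open.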