From: Jeff Eastman
Date: Tue, 06 Apr 2010 18:33:37 -0700
To: mahout-user@lucene.apache.org
Subject: Re: MAHOUT-236 Cluster Evaluation Tools?

Hi Robin,

Great! I've got the refactoring changes for consolidating all the various cluster types under a Cluster interface (formerly Printable, but now with id, numPoints and a center added). Dirichlet models don't yet have meaningful ids implemented, but they all do (so far anyway) have a notion of "numPoints" and a "center". I'm working on tests tomorrow to make sure the ClusterDumper actually works with Dirichlet clusters, then I will commit that. Wednesday or Thursday most likely.

BTW, I changed my mind about foisting off the old Printable interface on Vectors (but am still open to the idea if somebody actually working in math thinks it is worth doing). All the new Clusters use the vector formatting done in ClusterBase.

What I'd really like is feedback from ClusterDumper users on what is working and what is needed to address MAHOUT-236. That includes you, right?

Jeff

PS: Ted, you expressed some doubts about the value of consolidating Dirichlet clusters with the others. So far it seems to be a reasonable fit, but I'm doing the engineering on a tiny subset of simple models without enough theoretical insight to see any pitfalls ahead. Is there a "DistanceMeasure-like" discussion that might provide a firmer underpinning for this work?

Robin Anil wrote:
> No one yet. I am willing to help in case you need an extra pair of hands on
> this one.
>
> Robin