Date: Mon, 15 Aug 2005 22:14:59 -0400
From: John Gant
To: Jakarta Commons Developers List
Subject: Re: [math] Re: commons math

IP stuff: I will send out a link to the PDF that describes KMotif. The cross correlation comes from http://mathworld.wolfram.com/CorrelationCoefficient.html, with an implementation that correlates column-wise. Both the Euclidean and city-block distance measures come from basic data mining textbooks (my textbook is Data Mining by Mehmed Kantardzic) or http://www.statsoft.com/textbook/stcluan.html. Please let me know if this is sufficient, or if I need more references.

Distance measures are basically a numeric way of quantifying the relationship between two numerical or categorical datasets. They are usually used in conjunction with k-means, hierarchical clustering, or some other type of clustering algorithm.

I think the architecture question applies to k-means and to the difference/similarity algorithms. I am not sure of the best architecture for these algorithms. Should each distance/similarity measure be its own class, so that it can be passed into an engine that is the clustering algorithm? For instance, we could have a k-means class with a private variable of type ClusteringMeasurementAlgorithm, where:

EuclideanDistance implements DistanceMeasure, which implements ClusteringMeasurementAlgorithm.

Does this sound somewhat logical? If we had an engine that took an instance of ClusteringMeasurementAlgorithm as a constructor parameter, it could handle all operations on the data using the specific measurement algorithm.
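
As a rough illustration of the hierarchy described above, a minimal Java sketch might look like the following. Only EuclideanDistance, DistanceMeasure, and ClusteringMeasurementAlgorithm are names from this message; everything else (CityBlockDistance, NearestCenterEngine, measure, largerIsCloser, closest, and the demo class) is hypothetical and chosen for illustration. The engine shows only the nearest-center assignment step that k-means relies on, not a full clustering algorithm, and in Java the middle layer is written as an interface that extends the top-level one.

```java
/** Demo entry point. All names except the three taken from the message are hypothetical. */
final class ClusteringSketchDemo {
    public static void main(String[] args) {
        double[][] centers = { {0.0, 0.0}, {10.0, 10.0} };
        double[] point = {2.0, 3.0};
        // The same engine code runs with either measure plugged in.
        NearestCenterEngine euclid = new NearestCenterEngine(new EuclideanDistance());
        NearestCenterEngine cityBlock = new NearestCenterEngine(new CityBlockDistance());
        System.out.println("Euclidean assignment: " + euclid.closest(point, centers));
        System.out.println("City-block assignment: " + cityBlock.closest(point, centers));
    }
}

/** Top-level abstraction: any score that compares two rows of numeric data. */
interface ClusteringMeasurementAlgorithm {
    double measure(double[] a, double[] b);

    /** True when a larger score means "closer" (similarity); false for distances. */
    boolean largerIsCloser();
}

/** Difference measures: a smaller score means the two rows are closer. */
interface DistanceMeasure extends ClusteringMeasurementAlgorithm {
    @Override
    default boolean largerIsCloser() {
        return false;
    }
}

/** Square root of the sum of squared coordinate differences. */
final class EuclideanDistance implements DistanceMeasure {
    @Override
    public double measure(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("rows must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

/** Sum of absolute coordinate differences (city-block distance). */
final class CityBlockDistance implements DistanceMeasure {
    @Override
    public double measure(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("rows must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}

/** Engine that receives the measure as a constructor parameter. */
final class NearestCenterEngine {
    private final ClusteringMeasurementAlgorithm measure;

    NearestCenterEngine(ClusteringMeasurementAlgorithm measure) {
        this.measure = measure;
    }

    /** Returns the index of the center that the measure considers closest to the point. */
    int closest(double[] point, double[][] centers) {
        if (centers.length == 0) {
            throw new IllegalArgumentException("at least one center is required");
        }
        int best = 0;
        double bestScore = measure.measure(point, centers[0]);
        for (int i = 1; i < centers.length; i++) {
            double score = measure.measure(point, centers[i]);
            boolean better = measure.largerIsCloser() ? score > bestScore : score < bestScore;
            if (better) {
                best = i;
                bestScore = score;
            }
        }
        return best;
    }
}
```

With this layout the engine never needs to know which concrete measure it holds. A similarity measure could implement ClusteringMeasurementAlgorithm directly and return true from largerIsCloser, and the same engine code would still apply.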

The reason I am trying to abstract the clustering algorithm more than a difference measure is that clustering may be done on both similarity and difference measures. Please tell me if this sounds outrageous, because I do not have a lot of architecture experience.

Thanks,
John