Mailing-List: contact commons-dev-help@jakarta.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Jakarta Commons Developers List" <commons-dev@jakarta.apache.org>
Message-ID: <f9f8755405081519145a022296@mail.gmail.com>
Date: Mon, 15 Aug 2005 22:14:59 -0400
From: John Gant <john.gant@gmail.com>
To: Jakarta Commons Developers List <commons-dev@jakarta.apache.org>
Subject: Re: [math] Re: commons math
In-Reply-To: <8a81b4af05081518033d60365d@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
References: <f9f87554050813090173efafe2@mail.gmail.com>
	 <8a81b4af05081315277bdb5ac5@mail.gmail.com>
	 <f9f8755405081316291b1d0469@mail.gmail.com>
	 <8a81b4af05081518033d60365d@mail.gmail.com>

IP stuff:
I will send out a link to the PDF that describes KMotif. The cross
correlation comes from
http://mathworld.wolfram.com/CorrelationCoefficient.html, with an
implementation that correlates column-wise. Both the Euclidean and
city-block distance measures come from basic data mining textbooks (my
textbook is Data Mining by Mehmed Kantardzic) or
http://www.statsoft.com/textbook/stcluan.html. Please let me know if
this is sufficient, or if I need more references.

Distance measures are basically a numeric way of quantifying the
relationship between two numerical or categorical datasets. They are
usually used in conjunction with k-means, hierarchical clustering, or
some other type of clustering algorithm.
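
To make the two measures mentioned above concrete, here is a minimal
sketch of Euclidean and city-block distance over plain double arrays.
The class name Distances and both method signatures are hypothetical,
chosen only for illustration; they are not part of commons-math.

```java
// Hypothetical helper class; the names here are illustrative assumptions only.
final class Distances {
    private Distances() {}

    /** Euclidean distance: square root of the sum of squared differences. */
    static double euclidean(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** City-block (Manhattan) distance: sum of absolute differences. */
    static double cityBlock(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Both measures are only defined for vectors of equal dimension.
    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }
}
```

For the points (0, 0) and (3, 4), euclidean returns 5.0 and cityBlock
returns 7.0.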

I think the architecture question applies to k-means and to the
difference/similarity algorithms. I am not sure of the best
architecture for these algorithms. Should each distance/similarity
measure be its own class, so that it can be passed into an engine
that is the clustering algorithm? For instance, have a k-means class
that has a private variable of type ClusteringMeasurementAlgorithm,
where:

EuclideanDistance, which implements
DistanceMeasure, which implements
ClusteringMeasurementAlgorithm

Does this sound somewhat logical? If we had an engine that took an
instance of ClusteringMeasurementAlgorithm as a constructor parameter,
it could handle all operations on the data using that specific
measurement algorithm. The reason I am trying to abstract the
measurement beyond a plain difference measure is that clustering may
be done on both similarity and difference measures.
Please tell me if this sounds outrageous, because I do not have a lot
of architecture experience.
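
A rough sketch of the hierarchy and engine described above might look
like the following. The type names come from this message; the method
names (measure, largerIsCloser, nearestCentroid) and the KMeansEngine
class are assumptions made up for illustration, and only the
assignment step of k-means is shown. It uses an interface default
method, so it assumes Java 8 or later.

```java
// All types here are hypothetical; none of this is existing commons-math API.
interface ClusteringMeasurementAlgorithm {
    /** Returns a score for the pair of points. */
    double measure(double[] a, double[] b);

    /** True when a larger score means the points are closer (a similarity). */
    boolean largerIsCloser();
}

/** A difference measure: smaller scores mean closer points. */
interface DistanceMeasure extends ClusteringMeasurementAlgorithm {
    @Override
    default boolean largerIsCloser() {
        return false;
    }
}

final class EuclideanDistance implements DistanceMeasure {
    @Override
    public double measure(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

/** Engine that receives its measure as a constructor parameter. */
final class KMeansEngine {
    private final ClusteringMeasurementAlgorithm measure;

    KMeansEngine(ClusteringMeasurementAlgorithm measure) {
        this.measure = measure;
    }

    /** Assignment step only: index of the centroid closest to the point. */
    int nearestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        double bestScore = measure.measure(point, centroids[0]);
        for (int i = 1; i < centroids.length; i++) {
            double score = measure.measure(point, centroids[i]);
            boolean better = measure.largerIsCloser()
                    ? score > bestScore
                    : score < bestScore;
            if (better) {
                best = i;
                bestScore = score;
            }
        }
        return best;
    }
}
```

With this shape, a similarity measure would be a sibling interface of
DistanceMeasure that reports largerIsCloser() as true, and the engine
would not need to change.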

Thanks,
John
