commons-dev mailing list archives

From John Gant <>
Subject Re: [math] Re: commons math
Date Tue, 16 Aug 2005 02:14:59 GMT
IP stuff:
I will send out a link to the PDF that describes KMotif, and the cross
correlation comes from with an
implementation that correlates column-wise. Both Euclidean and
city-block distance measures come from basic data mining textbooks (my
textbook is Data Mining by Mehmed Kantardzic) or Please let me know if
this is sufficient, or if I need more references.

Distance measures are basically a numeric way of characterizing the
relationship between two numerical or categorical datasets. They are
usually used in conjunction with k-means, hierarchical clustering, or
some other type of clustering algorithm.
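
As a rough illustration of the two measures mentioned above, here is a minimal sketch that assumes each data point is a plain double[] of equal length. The class name Distances and its method names are hypothetical, chosen only for this example, and are not existing commons-math API.

```java
// Hypothetical helper class; the names are illustrative, not commons-math API.
final class Distances {
    private Distances() {}

    /** Euclidean distance: square root of the summed squared differences. */
    static double euclidean(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** City-block (Manhattan) distance: sum of the absolute differences. */
    static double cityBlock(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Both measures are only defined for vectors of the same dimension.
    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }
}
```

For the points (0, 0) and (3, 4), euclidean gives 5 and cityBlock gives 7.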

I think the architecture question applies to k-means and
difference/similarity algorithms. I am not sure of the best
architecture for these algorithms. Should each distance/similarity
measure be its own class, allowing these to be passed into an engine
that is the clustering algorithm? For instance, have a k-means class
that has a private variable of type ClusteringMeasurementAlgorithm,
with:

EuclideanDistance, which implements
DistanceMeasure, which implements
ClusteringMeasurementAlgorithm

Does this sound somewhat logical? If we had an engine that took an
instance of ClusteringMeasurementAlgorithm as a constructor parameter,
it could handle all operations on the data using the specific
measurement algorithm. The reason I am trying to abstract the
measurement interface beyond a plain difference measure is that
clustering may be done on either similarity or difference measures.
Please tell me if this sounds outrageous, because I do not have a lot
of architecture experience.
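
To make the proposed layering concrete, here is a minimal sketch under stated assumptions. Only the type names ClusteringMeasurementAlgorithm, DistanceMeasure, and EuclideanDistance come from this message; the methods measure, largerIsCloser, and nearestCentroid are hypothetical, and the KMeans class shows only the assignment step, not a full k-means implementation. Note that in Java one interface extends another, so "implements" in the chain above becomes "extends" at the interface level.

```java
/** Most general abstraction: any pairwise measure a clustering engine can use. */
interface ClusteringMeasurementAlgorithm {
    double measure(double[] a, double[] b);

    /** True for similarity measures (larger is closer), false for distances. */
    boolean largerIsCloser();
}

/** Distances: smaller values mean the two points are closer. */
interface DistanceMeasure extends ClusteringMeasurementAlgorithm {
    @Override
    default boolean largerIsCloser() {
        return false;
    }
}

final class EuclideanDistance implements DistanceMeasure {
    @Override
    public double measure(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

/** Engine that depends only on the abstract measure (assignment step only). */
final class KMeans {
    private final ClusteringMeasurementAlgorithm measure;

    KMeans(ClusteringMeasurementAlgorithm measure) {
        this.measure = measure;
    }

    /** Returns the index of the centroid closest to the point under the measure. */
    int nearestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        double bestScore = measure.measure(point, centroids[0]);
        for (int i = 1; i < centroids.length; i++) {
            double score = measure.measure(point, centroids[i]);
            boolean better = measure.largerIsCloser()
                    ? score > bestScore
                    : score < bestScore;
            if (better) {
                best = i;
                bestScore = score;
            }
        }
        return best;
    }
}
```

With this layout, something like new KMeans(new EuclideanDistance()) selects the measure at construction time, and a similarity measure could be added later by implementing ClusteringMeasurementAlgorithm directly with largerIsCloser returning true.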

