commons-dev mailing list archives

From "Kim van der Linde" <>
Subject Re: [math] Re: commons math
Date Tue, 23 Aug 2005 00:10:21 GMT
In the past, I have written matrix-based distance classes and special
matrix classes that can be adapted to do just these things. They include
SSCP (sums of squares and cross-products), covariance, and correlation
matrices, and row-based Euclidean and Mahalanobis distances. An idea?
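
A minimal sketch of how an SSCP matrix leads to the column-wise correlation matrix discussed in the quoted message below. This is not code from either contributor: the class and method names are hypothetical, and the input is assumed to be a dense n x c array with at least two rows and no constant columns (a constant column would produce NaN entries).

```java
// Sketch only: names are hypothetical and not part of the Commons Math API.
final class CorrelationMatrixSketch {

    /** Returns the c x c Pearson correlation matrix of the columns of data (n rows, c columns). */
    static double[][] correlationMatrix(double[][] data) {
        int n = data.length;
        int c = data[0].length;

        // Column means.
        double[] mean = new double[c];
        for (double[] row : data) {
            for (int j = 0; j < c; j++) {
                mean[j] += row[j] / n;
            }
        }

        // SSCP matrix of the mean-centered data:
        // sscp[j][k] = sum over rows of (x_j - mean_j) * (x_k - mean_k).
        double[][] sscp = new double[c][c];
        for (double[] row : data) {
            for (int j = 0; j < c; j++) {
                for (int k = 0; k < c; k++) {
                    sscp[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]);
                }
            }
        }

        // r[j][k] = sscp[j][k] / sqrt(sscp[j][j] * sscp[k][k]).
        // The 1 / (n - 1) factor of the covariance cancels in this ratio.
        double[][] r = new double[c][c];
        for (int j = 0; j < c; j++) {
            for (int k = 0; k < c; k++) {
                r[j][k] = sscp[j][k] / Math.sqrt(sscp[j][j] * sscp[k][k]);
            }
        }
        return r;
    }

    public static void main(String[] args) {
        // Column 1 rises with column 0; column 2 falls as column 0 rises.
        double[][] data = { {1, 2, 6}, {2, 4, 4}, {3, 6, 2} };
        System.out.println(java.util.Arrays.deepToString(correlationMatrix(data)));
    }
}
```

Dividing the SSCP matrix by (n - 1) instead would give the covariance matrix, so all three matrices can share the same centering pass.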



John Gant said:
>> What exactly does "column-wise" mean?  This just looks like Pearson's
>> R, which is already available in the SimpleRegression class.  Do you
>> mean generation of correlation matrices?
> Sorry, I should have been more specific. This will allow someone to
> calculate the Pearson r coefficient between each pair of column
> vectors. This results in a correlation matrix with dimensions (c * c),
> where c is the number of columns in the raw data matrix.
>> > Distance measures are basically a numeric way of classifying a
>> > relationship between two numerical or categorical datasets. Usually
>> > distance measures are used in conjunction with k-means, or
>> > hierarchical clustering (or some type of clustering algorithm).
>> Are these essentially metrics on R^n (the "numerical" case) or
>> homogeneity measures (e.g. chi-square, for the categorical case)?
> The numerical distance measures can be something as simple as
> Euclidean distance or a correlation coefficient. The categorical
> measures are more logical (less numerical), and something like Hamming
> distance could be used. Does this answer your question?
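
For the categorical case, a minimal sketch of Hamming distance, which counts the positions at which two records differ. The class name, the method name, and the choice of String[] to represent a categorical record are assumptions made for illustration only.

```java
// Sketch only: names and the String[] record type are hypothetical.
final class HammingSketch {

    /** Counts the positions at which two equal-length categorical records differ. */
    static int hammingDistance(String[] a, String[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("records must have the same length");
        }
        int differences = 0;
        for (int i = 0; i < a.length; i++) {
            if (!a[i].equals(b[i])) {
                differences++;
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        String[] x = {"red", "small", "round"};
        String[] y = {"red", "large", "square"};
        System.out.println(hammingDistance(x, y)); // prints 2
    }
}
```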
>> If a clustering algorithm can use multiple different distance
>> measures, then it does make sense to encapsulate the distance measure.
>> Defining a distance measure or metric interface and then defining
>> implementation classes that implement that interface and having the
>> clustering algorithms have instances of these as members is a
>> reasonable way to do this, IMHO.
> A clustering algorithm is usually independent of the distance measure,
> but relies on this measure to identify clusters. All clustering
> algorithms (that I have experience with) use distance measures, and I
> plan on setting up the implementation so that it is similar to the
> contract of Collections.sort(). I have written an interface,
> DistanceMeasure, which has a single method, calculateDistance(). This
> interface is currently implemented by the EuclideanDistance class. I
> have not posted this code, and need to finish the unit tests.
> Thanks,
> John
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
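
A minimal sketch of the design described in the quoted message, in which the algorithm receives the distance measure the way Collections.sort() receives a Comparator. John's code was not posted, so this is not his implementation: the parameter types of calculateDistance() and the NearestCenterSketch helper are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical signature: the message names only the method, not its parameters.
interface DistanceMeasure {
    double calculateDistance(double[] a, double[] b);
}

final class EuclideanDistance implements DistanceMeasure {
    @Override
    public double calculateDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

// Hypothetical stand-in for one step of a clustering algorithm.
final class NearestCenterSketch {

    /** Returns the index of the center closest to point under the supplied measure. */
    static int nearestCenter(double[] point, List<double[]> centers, DistanceMeasure measure) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double d = measure.calculateDistance(point, centers.get(i));
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> centers = Arrays.asList(new double[] {0, 0}, new double[] {10, 10});
        int index = nearestCenter(new double[] {1, 2}, centers, new EuclideanDistance());
        System.out.println(index); // prints 0
    }
}
```

Because the caller only depends on the interface, swapping in another measure means passing a different DistanceMeasure instance, with no change to the clustering step itself.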

