commons-dev mailing list archives

From John Gant <>
Subject Re: [math] Re: commons math
Date Sat, 13 Aug 2005 23:29:59 GMT

- Feature reduction
a. Basic cross-correlation, including both the Spearman and Pearson
correlation algorithms.
b. Principal Component Analysis.
c. Entropy Based reduction. 

I currently have a and b finished but need to brush up on my JUnit skills :)
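
To make item a concrete, here is a minimal sketch of the two correlation measures. It is not the code mentioned above and not an existing commons-math API: the class name `CorrelationSketch` and its method names are assumptions made for illustration. Spearman correlation is computed as the Pearson correlation of the ranks, with tied values given their average rank.

```java
import java.util.Arrays;

// Hypothetical sketch; names are illustrative, not part of commons-math.
final class CorrelationSketch {

    /** Pearson correlation of two equal-length samples (NaN if either sample is constant). */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        double meanX = Arrays.stream(x).average().getAsDouble();
        double meanY = Arrays.stream(y).average().getAsDouble();
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** 1-based ranks of the values; tied values share the average of their ranks. */
    static double[] ranks(double[] v) {
        int n = v.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort indices by the value they point at.
        Arrays.sort(order, (a, b) -> Double.compare(v[a], v[b]));
        double[] r = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            // Extend j over the run of values tied with position i.
            while (j + 1 < n && v[order[j + 1]] == v[order[i]]) {
                j++;
            }
            double averageRank = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) {
                r[order[k]] = averageRank;
            }
            i = j + 1;
        }
        return r;
    }

    /** Spearman rank correlation: Pearson correlation applied to the ranks. */
    static double spearman(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }
}
```

For example, `CorrelationSketch.spearman(x, y)` is 1 for any strictly increasing relationship between x and y, while `pearson` is 1 only when the relationship is linear with positive slope.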

- Difference Measures
I had in mind a difference engine: a single component that handles
all difference operations. This engine could take, either in its
constructor or through a set method, an instance of one of the
following difference methods.

a. Euclidean distance
b. City-block distance
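
The difference engine idea could be sketched roughly as follows. This is only an illustration of the constructor-or-setter design described above; the names `DifferenceMeasure`, `EuclideanDistance`, `CityBlockDistance`, and `DifferenceEngine` are all assumptions, not existing commons-math classes.

```java
// Hypothetical sketch of the proposed design; none of these types exist in commons-math.
interface DifferenceMeasure {
    double difference(double[] a, double[] b);
}

/** Square root of the sum of squared coordinate differences. */
final class EuclideanDistance implements DifferenceMeasure {
    public double difference(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

/** Sum of absolute coordinate differences (also called Manhattan distance). */
final class CityBlockDistance implements DifferenceMeasure {
    public double difference(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}

/** Delegates every difference operation to the configured measure. */
final class DifferenceEngine {
    private DifferenceMeasure measure;

    DifferenceEngine(DifferenceMeasure measure) {
        setMeasure(measure);
    }

    void setMeasure(DifferenceMeasure measure) {
        if (measure == null) {
            throw new IllegalArgumentException("measure must not be null");
        }
        this.measure = measure;
    }

    double difference(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        return measure.difference(a, b);
    }
}
```

With this shape, an engine built with `new EuclideanDistance()` can be switched to city-block distance later by calling `setMeasure(new CityBlockDistance())`, and adding a new measure means adding one class without touching the engine.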

- Pattern Discovery
a. KMotif Discovery Algorithm.

Again, I have this algorithm completed; I just need to boundary-test
everything.

- Clustering Algorithms
a. K-means Algorithm.

I'd like to discuss the architecture of the k-means implementation; I
have a few ideas and would like a little feedback. I know this is just
a small subset of the available algorithms, but it seems to be a good
start.
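
As a starting point for that discussion, here is a minimal sketch of the standard k-means iteration (Lloyd's algorithm). It is not a proposed architecture: the class name `KMeansSketch`, the method signature, the caller-supplied initial centroids, and the hard-coded squared Euclidean distance are all assumptions made to keep the example short.

```java
import java.util.Arrays;

// Hypothetical sketch; names and signature are illustrative only.
final class KMeansSketch {

    /**
     * Alternates assignment and centroid-update steps until no point changes
     * cluster or maxIterations is reached. The centroids array is updated in
     * place. Returns the cluster index of each point (all -1 if maxIterations is 0).
     */
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        int dim = centroids[0].length;

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < centroids.length; c++) {
                    double d = squaredDistance(points[p], centroids[c]);
                    if (d < bestDist) {
                        bestDist = d;
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: the centroids would not move either
            }

            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[centroids.length][dim];
            int[] counts = new int[centroids.length];
            for (int p = 0; p < points.length; p++) {
                int c = assignment[p];
                counts[c]++;
                for (int d = 0; d < dim; d++) {
                    sums[c][d] += points[p][d];
                }
            }
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

Open design questions this sketch leaves aside include how the initial centroids are chosen, how empty clusters are handled, and whether the distance function should be pluggable.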


On 8/13/05, Phil Steitz <> wrote:
> John,
> Sounds great!  Extending the stat package to include some data mining
> capabilities would be a good and useful addition to commons-math,
> IMHO.  To get started, the first thing to do is to read the
> developer's guide, which will point you to the general Apache
> references and go over some IP stuff that we have to worry about
> in [math].
> Then either here or on the Wiki (see the guide for a link), post a
> brief description of the kinds of mining algorithms that you are
> interested in developing and we can get this going. On this list, pls
> begin the subject line of all [math] messages with [math].
> Thanks in advance for your contributions!
> Phil
> On 8/13/05, John Gant <> wrote:
> > Hello,
> > I am currently a graduate student in Computer Science and Computer
> > Engineering at the University of Louisville, Kentucky. First let me
> > congratulate the group of developers who commit and architect for
> > apache commons. I have used many of the libraries and they are all of
> > excellent quality (but I guess you already know that :)). I am
> > interested in contributing to open source software and have interests
> > that are in the domain of statistics with a focus in data mining.
> > After I wrote many algorithms for classes and asked an Apache
> > contributor whether any of this would be needed elsewhere, he told
> > me to propose something to the dev list. So here goes: I would like
> > to help start a data mining section of commons math and advance the
> > existing statistical libraries. I plan on developing the algorithms
> > for personal use anyway, and would like to see some of my work be used
> > by others. If anyone is interested we can continue this thread and I
> > will email my code and propose my new algorithms.
> >
> >
> > John Gant

John Gant