mahout-dev mailing list archives

From Karl Wettin <>
Subject TODO (was: Confluence Wiki)
Date Sat, 26 Jan 2008 07:16:15 GMT
On 26 Jan 2008, at 04:23, Grant Ingersoll wrote:

> let's try to start filling in the TODO list and get some basic build  
> infrastructure in place.

We should talk about a unified data access API. It doesn't need to be
fancy or fast from the start; a seekable record reader might be
enough for now. It should have lots of abstraction layers so that people
can add support for any data source, with optional levels of access
optimization: an ARFF file, an inverted index or whatever fits best
with the algorithm you are about to pass the data to.
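To make the "seekable record reader" idea concrete, here is a minimal sketch in Java. Everything in it is an assumption for illustration, not an existing Mahout API: the names `SeekableRecordReader` and `ListRecordReader` are hypothetical, and the in-memory list stands in for whatever real source (ARFF file, inverted index) a format-specific reader would wrap.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical base abstraction: an iterator over records whose cursor can be
// repositioned. Format-specific readers (ARFF, inverted index, ...) would implement it.
interface SeekableRecordReader<R> extends Iterator<R> {
    long size();           // total number of records available
    long position();       // index of the record the next call to next() returns
    void seek(long index); // move the cursor to an absolute record index
}

// Simplest possible backing store: records held in memory. A file-backed
// implementation could instead map the index to a byte offset.
final class ListRecordReader<R> implements SeekableRecordReader<R> {
    private final List<R> records;
    private int cursor = 0;

    ListRecordReader(List<R> records) {
        this.records = new ArrayList<>(records); // defensive copy
    }

    @Override public long size() { return records.size(); }

    @Override public long position() { return cursor; }

    @Override public void seek(long index) {
        // index == size() is allowed and simply leaves the reader exhausted
        if (index < 0 || index > records.size()) {
            throw new IndexOutOfBoundsException("record index " + index);
        }
        cursor = (int) index;
    }

    @Override public boolean hasNext() { return cursor < records.size(); }

    @Override public R next() {
        if (!hasNext()) throw new NoSuchElementException();
        return records.get(cursor++);
    }
}

class RecordReaderDemo {
    public static void main(String[] args) {
        SeekableRecordReader<String> reader = new ListRecordReader<>(List.of("a", "b", "c"));
        reader.seek(2);
        System.out.println(reader.next()); // prints "c"
        reader.seek(0);
        System.out.println(reader.next()); // prints "a"
    }
}
```

Extending `Iterator` keeps the simple sequential case trivial for algorithms that only stream over the data, while `seek` is the extra access level that algorithms needing random access could rely on.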

> Does anyone feel particularly strong about initial algorithms to  
> tackle?  I'm thinking k-Means, naive bayes or neural nets, but am  
> obviously open to other suggestions.

I'm planning a soft start by implementing preprocessing filters
(discretization, resampling, etc.). Then I'll probably look at feature
selection, hierarchical clustering or reinforcement learning.
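As an illustration of what a discretization filter might look like, here is a small sketch of equal-width binning, one common unsupervised discretization scheme. The choice of equal-width binning and the class name `EqualWidthDiscretizer` are assumptions made for this example; the message does not say which discretization method is planned.

```java
import java.util.Arrays;

// Hypothetical unsupervised filter: equal-width binning of one numeric attribute.
final class EqualWidthDiscretizer {
    private final int bins;
    private double min;
    private double width;
    private boolean fitted = false;

    EqualWidthDiscretizer(int bins) {
        if (bins < 1) throw new IllegalArgumentException("bins must be >= 1");
        this.bins = bins;
    }

    // Learn the value range from training data (throws on an empty array).
    void fit(double[] values) {
        min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        width = (max - min) / bins;
        fitted = true;
    }

    // Map a value to a bin index in [0, bins - 1]; values outside the training
    // range are clamped to the first or last bin.
    int binOf(double value) {
        if (!fitted) throw new IllegalStateException("call fit() first");
        if (width == 0) return 0; // all training values were identical
        int bin = (int) Math.floor((value - min) / width);
        return Math.max(0, Math.min(bins - 1, bin));
    }

    int[] transform(double[] values) {
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) out[i] = binOf(values[i]);
        return out;
    }
}

class DiscretizerDemo {
    public static void main(String[] args) {
        EqualWidthDiscretizer d = new EqualWidthDiscretizer(5);
        d.fit(new double[] {0.0, 10.0}); // range [0, 10], bin width 2
        System.out.println(Arrays.toString(d.transform(new double[] {0.0, 2.5, 10.0}))); // prints [0, 1, 4]
    }
}
```

Splitting `fit` from `transform` lets the bin boundaries be learned on training data and then reused unchanged on test data, which is the usual contract for a preprocessing filter.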

