On Sep 14, 2009, at 12:23 PM, Tanton Gibbs wrote:
> Hi,
>
> I'd like to start working more with the mahout code, making small
> improvements here and there. I want to primarily focus on performance
> improvements and unit testing (mainly because I enjoy doing that).
> However, I'd like to improve a place that needs improvement. If you
> know of a section of code that you would like to see refactored/sped
> up/tested could you please send it to the list or to me? Or, if there
> is a wiki page on this, please point me to it and accept my apologies.
>
Testing and profiling of the clustering, classification and collaborative
filtering code would be very welcome. There are several open issues
in JIRA related to these (MAHOUT-165 comes to mind).
I think just running some examples at scale and reporting back the results
would be great as well. You can also start by looking at
https://issues.apache.org/jira/browse/MAHOUT
One idea is to take the Wikipedia examples I put up at
https://www.ibm.com/developerworks/java/library/j-mahout/index.html
(I will donate the code soon) and try running them at a larger scale
on Wikipedia.