mahout-user mailing list archives

From Jeff Eastman <>
Subject Re: Goals for Mahout 0.7
Date Sun, 12 Feb 2012 08:02:33 GMT
+ users@

These are great ideas, and are just the kinds of high-level
conversations I was hoping to engender. From my agile background, I'd
hope to define 0.7 by a small number of "epic stories" in a subset of
our overall capabilities, which could focus our attention on a set of
derivative JIRAs that will give Mahout a quantum step forward in some
functional area from our users' perspective. I think 2-3 such
"epics" are probably all we can handle in a release. I don't necessarily
think mine are the right ones either, but they should help prime the pump.

If we could only do 2-3 epics, what would they be? Where would the 
biggest contributions lie?

On 2/11/12 9:45 PM, Lance Norskog wrote:
> For incremental improvements, usability and correctness of algorithms.
> The "new" Naive Bayes and SGD algorithms both seem to have trouble
> classifying. Also, interpretation of results. It is hard to summarize
> the quality of results. I often feel like the math-savvy implementors
> print a bunch of numbers and say "that looks right", and the rest of
> us struggle to get an intuition of what's going on and why.
> For new features, "Mahout Online" would be great: a web service that
> packages all of the "online" algorithms (tractable speed and memory
> use).
> On Sat, Feb 11, 2012 at 1:29 PM, Frank Scholten<>  wrote:
>> I'd like to add solving ClassNotFoundException problems with third-party
>> jars in some jobs.
>> I experimented with having seq2sparse upload a third-party jar containing
>> an analyzer and add it to the DistributedCache. Uploading works, but I
>> haven't yet gotten it working inside the Mappers. I have some code lying
>> around for this that can be used as a starting point, including a
>> separate project that depends on Mahout and on an analyzer, to
>> test things out.
>> Another thing would be adding or improving the integration tools, for
>> example adding a mysql2seq tool to cluster text from a SQL database.
>> On Sat, Feb 11, 2012 at 8:01 PM, Jeff Eastman
>> <>  wrote:
>>> Now that 0.6 is in the box, it seems a good time to start thinking about
>>> 0.7, from a high level goal perspective at least. Here are a couple that
>>> come to mind:
>>> Target code freeze date August 1, 2012
>>> Get Jenkins working for us again
>>> Complete clustering refactoring and classification convergence
>> What kind of clustering refactoring do you mean here? I did some work on
>> creating bean configurations in the past (MAHOUT-612), but I
>> underestimated the amount of work required to do the entire
>> refactoring. If this can be contributed and committed on a per-job
>> basis, I would like to help out.
>>> ...
