mahout-dev mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Composing Mahout workflow (Re: Improving Our JIRA State)
Date Thu, 27 Oct 2011 05:16:37 GMT
What about Groovy? Java does have scripting languages built in. Someone
(sorry, can't remember who) has some patches to make Mahout Scala-friendly.

A use case for "programmable workflow engine" is to run the same
classification job 100 times with different tuning parameters, and save the
confusion matrices for further optimization.  Which of these tools allows
this?
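
To make the use case concrete, here is a minimal sketch in Python of the
loop I have in mind. It is not tied to any of the tools mentioned in this
thread. run_job is a hypothetical stand-in for a single classification run
(a real version would launch the actual job and parse its output), and the
threshold parameter, toy data, and run_NNN.json file names are assumptions
made up for illustration.

```python
import itertools
import json
from collections import Counter
from pathlib import Path


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def run_job(params, data):
    """Hypothetical stand-in for one classification run.

    A real version would start the job with these parameters and read back
    its predictions. A toy threshold classifier keeps the sketch runnable.
    """
    return ["pos" if x >= params["threshold"] else "neg" for x, _ in data]


def sweep(grid, data, out_dir):
    """Run the job once per parameter combination and save each matrix."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = sorted(grid)
    actual = [label for _, label in data]
    results = []
    combos = itertools.product(*(grid[n] for n in names))
    for i, values in enumerate(combos):
        params = dict(zip(names, values))
        matrix = confusion_matrix(actual, run_job(params, data))
        record = {
            "params": params,
            # JSON keys must be strings, so flatten each label pair.
            "matrix": {f"{a}->{p}": n for (a, p), n in sorted(matrix.items())},
        }
        (out_dir / f"run_{i:03d}.json").write_text(json.dumps(record, indent=2))
        results.append(record)
    return results


# Toy inputs, assumed for illustration only.
DATA = [(0.1, "neg"), (0.4, "neg"), (0.6, "pos"), (0.9, "pos")]
GRID = {"threshold": [0.3, 0.5, 0.7]}
```

Calling sweep(GRID, DATA, "matrices") would write one JSON file per
parameter setting, which a later optimization step could read back. Adding
more keys to GRID expands the sweep to every combination of values.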

Lance

On Wed, Oct 26, 2011 at 7:59 PM, Drew Farris <drew@apache.org> wrote:

> (Also a separate topic here)
>
> On Wed, Oct 26, 2011 at 5:19 PM, Dan Brickley <danbri@danbri.org> wrote:
> >
> > Also I've been thinking in very fuzzy terms about how to compose
> > larger tasks from smaller pieces, and wondering what might be a more
> > principled way of doing this than running each bin/mahout job by hand.
> > Obviously coding it up is one way, but also little shell scripts or
> > makefiles or (if forced at gunpoint) maybe Ant ...?
>
> Well, there certainly seem to be a number of options out there. Don't
> forget to mention the FlumeJava items like Ted's work on Plume or
> Cloudera Crunch. Is Oozie an option for this as well? When I was
> looking at the clustering code recently and saw the various methods
> starting with the run* prefix, I really wondered if there was a
> standard way that we could package these chunks of code (steps), that
> would allow them to be easily decomposed and re-combined in different
> ways.
>
> There's some talk about beanifying our workflow steps in
> https://issues.apache.org/jira/browse/MAHOUT-612, but I can't say I
> understand how this would allow us to reach the composable workflow
> goal.
>



-- 
Lance Norskog
goksron@gmail.com
