mahout-dev mailing list archives

From "Jeff Eastman (JIRA)" <>
Subject [jira] Updated: (MAHOUT-294) Uniform API behavior for Jobs
Date Fri, 16 Jul 2010 17:46:50 GMT


Jeff Eastman updated MAHOUT-294:

    Attachment: MAHOUT-294a.patch

Here's a stab at improving the testability of AbstractJob options parsing. It adds an argMap
variable to AbstractJob, along with new getOption() and hasOption() methods that encapsulate
the "--" prepending, which avoids additional constants. Factoring out ClusterDumper.addOptions()
as a public method allows unit testing of the command-line processing without invoking
the cluster dumper. We could require this in all subclasses by adding a new abstract
addOptions() to AbstractJob and calling it from there. That would have a broad impact on all
drivers, and I have not done it in this patch.
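
To make the idea concrete, here is a minimal sketch of how an argMap with getOption() and
hasOption() could hide the "--" prefix from callers. This is not the code in the patch: the
class SketchJob, the parseArguments() method, and the option names "input" and "overwrite"
are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a job base class; not Mahout's actual AbstractJob. */
class SketchJob {

    // Parsed arguments, keyed by their "--"-prefixed long name.
    private final Map<String, String> argMap = new HashMap<>();

    /** Parses "--name value" pairs and bare "--flag" switches into argMap. */
    void parseArguments(String[] args) {
        argMap.clear();
        for (int i = 0; i < args.length; i++) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("Unexpected argument: " + args[i]);
            }
            // A following token that is not itself an option is taken as the value.
            boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("--");
            argMap.put(args[i], hasValue ? args[++i] : null);
        }
    }

    /** Returns the value for the option, or null if it is absent or a bare flag. */
    String getOption(String name) {
        return argMap.get("--" + name);
    }

    /** Returns true if the option was present on the command line. */
    boolean hasOption(String name) {
        return argMap.containsKey("--" + name);
    }
}

/** Small driver showing that callers never spell out the "--" prefix. */
class OptionSketchDemo {
    public static void main(String[] args) {
        SketchJob job = new SketchJob();
        job.parseArguments(new String[] {"--input", "in/path", "--overwrite"});
        System.out.println("input = " + job.getOption("input"));
        System.out.println("overwrite set? " + job.hasOption("overwrite"));
    }
}
```

Because the parsing is separate from running the job, a unit test can call parseArguments()
and check the results without starting any real work.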

As a further step, one could imagine moving all of the common options from DefaultOptionCreator
into AbstractJob. This would put all of the Mahout shared command-line options in a single
place, improving consistency.
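
A rough sketch of what that could look like, assuming a base class that owns the shared
option names and an abstract addOptions() that each driver implements. All class and method
names here (SharedOptionsJob, addDistanceMeasureOption(), ClusterDriverA, and so on) are
hypothetical, and the choice of "dm" as the one short name is only an example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical base class that defines the shared options exactly once. */
abstract class SharedOptionsJob {

    // Long option name -> short option name, in registration order.
    private final Map<String, String> options = new LinkedHashMap<>();

    /** Registers an option, rejecting clashes in either the long or short name. */
    protected void addOption(String longName, String shortName) {
        if (options.containsKey(longName) || options.containsValue(shortName)) {
            throw new IllegalArgumentException("Duplicate option: " + longName + "/" + shortName);
        }
        options.put(longName, shortName);
    }

    // Shared options live here, so every driver gets the same spelling.
    protected void addInputOption() { addOption("input", "i"); }
    protected void addOutputOption() { addOption("output", "o"); }
    protected void addDistanceMeasureOption() { addOption("distanceMeasure", "dm"); }

    /** Each driver declares which options it accepts. */
    protected abstract void addOptions();

    /** Collects the driver's options; callable from a test without running a job. */
    Map<String, String> buildOptions() {
        options.clear();
        addOptions();
        return Collections.unmodifiableMap(new LinkedHashMap<>(options));
    }
}

/** Two hypothetical drivers that reuse the shared definitions. */
class ClusterDriverA extends SharedOptionsJob {
    @Override
    protected void addOptions() {
        addInputOption();
        addOutputOption();
        addDistanceMeasureOption();
    }
}

class ClusterDriverB extends SharedOptionsJob {
    @Override
    protected void addOptions() {
        addInputOption();
        addDistanceMeasureOption();
        addOption("numClusters", "k"); // driver-specific option
    }
}

class SharedOptionsDemo {
    public static void main(String[] args) {
        System.out.println(new ClusterDriverA().buildOptions());
        System.out.println(new ClusterDriverB().buildOptions());
    }
}
```

With this shape, two drivers cannot drift apart on the short name of a shared option,
because neither of them spells it out.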

Comments on this approach are welcome. I'm gone for the weekend.

> Uniform API behavior for Jobs
> -----------------------------
>                 Key: MAHOUT-294
>                 URL:
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering, Collaborative Filtering, Frequent Itemset/Association Rule Mining, Genetic Algorithms, Math, Utils
>    Affects Versions: 0.4
>            Reporter: Robin Anil
>             Fix For: 0.4
>         Attachments: MAHOUT-294.patch, MAHOUT-294.patch, MAHOUT-294a.patch
> * Move AbstractJob to common and convert all the Driver classes to extend that.
>    One suggestion is:
>    AlgorithmParams params ="-i", input).withParam("-o",
>    MyAlgorithmn.runJob(params) throws ParameterMissingException;
> * Give uniform command-line parameters for various algorithms.
>    e.g., currently the distance measure is -d, -dm, or -m in different places in clustering
> * Add a temp directory as a parameter
> This issue will keep track of all discussion/patches related to the design and cleanup of Mahout API

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
