mahout-dev mailing list archives

From "Robin Anil (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms
Date Fri, 05 Feb 2010 12:05:28 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830077#action_12830077 ]

Robin Anil commented on MAHOUT-185:
-----------------------------------

I like the script, as I am running k-means these days :)
{code}
if [ "$COMMAND" = "vectordump" ] ; then
  CLASS=org.apache.mahout.utils.vectors.VectorDumper
elif [ "$COMMAND" = "clusterdump" ] ; then
  CLASS=org.apache.mahout.utils.clustering.ClusterDumper
elif [ "$COMMAND" = "seqdump" ] ; then
  CLASS=org.apache.mahout.utils.SequenceFileDumper
elif [ "$COMMAND" = "kmeans" ] ; then
  CLASS=org.apache.mahout.clustering.kmeans.KMeansDriver
elif [ "$COMMAND" = "canopy" ] ; then
  CLASS=org.apache.mahout.clustering.canopy.CanopyDriver
elif [ "$COMMAND" = "lucenevector" ]; then
  CLASS=org.apache.mahout.utils.vectors.lucene.Driver
elif [ "$COMMAND" = "seqdirectory" ]; then
  CLASS=org.apache.mahout.text.SequenceFilesFromDirectory
elif [ "$COMMAND" = "seqwiki" ]; then
  CLASS=org.apache.mahout.text.WikipediaToSequenceFile
fi
{code}

If we go on like this, we might end up with too many options. Is there any way to streamline this?
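
One way to at least keep the growing list readable is to move the lookup into a single function with a case statement. This is only a rough sketch: resolve_class is a hypothetical helper name, and the command/class pairs are simply the ones from the snippet above.

```shell
#!/usr/bin/env bash
# Sketch: resolve a mahout sub-command to its driver class in one place.
# resolve_class is a hypothetical helper; the pairs are taken from the
# if/elif chain quoted above.
resolve_class() {
  case "$1" in
    vectordump)   echo org.apache.mahout.utils.vectors.VectorDumper ;;
    clusterdump)  echo org.apache.mahout.utils.clustering.ClusterDumper ;;
    seqdump)      echo org.apache.mahout.utils.SequenceFileDumper ;;
    kmeans)       echo org.apache.mahout.clustering.kmeans.KMeansDriver ;;
    canopy)       echo org.apache.mahout.clustering.canopy.CanopyDriver ;;
    lucenevector) echo org.apache.mahout.utils.vectors.lucene.Driver ;;
    seqdirectory) echo org.apache.mahout.text.SequenceFilesFromDirectory ;;
    seqwiki)      echo org.apache.mahout.text.WikipediaToSequenceFile ;;
    *)            return 1 ;;  # unknown command: the caller decides what to print
  esac
}
```

The script body would then need only one line per lookup, for example: CLASS=$(resolve_class "$COMMAND") || { echo "Unknown command: $COMMAND" >&2; exit 1; }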

One thought I have is to add package-level Main classes in Core, such as org.apache.mahout.Clustering.java,
which internally call the different main functions.
Similarly, in examples and util we can keep one entry class each: Examples.java and Util.java.
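
On the script side, that could reduce the launcher to something like the sketch below. It is a guess at the shape, not part of the attached patch: only org.apache.mahout.Clustering is named above, so the group names, the package paths for Examples and Util, and the helper names entry_class and launch_args are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch: the launcher knows one entry class per module and forwards the
# algorithm name as the first argument to that class.
# Only org.apache.mahout.Clustering comes from the comment above; the other
# two class paths and the group names are assumed for illustration.
entry_class() {
  case "$1" in
    clustering) echo org.apache.mahout.Clustering ;;
    examples)   echo org.apache.mahout.examples.Examples ;;  # assumed package
    util)       echo org.apache.mahout.utils.Util ;;         # assumed package
    *)          return 1 ;;
  esac
}

# Builds the java argument list for: mahout <group> <algorithm> [OPTIONS]
# It prints one argument per line instead of executing anything, so the
# result can be inspected before it is handed to java.
launch_args() {
  local class
  class=$(entry_class "$1") || return 1
  shift
  printf '%s\n' "$class" "$@"
}
```

With this, launch_args clustering kmeans --input in would print the entry class followed by kmeans, --input and in, and the entry class itself would pick the right main function from the algorithm name.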

With this limited set of entry points, we can keep a global conf object which implements Tool, and an fs
object which is the default filesystem as specified by the conf.
This way each algorithm can request a conf object (which copies everything Tool has set).
How does that sound? I can whip up all the main classes tonight.


> Add mahout shell script for easy launching of various algorithms
> ----------------------------------------------------------------
>
>                 Key: MAHOUT-185
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-185
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.2
>         Environment: linux, bash
>            Reporter: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-185.patch
>
>
> Currently, each algorithm has a different point of entry, and it is too complicated to understand
> and launch each one. A mahout shell script needs to be added to the bin directory that does
> something like the following:
> mahout classify -algorithm bayes [OPTIONS]
> mahout cluster -algorithm canopy  [OPTIONS]
> mahout fpm -algorithm pfpgrowth [OPTIONS]
> mahout taste -algorithm slopeone [OPTIONS] 
> mahout misc -algorithm createVectorsFromText [OPTIONS]
> mahout examples WikipediaExample

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.