whirr-dev mailing list archives

From "Frank Scholten (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (WHIRR-384) Add Mahout as a service
Date Tue, 13 Sep 2011 20:45:08 GMT

    [ https://issues.apache.org/jira/browse/WHIRR-384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103965#comment-13103965 ]

Frank Scholten commented on WHIRR-384:

Added new patch with 'mahout-client' role and without the unneeded dependencies.

At the moment the 'mahout-client' role is oblivious to Hadoop. It unpacks the tarball and
adds the mahout script to the path. The mahout script does perform some checks: it looks for
configuration in $HADOOP_HOME/conf, but you still need to set up a Hadoop cluster yourself.

Before this patch I would point HADOOP_CONF_DIR to the Hadoop configuration generated by Whirr
on my local machine and run jobs from there. I guess if Whirr could generate this config on
another node under $HADOOP_HOME/conf, and you give this node the 'mahout-client' role, you can
submit Mahout jobs from that node in the same way. The role does not have to be added to a
namenode; the node just needs the Hadoop configuration.
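
As a rough illustration of the lookup described above, the following Python sketch resolves
the directory from which a client node would read its Hadoop configuration. It is not the
mahout script itself (that is a shell script). The function name resolve_hadoop_conf_dir and
the precedence of HADOOP_CONF_DIR over $HADOOP_HOME/conf are assumptions made for this example.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_hadoop_conf_dir(env: Mapping[str, str]) -> Optional[Path]:
    """Return the directory expected to hold Hadoop configuration, or None."""
    # Assumption: an explicitly set HADOOP_CONF_DIR takes precedence.
    explicit = env.get("HADOOP_CONF_DIR")
    if explicit:
        return Path(explicit)
    # Otherwise fall back to the conf directory under HADOOP_HOME.
    home = env.get("HADOOP_HOME")
    if home:
        return Path(home) / "conf"
    # Neither variable is set, so there is nothing to point a client at.
    return None


if __name__ == "__main__":
    conf_dir = resolve_hadoop_conf_dir(os.environ)
    if conf_dir is None:
        print("No Hadoop configuration directory is set")
    else:
        print(f"Hadoop configuration expected in {conf_dir}")
```

A node with only the 'mahout-client' role would hit the None case unless something else
places the Hadoop configuration on it.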

About the 'mahout-jar' role, my idea was to create a cluster with the Mahout jar on tasktracker
nodes so you could run a Mahout job from a Java process that has compile dependencies on Mahout
without having to build a job jar that contains Mahout and its dependencies. I would like
to be able to set up a Java project with dependencies on Whirr, Mahout and Hadoop and launch
jobs from Java without building a job jar. However, if you think this is problematic or not a
good idea, let me know.

> Add Mahout as a service
> -----------------------
>                 Key: WHIRR-384
>                 URL: https://issues.apache.org/jira/browse/WHIRR-384
>             Project: Whirr
>          Issue Type: New Feature
>          Components: new service
>    Affects Versions: 0.7.0
>            Reporter: Frank Scholten
>             Fix For: 0.7.0
>         Attachments: WHIRR-384-mahout-client.patch, WHIRR-384-mahout-home.patch
> Here is an initial patch to support Mahout as a Whirr service.
> I created the role 'mahout-home' which can be used to install the binary Mahout distribution
> on a Hadoop namenode.
> By combining this role with configuration for a Hadoop cluster you can SSH into the namenode,
> su to root and start running Mahout jobs via the mahout script immediately.
> The 'mahout-home' role has two properties:
>   Mahout version                                   whirr.mahout.version
>   URL of the Mahout binary distribution tarball    whirr.mahout.tarball.url
> Note that I used a snapshot version of Mahout for testing, revision 1169784, because
> there were some problems with the Mahout script in 0.5 that have been fixed on trunk, see
> MAHOUT-680. To test, you can set the tarball property to this link:
> http://dl.dropbox.com/u/13436484/mahout-distribution-0.6-SNAPSHOT.tar.gz
> I used configure actions and the onBeforeConfigure(). If there is a better way to express
> this with the Whirr API let me know.
> Currently I am investigating a 'mahout-jar' role, which installs the Mahout examples
> job jar under $HADOOP_HOME/lib on a tasktracker node. I already have some code for putting
> the jar in place, but when running a job from my local machine I still get
> ClassNotFoundExceptions. I believe this is because Hadoop has already started before the jar
> is put in the lib dir, so the jar won't be picked up, but I have to investigate some more.
> From WHIRR-221 I understood that there is no support (yet?) for ordering of services, but if
> you have an idea on how to fix this let me know.
> Comments and suggestions welcome!
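
To make the two 'mahout-home' properties from the issue description concrete, here is a
minimal Python sketch that renders a Whirr-style properties file. Only
whirr.mahout.version, whirr.mahout.tarball.url and the tarball link come from the issue.
The cluster name, the instance layout, the version value "0.6-SNAPSHOT" and the helper
render_whirr_properties are assumptions chosen for illustration, not part of the patch.

```python
def render_whirr_properties(settings: dict) -> str:
    """Render settings as key=value lines, one per entry, in insertion order."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


settings = {
    # Hypothetical cluster name and layout; adjust to your own setup.
    "whirr.cluster-name": "mahout-demo",
    "whirr.instance-templates": (
        "1 hadoop-namenode+hadoop-jobtracker+mahout-home,"
        "1 hadoop-datanode+hadoop-tasktracker"
    ),
    # Property names from the issue; the version value is an assumed example.
    "whirr.mahout.version": "0.6-SNAPSHOT",
    "whirr.mahout.tarball.url": (
        "http://dl.dropbox.com/u/13436484/"
        "mahout-distribution-0.6-SNAPSHOT.tar.gz"
    ),
}

if __name__ == "__main__":
    print(render_whirr_properties(settings), end="")
```

Provider and credential settings are left out here; a real cluster definition would need
them as well.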

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

