mahout-user mailing list archives

From Dmitriy Lyubimov
Subject Re: mahout job and hadoop
Date Wed, 16 May 2012 18:38:36 GMT
On Wed, May 16, 2012 at 3:00 AM, Chandra Mohan, Ananda Vel Murugan wrote:
> *       What is the difference between running a mahout job locally and
> in Hadoop?

Mostly the difference is whether the algorithm supports running on
MapReduce or not. Usually it is one way or the other (although
MapReduce-based solutions can, in some cases but not all, be run using
Hadoop local mode, and technically that would still be "running in Hadoop").
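
As a rough sketch of the local-mode case: with Hadoop 1.x property names,
local mode is selected by setting mapred.job.tracker to "local" and using
the local filesystem. The snippet below only prints the command it would
run. It assumes a driver that parses Hadoop generic options (for example
through ToolRunner), and the jar and class names are placeholders, not
anything from this thread.

```shell
#!/bin/sh
# Hypothetical jar and driver class; substitute your own.
# Assumes the driver accepts Hadoop generic options (-D key=value).

# Print a command line that runs a MapReduce driver in Hadoop local mode,
# i.e. in a single JVM against the local filesystem (Hadoop 1.x properties).
local_mode_cmd() {
    jar=$1
    driver_class=$2
    echo "hadoop jar $jar $driver_class -D mapred.job.tracker=local -D fs.default.name=file:///"
}

# Dry run: show the command instead of executing it.
local_mode_cmd my-job.jar com.example.MyDriver
```

Not every MapReduce job behaves the same way in local mode, so treat this
as a quick check rather than a substitute for a real cluster run.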

> *       I wrote a simple mahout job to do K-means clustering using my
> data. I packaged it as jar and tried running it. It worked and did the
> clustering in a Hadoop single node cluster. I am planning to move this
> job to a multi node cluster.  Should I execute mahout command from job
> tracker node only? Or can I execute it from any node in cluster and be
> assured that it uses all the nodes in the cluster. How mahout works in a
> multi node cluster?

You can execute the command line (it's called the "driver" in Hadoop lingo)
from any node that has network connectivity to the MapReduce cluster
(i.e. you don't have to choose any particular node or even be within
the cluster), but you should do it only once. The driver only submits the
job; the JobTracker then schedules the tasks across the cluster's nodes.
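
A minimal sketch of what "any node with connectivity" could look like in
practice, assuming the machine has a Hadoop client install and a directory
holding the cluster's client configuration (core-site.xml and
mapred-site.xml). The directory, jar, and driver class below are
placeholders. The snippet prints the submit command rather than running it.

```shell
#!/bin/sh
# Hypothetical names throughout: the config directory, jar, and driver
# class are placeholders, not values from this thread.

# Print the command that submits a job to the cluster described by the
# client configuration in the given directory.
submit_cmd() {
    conf_dir=$1
    jar=$2
    driver_class=$3
    echo "hadoop --config $conf_dir jar $jar $driver_class"
}

# Dry run: show the command instead of executing it.
submit_cmd /etc/hadoop/conf my-kmeans-job.jar com.example.KMeansJob
```

Run the printed command once, from one machine. Launching the same driver
on several nodes would submit the same job several times.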

