mahout-user mailing list archives

From Mahmood Naderan <>
Subject Re: Question about Mahout/Hadoop
Date Sat, 29 Mar 2014 06:15:40 GMT
From the script I see that mahout ultimately runs bin/hadoop, and hadoop runs the java command.
Basically I want to know more about data structures (tree, list, vector, array, ...) or data


On Saturday, March 29, 2014 12:54 AM, Chandler Burgess <> wrote:

What are you trying to get at with your question? Does the answer affect something in your

If the environment variable MAHOUT_LOCAL is set, trainnb still runs MapReduce jobs, but it
runs them in-process on the local machine rather than on a cluster (from what I can tell,
anyway). If that variable is not set, the job is submitted to your Hadoop environment
(provided the Hadoop environment variables are properly configured).

If you are just getting started and playing around, I would recommend setting MAHOUT_LOCAL,
e.g. export MAHOUT_LOCAL=1. I'm a beginner myself but have done a lot of playing around with
naïve Bayes locally, testing on datasets of up to 400k documents with training sets of up to
30k documents, and it runs very fast.
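
As a rough sketch of the choice described above, the snippet below reports which mode a shell
session would select. The mahout_mode function is a hypothetical helper, not part of Mahout.
It assumes that any non-empty MAHOUT_LOCAL means "run locally" and that a cluster run needs
HADOOP_HOME to be set; both assumptions are worth checking against your own copy of
bin/mahout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Mahout): guess which mode the current
# environment would select, under the assumptions stated above.
mahout_mode() {
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    # Assumption: any non-empty value switches Mahout to local execution.
    echo "local"
  elif [ -n "${HADOOP_HOME:-}" ]; then
    # Assumption: HADOOP_HOME is how the launcher finds bin/hadoop.
    echo "hadoop"
  else
    # Neither variable is set; the outcome depends on the script version.
    echo "unknown"
  fi
}

export MAHOUT_LOCAL=1   # any non-empty value
mahout_mode
```

To go back to submitting jobs to the cluster, run unset MAHOUT_LOCAL in the same shell.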

-----Original Message-----
From: Andrew Musselman [] 
Sent: Friday, March 28, 2014 2:57 PM
Subject: Re: Question about Mahout/Hadoop

You're running a bash script that lives at $MAHOUT_HOME/bin/mahout.

If you read through that script you can start to follow what goes on when you run the command
starting with `mahout`.  See at the bottom of the script where the `exec` commands are; that's
where things start to be executed.
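
One quick way to locate those lines without reading the whole script is to grep for them. This
is only a sketch: show_exec_lines is a hypothetical helper name, and the pattern assumes each
exec starts its own line, possibly indented.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, with line numbers, every line of a script
# that begins (after optional indentation) with the `exec` builtin.
show_exec_lines() {
  grep -n -E '^[[:space:]]*exec[[:space:]]' "$1"
}

# Usage, assuming MAHOUT_HOME points at your Mahout install:
#   show_exec_lines "$MAHOUT_HOME/bin/mahout"
```

Reading a few lines above each match shows which condition leads to that exec.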

On Fri, Mar 28, 2014 at 12:34 PM, Mahmood Naderan <> wrote:

> Hi
> I want to know: when I run a command like
>     mahout trainnb -i .... -o ...
> am I running Mahout code or Hadoop code?
> In other words, which one is dominant?
> Regards,
> Mahmood