mahout-user mailing list archives

From "Quiroz Hernandez, Andres" <>
Subject Running algorithms from within java program
Date Fri, 04 Feb 2011 23:36:25 GMT

I have set up a release (pre-compiled) version of Mahout 0.3 on top of a
Hadoop 0.20.2 cluster and am able to run Mahout algorithms from the
command line without a problem, e.g.:

mahout seq2sparse -i input_dir -o output_dir -wt tf -seq

However, I tried invoking the algorithms from within a Java program by
calling the MahoutDriver class with the following arguments:

args = {"seq2sparse", "-i", "input_dir", "-o", "output_dir",
"-wt", "tf", "-seq"}

This call fails with the message:

11/02/04 18:20:58 ERROR driver.MahoutDriver: MahoutDriver failed with
args: [seq2sparse, -i, input_dir, -o, output_dir, -wt, tf, -seq]

I believe the problem is that I am not passing all of the jar
dependencies that the Mahout driver class needs to run the algorithm.
I assume the mahout run script takes care of this, but I am not very
familiar with shell scripting and cannot tell exactly how it does so.
If I am correct, please let me know how I can include those
dependencies (all of which I assume are in the $MAHOUT_HOME/lib
folder), either in the arguments or otherwise. If not, please let me
know what the correct way is to start the algorithms from code.
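
To make the dependency question concrete, here is a minimal sketch of one
way a program could put every jar from a lib folder on a class loader and
then call a driver's main method by reflection. It is an illustration
under stated assumptions, not the way the mahout script works: the class
name LibDirLauncher and its two helper methods are hypothetical, the
fully qualified name org.apache.mahout.driver.MahoutDriver is inferred
from the log line above, and the sketch assumes that all required jars
(including the Mahout jars themselves, which may live outside lib) end up
in the directory that is scanned. It also only covers the client JVM; jobs
submitted to the cluster may additionally need a job jar on the task side.

```java
import java.io.File;
import java.lang.reflect.Method;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LibDirLauncher {

    /** Returns a URL for every *.jar file directly inside libDir, sorted by path. */
    static URL[] jarUrls(File libDir) throws MalformedURLException {
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            throw new IllegalArgumentException("Not a readable directory: " + libDir);
        }
        Arrays.sort(jars);
        List<URL> urls = new ArrayList<>();
        for (File jar : jars) {
            urls.add(jar.toURI().toURL());
        }
        return urls.toArray(new URL[0]);
    }

    /** Loads mainClass through a loader that sees the jars and calls its main(String[]). */
    static void runMain(URL[] jars, String mainClass, String[] args) throws Exception {
        URLClassLoader loader =
                new URLClassLoader(jars, LibDirLauncher.class.getClassLoader());
        // Libraries that look classes up through the thread context class loader
        // will then also see the extra jars.
        Thread.currentThread().setContextClassLoader(loader);
        Method main = loader.loadClass(mainClass).getMethod("main", String[].class);
        // Cast to Object so the array is passed as one argument, not as varargs.
        main.invoke(null, (Object) args);
    }

    public static void main(String[] ignored) throws Exception {
        // Assumption: MAHOUT_HOME is set and its lib folder holds the dependency jars.
        File libDir = new File(System.getenv("MAHOUT_HOME"), "lib");
        String[] args = {"seq2sparse", "-i", "input_dir", "-o", "output_dir",
                         "-wt", "tf", "-seq"};
        runMain(jarUrls(libDir), "org.apache.mahout.driver.MahoutDriver", args);
    }
}
```

The simpler alternative is to skip the custom class loader and start the
JVM with the same jars on its classpath (for example via the -cp option),
in which case the driver class can be called directly.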

I also tried using the SparseVectorsFromSequenceFiles class (or any
other algorithm driver class) directly with the corresponding arguments
except for the short name (seq2sparse), and that call fails more
explicitly with a ClassNotFoundException (which is why I concluded
dependencies are the problem).
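
A quick way to confirm that a ClassNotFoundException really comes from the
classpath is to print the classpath the JVM was started with and probe a
few class names. The sketch below is only a diagnostic aid; the class
ClasspathCheck and its isLoadable helper are hypothetical names, and the
two class names probed by default are assumptions about which classes the
driver needs (pass other fully qualified names as arguments to check
them instead).

```java
public class ClasspathCheck {

    /** Returns true if className can be found by the current context class loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false,
                    Thread.currentThread().getContextClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        // Assumed names; replace with the class reported in the exception.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.mahout.driver.MahoutDriver",
            "org.apache.hadoop.conf.Configuration"
        };
        for (String name : names) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```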

Thank you for your help,

