Steve Armstrong <st...@stevearm.com>
Subject Trigger job from Java application causes ClassNotFound
Thu, 26 Jul 2012 23:18:51 GMT

I'm trying to trigger a Mahout job from inside my Java application
(running in Eclipse), and get it running on my cluster. I have a main
class that simply contains:

String[] args = new String[] {
    "--input", "/input/triples.csv",
    "--output", "/output/vectors.txt",
    "--similarityClassname",
    "--numRecommendations", "10000",
    "--tempDir", "temp/" + System.currentTimeMillis() };
Configuration conf = new Configuration();
ToolRunner.run(conf, new RecommenderJob(), args);

If I package the whole project into a single jar (using Maven), copy
it to the namenode, and run it with "hadoop jar project.jar", it works
fine. But if I try to run it from my dev PC in Eclipse (where all the
same dependencies are still on the classpath), with the three Hadoop
XML files added to the classpath, it triggers Hadoop jobs, but they
fail with errors like:

12/07/26 14:42:09 INFO mapred.JobClient: Task Id :
attempt_201206261211_0173_m_000001_0, Status : FAILED
Error: java.lang.ClassNotFoundException: com.google.common.primitives.Longs
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)

What I'm trying to create is a self-contained jar that can be run from
the command line and launch the Mahout job on the cluster. I've got
this all working with embedded Pig scripts, but I can't get it working
with Mahout.

Any help is appreciated, or advice on better ways to trigger the jobs from code.
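
One direction that might be worth trying, sketched under assumptions: since
ToolRunner passes the arguments through Hadoop's GenericOptionsParser, the
generic "-libjars" option (a comma-separated list of local jars to ship with
the job) can be placed in front of the job's own options. The helper below
only builds such an argument array from the jars found on the local
classpath. The class and method names (LibJarsArgs, jarsOnClasspath,
withLibJars) are made up for illustration, and whether shipping the jars this
way actually resolves the ClassNotFoundException on this cluster is an
assumption, not something verified here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: names are illustrative, not part of Hadoop or Mahout.
public class LibJarsArgs {

    /** Returns the entries of a classpath string that end in ".jar". */
    public static List<String> jarsOnClasspath(String classpath) {
        List<String> jars = new ArrayList<>();
        // Split on the platform separator (":" on Unix, ";" on Windows).
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.endsWith(".jar")) {
                jars.add(entry);
            }
        }
        return jars;
    }

    /** Prepends "-libjars jar1,jar2,..." to the job's own arguments. */
    public static String[] withLibJars(List<String> jars, String[] jobArgs) {
        if (jars.isEmpty()) {
            return jobArgs.clone();
        }
        List<String> all = new ArrayList<>();
        // Generic options have to come before the tool-specific ones.
        all.add("-libjars");
        all.add(String.join(",", jars));
        for (String arg : jobArgs) {
            all.add(arg);
        }
        return all.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] jobArgs = { "--input", "/input/triples.csv",
                             "--output", "/output/vectors.txt" };
        String[] full = withLibJars(
                jarsOnClasspath(System.getProperty("java.class.path")), jobArgs);
        System.out.println(String.join(" ", full));
        // The resulting array would replace the plain one in the original call:
        // ToolRunner.run(conf, new RecommenderJob(), full);
    }
}
```

Shipping every jar on the classpath is the blunt version. Narrowing the list
to the jars the tasks actually miss (Guava, judging by the stack trace above)
would keep job submission lighter.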


