mahout-user mailing list archives

From Mohammed Omer <beancinemat...@gmail.com>
Subject Difficulties adding a custom job (analyzer) to Hadoop
Date Thu, 07 Aug 2014 20:38:05 GMT
All,

I'm having a tough time adding a custom analyzer to Hadoop and making use
of it through Mahout.

I've pruned the Mahout in Action examples down to a single example, a
customized version of the Mahout 0.9 MailArchivesClusteringAnalyzer:
https://github.com/momer/MiA/blob/mahout-0.9/src/main/java/mia/clustering/ch09/MoAnalyzer.java

After updating the pom.xml to use Mahout 0.9, running `mvn package` and
moving the `mia-0.7-job.jar` to $HADOOP_HOME/lib, I run into two issues:

First, I'm unsure how to remove the duplicate SLF4J dependencies from the
job jar.

Second, Hadoop is unable to find the Mahout classes when I'm using my
custom job jar.

Relevant stack traces are available at
https://gist.github.com/momer/52e1e7d2dd7612b26909

I'm admittedly pretty new to Hadoop/Mahout, and would really appreciate any
pointers in the right direction. All I really need is to remove the Porter
stemming step from the analyzer!

Thank you all very much for maintaining the mailing list and keeping it alive,

Mo
