mahout-dev mailing list archives

From Drew Farris <d...@apache.org>
Subject Re: Problems running examples
Date Sun, 19 Jun 2011 23:04:18 GMT
Jeff,

The key bit from your output is this:

+ exec /Users/jeff/hadoop/hadoop-0.20.2/bin/hadoop jar
/Users/jeff/Documents/workspace/mahout/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar
org.apache.mahout.driver.MahoutDriver seqdirectory -i
mahout-work/reuters-out -o mahout-work/reuters-out-seqdir -c UTF-8
-chunk 5

This shows that the mahout driver script is using hadoop to run the
seqdirectory task, which causes it to treat the directories
specified by -i and -o as HDFS paths. The mystery to me is why there's
no mention of MAHOUT_LOCAL being set. Can you confirm that the
following line appears in the copy of build-reuters.sh that you're
running on the Mac?

    MAHOUT_LOCAL=true $MAHOUT seqdirectory \
        -i mahout-work/reuters-out \
        -o mahout-work/reuters-out-seqdir \
        -c UTF-8 -chunk 5
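
For reference, a minimal sketch of why that prefix matters: in a POSIX
shell, VAR=value placed in front of a command exports the variable to
that one command only. The sh -c child below is a hypothetical stand-in
for the mahout driver script, not the real thing; it just reports
whether it can see MAHOUT_LOCAL.

```shell
#!/bin/sh
# Start from a clean slate so an inherited value cannot mask the result.
unset MAHOUT_LOCAL

# Hypothetical stand-in for the driver script: the child shell prints
# the value of MAHOUT_LOCAL it sees, or "unset" if there is none.
with_prefix=$(MAHOUT_LOCAL=true sh -c 'echo "${MAHOUT_LOCAL:-unset}"')
without_prefix=$(sh -c 'echo "${MAHOUT_LOCAL:-unset}"')

echo "with prefix:    $with_prefix"
echo "without prefix: $without_prefix"
```

If the prefix were missing from the line in build-reuters.sh, the
driver would see no MAHOUT_LOCAL at all, which would be consistent with
the trace above.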

I'm installing hadoop/mahout on my Mac now to determine whether this
is some sort of Mac shell issue.

Ian,

Thanks for the exact syntax for referencing the jars from the on-disk
hadoop installation in the script. Adding the jars to the classpath
makes sense, and it tends to confirm that this may be a class
compatibility problem. One thing I noticed is that the diff you
provided adds the jars from the hadoop install at the end of the
classpath. Perhaps they should go at the beginning of the path instead,
so that the hadoop jars from the installation are always used first?

So, instead of:

for f in "$HADOOP_HOME"/hadoop-*.jar; do
       CLASSPATH=${CLASSPATH}:$f
done

The following might work better:

for f in "$HADOOP_HOME"/hadoop-*.jar; do
       CLASSPATH=$f:${CLASSPATH}
done

(The same goes for the second classpath loop, etc.)
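
A rough sketch of the difference between the two loops, under stated
assumptions: a temporary directory with two empty, made-up jar files
stands in for $HADOOP_HOME, and mahout.jar is a placeholder for
whatever is already on the classpath. Since the JVM searches classpath
entries in order, whichever jar comes first wins when two jars contain
the same class.

```shell
#!/bin/sh
# Hypothetical layout: a temp directory plays the role of $HADOOP_HOME,
# and the jar names below are invented for illustration.
HADOOP_HOME=$(mktemp -d)
touch "$HADOOP_HOME/hadoop-core.jar" "$HADOOP_HOME/hadoop-tools.jar"

# Appending: the existing entry stays ahead of the hadoop jars.
APPENDED=mahout.jar
for f in "$HADOOP_HOME"/hadoop-*.jar; do
    APPENDED=${APPENDED}:$f
done

# Prepending: the hadoop jars end up ahead of the existing entry.
PREPENDED=mahout.jar
for f in "$HADOOP_HOME"/hadoop-*.jar; do
    PREPENDED=$f:${PREPENDED}
done

echo "appended:  $APPENDED"
echo "prepended: $PREPENDED"

# Remove the scratch directory; the variables above keep their values.
rm -rf "$HADOOP_HOME"
```

Note that prepending inside the loop also reverses the order of the
hadoop jars relative to one another, which should only matter if those
jars overlap with each other.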

Drew
