mahout-user mailing list archives

From Thomas Söhngen <tho...@beluto.com>
Subject Problems with Recommendations on Amazon EMR
Date Mon, 31 Jan 2011 17:14:03 GMT
Hello fellow Mahout users,

I am running into strange issues running Mahout on top of Amazon's Elastic 
MapReduce. I wrote a Python script using the boto library (see 
http://pastebin.com/UxKjmRF2 for the full script). I define and run a step 
like this:

    [...]
    step2 = JarStep(name='Find similiar items',
                    jar='s3n://' + main_bucket_name + '/mahout-core/mahout-core-0.4-job.jar',
                    main_class='org.apache.mahout.cf.taste.hadoop.item.RecommenderJob',
                    step_args=['--input s3n://' + main_bucket_name + '/data/' + run_id + '/aggregateWatched/',
                               '--output s3n://' + main_bucket_name + '/data/' + run_id + '/similiarItems/',
                               '--similarityClassname SIMILARITY_PEARSON_CORRELATION'])
    [...]
    jobid = emr_conn.run_jobflow(name=name,
                                 log_uri='s3n://' + main_bucket_name + '/emr-logging/',
                                 enable_debugging=1,
                                 hadoop_version='0.20',
                                 steps=[step1, step2])

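One detail I am not sure about: boto may hand each element of step_args to the job as a single argument, so a string like '--input s3n://...' might arrive at Mahout as one token instead of a flag plus a value. A quick sketch of the difference (bucket and run id hard-coded for illustration, no boto involved):

```python
import shlex

# Hard-coded stand-ins for main_bucket_name and run_id (illustration only)
main_bucket_name = 'recommendertest'
run_id = 'job2011Y01M31D17H01M52S'

# What the script passes now: flag and value glued into one list element
glued = '--input s3n://' + main_bucket_name + '/data/' + run_id + '/aggregateWatched/'

# Flag and value as two separate list elements
split = ['--input',
         's3n://' + main_bucket_name + '/data/' + run_id + '/aggregateWatched/']

# A shell-style tokenizer turns the glued form into the split form,
# i.e. the two are only equivalent if something tokenizes the string
assert shlex.split(glued) == split
```

If boto does pass the elements verbatim, rewriting step_args so that every flag and its value are separate entries would mimic what a shell would do with the same command line.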

The controller for the step gives me the following response:

    2011-01-31T16:07:34.068Z INFO Fetching jar file.
    2011-01-31T16:07:57.862Z INFO Working dir /mnt/var/lib/hadoop/steps/3
    2011-01-31T16:07:57.862Z INFO Executing /usr/lib/jvm/java-6-sun/bin/java -cp /home/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/home/hadoop:/home/hadoop/hadoop-0.20-core.jar:/home/hadoop/hadoop-0.20-tools.jar:/home/hadoop/lib/*:/home/hadoop/lib/jetty-ext/*
        -Xmx1000m -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/3 -Dhadoop.log.file=syslog -Dhadoop.home.dir=/home/hadoop
        -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/3/tmp
        -Djava.library.path=/home/hadoop/lib/native/Linux-i386-32 org.apache.hadoop.util.RunJar /mnt/var/lib/hadoop/steps/3/mahout-core-0.4-job.jar
        org.apache.mahout.cf.taste.hadoop.item.RecommenderJob --input s3n://recommendertest/data/job2011Y01M31D17H01M52S/aggregateWatched/
        --output s3n://recommendertest/data/job2011Y01M31D17H01M52S/similiarItems/ --similarityClassname SIMILARITY_PEARSON_CORRELATION
    2011-01-31T16:08:01.880Z INFO Execution ended with ret val 0
    2011-01-31T16:08:04.055Z INFO Step created jobs:
    2011-01-31T16:08:04.055Z INFO Step succeeded

But the syslog tells me:

    2011-01-31 16:08:00,631 ERROR org.apache.mahout.common.AbstractJob
    (main): Unexpected --input
    s3n://recommendertest/data/job2011Y01M31D17H01M52S/aggregateWatched/
    while processing Job-Specific Options:

...producing no output at all, not even the directory.
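That "Unexpected --input" message looks like what a command-line option parser prints when a flag and its value arrive as a single token. Just to illustrate the symptom (argparse standing in here for Mahout's actual option parser, which I have not looked at):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--input')

# Flag and value as separate tokens: parses fine
ok = parser.parse_args(['--input', 's3n://bucket/data/'])
assert ok.input == 's3n://bucket/data/'

# Flag and value glued into one token: rejected as an unknown argument
try:
    parser.parse_args(['--input s3n://bucket/data/'])
    assert False, 'expected a parse error'
except SystemExit:
    pass  # argparse bails out on unrecognized arguments
```

Which would fit the idea that the glued step_args strings never get tokenized before they reach the job.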

Next, I tried running the jar as a single JobFlow from the AWS console. This 
is the controller output:

    2011-01-31T16:33:57.030Z INFO Fetching jar file.
    2011-01-31T16:34:19.520Z INFO Working dir /mnt/var/lib/hadoop/steps/2
    2011-01-31T16:34:19.521Z INFO Executing /usr/lib/jvm/java-6-sun/bin/java -cp /home/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/home/hadoop:/home/hadoop/hadoop-0.20-core.jar:/home/hadoop/hadoop-0.20-tools.jar:/home/hadoop/lib/*:/home/hadoop/lib/jetty-ext/*
        -Xmx1000m -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/2 -Dhadoop.log.file=syslog -Dhadoop.home.dir=/home/hadoop
        -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/2/tmp
        -Djava.library.path=/home/hadoop/lib/native/Linux-i386-32 org.apache.hadoop.util.RunJar /mnt/var/lib/hadoop/steps/2/mahout-core-0.4-job.jar
        org.apache.mahout.cf.taste.hadoop.item.RecommenderJob --input s3n://recommendertest/data/job2011Y01M31D17H01M52S/aggregateWatched/
        --output s3n://recommendertest/data/job2011Y01M31D17H01M52S/similiarItems/ --similarityClassname SIMILARITY_PEARSON_CORRELATION
    2011-01-31T16:47:22.477Z INFO Execution ended with ret val 0
    2011-01-31T16:47:24.616Z INFO Step created jobs: job_201101311631_0001,job_201101311631_0002,job_201101311631_0003,job_201101311631_0004,job_201101311631_0005,job_201101311631_0006,job_201101311631_0007,job_201101311631_0008,job_201101311631_0009,job_201101311631_0010,job_201101311631_0011
    2011-01-31T16:47:47.642Z INFO Step succeeded

As you can see, the Java invocation (the third log line) is exactly the 
same (except that the working directory is steps/3 in the first case and 
steps/2 in the second), but this time the jobs inside the jar actually run 
and the syslog shows the progress of the map and reduce phases (see 
http://pastebin.com/Ezn3nGb4). The output directory is created and contains 
a file, but the file is completely empty (0 bytes). So although the JobFlow 
runs for about 16 minutes and the logs clearly show that data is being 
processed, the output is empty.
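For completeness: as far as I know, a Hadoop output directory can also contain zero-byte marker files (depending on the Hadoop version, e.g. _SUCCESS) next to the part-XXXXX files that hold the real data, so an empty file is only bad news if it is a part file. A toy illustration of the distinction (the listing below is made up, not the real bucket contents):

```python
# Separate Hadoop marker files from data files in an output listing.
# The filenames and sizes below are hypothetical, not the real bucket contents.
listing = [
    ('_SUCCESS', 0),           # job-completion marker, always 0 bytes
    ('_logs/history/x', 512),  # job history, not output data
    ('part-r-00000', 0),       # reducer output: 0 bytes here means no records
]

data_files = [(name, size) for name, size in listing
              if name.startswith('part-')]
assert data_files == [('part-r-00000', 0)]

# Only zero-byte part files indicate that the job really emitted nothing
empty_output = all(size == 0 for _, size in data_files)
assert empty_output
```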

These errors have been giving me headaches for a few days now; I would 
really appreciate it if someone could give me a clue. I have made the S3 
folder public, in case it helps: 
s3n://recommendertest/data/job2011Y01M31D17H01M52S/

Thanks in advance,
Thomas Söhngen
