mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Using several Mahout JarSteps in a JobFlow
Date Tue, 08 Feb 2011 16:37:03 GMT
I would not run them under the same root directory / key prefix. Give each
job its own namespace so their temporary directories cannot collide.
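
A minimal sketch of what separate namespaces could look like for the two steps in the quoted JobFlow, written as a small Python script that prints the step definitions as JSON. It assumes the Mahout jobs accept a `--tempDir` argument for their intermediate output (verify this against your Mahout version before relying on it). The helper `mahout_step` and the directory names `temp/itemSimilarity` and `temp/recommender` are hypothetical; the bucket paths and the `<jobid>` placeholder are copied from the thread.

```python
import json

JAR = "s3n://recommendertest/mahout-core/mahout-core-0.4-job.jar"
DATA = "s3n://recommendertest/data/<jobid>"  # <jobid> left as a placeholder, as in the thread


def mahout_step(name, main_class, output_name, temp_name, extra_args):
    """Hypothetical helper: build one JobFlow step with its own temp directory."""
    args = [
        "--input", DATA + "/aggregateWatched/",
        "--output", DATA + "/" + output_name + "/",
        # ASSUMPTION: the job accepts --tempDir; check your Mahout version.
        "--tempDir", "temp/" + temp_name,
    ] + extra_args
    return {
        "Name": name,
        "HadoopJarStep": {"MainClass": main_class, "Jar": JAR, "Args": args},
    }


steps = [
    mahout_step(
        "MR Step 2: Find similiar items",
        "org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob",
        "similiarItems",
        "itemSimilarity",  # hypothetical temp namespace for step 2
        ["--similarityClassname", "SIMILARITY_PEARSON_CORRELATION",
         "--maxSimilaritiesPerItem", "100"],
    ),
    mahout_step(
        "MR Step 3: Find items for user",
        "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob",
        "userRecommendations",
        "recommender",  # hypothetical temp namespace for step 3
        ["--similarityClassname", "SIMILARITY_PEARSON_CORRELATION",
         "--numRecommendations", "100"],
    ),
]

# Emit the steps in the same JSON shape as the JobFlow definition.
print(json.dumps(steps, indent=2))
```

With distinct temp directories the second step no longer finds a non-empty `temp/itemIDIndex` left behind by the first one. This only avoids the collision; it does not reuse any of the first step's intermediate results.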

On Tue, Feb 8, 2011 at 4:34 PM, Thomas Söhngen <thomas@beluto.com> wrote:
> Hi fellow data crunchers,
>
> I am running a JobFlow with a step using
> "org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob" and a
> following step using
> "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob". The first step
> works without problems, but the second one is throwing an Exception:
>
> Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory temp/itemIDIndex already exists and is not empty
>        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:124)
>        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:818)
>        at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
>        at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:165)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:328)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> It looks like the second job is using the same temporary output directories
> as the first job. How can I avoid this? Or even better: if some of the tasks
> are already done and cached in the first step, how could I reuse them so that
> they don't have to be recomputed in the second step?
>
> Best regards,
> Thomas
>
> PS: This is the actual JobFlow definition in JSON:
>
> [
>   [......],
>   {
>     "Name": "MR Step 2: Find similiar items",
>     "HadoopJarStep": {
>       "MainClass": "org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob",
>       "Jar": "s3n://recommendertest/mahout-core/mahout-core-0.4-job.jar",
>       "Args": [
>         "--input", "s3n://recommendertest/data/<jobid>/aggregateWatched/",
>         "--output", "s3n://recommendertest/data/<jobid>/similiarItems/",
>         "--similarityClassname", "SIMILARITY_PEARSON_CORRELATION",
>         "--maxSimilaritiesPerItem", "100"
>       ]
>     }
>   },
>   {
>     "Name": "MR Step 3: Find items for user",
>     "HadoopJarStep": {
>       "MainClass": "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob",
>       "Jar": "s3n://recommendertest/mahout-core/mahout-core-0.4-job.jar",
>       "Args": [
>         "--input", "s3n://recommendertest/data/<jobid>/aggregateWatched/",
>         "--output", "s3n://recommendertest/data/<jobid>/userRecommendations/",
>         "--similarityClassname", "SIMILARITY_PEARSON_CORRELATION",
>         "--numRecommendations", "100"
>       ]
>     }
>   }
> ]
