mahout-user mailing list archives

From Thomas Söhngen <tho...@beluto.com>
Subject Using several Mahout JarSteps in a JobFlow
Date Tue, 08 Feb 2011 16:34:35 GMT
Hi fellow data crunchers,

I am running a JobFlow with a step using
"org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob"
followed by a step using
"org.apache.mahout.cf.taste.hadoop.item.RecommenderJob". The first step
works without problems, but the second one throws an exception:

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory temp/itemIDIndex already exists and is not empty
	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:124)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:818)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
	at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:165)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:328)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
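The exception is raised before the second job does any work: according to the trace, FileOutputFormat.checkOutputSpecs rejects the job because its output directory, temp/itemIDIndex, already exists, which is consistent with the first step having written it. The Java sketch below mimics that pre-flight check on the local file system with java.nio.file. It reproduces only the behaviour described in the error message (absent or empty directory passes, non-empty directory fails), not Hadoop's actual implementation, and the names OutputCheck and checkOutputDir are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local stand-in for the output check seen in the stack trace:
// refuse to start a job whose output directory exists and contains entries.
// This mirrors the error message only; it is not Hadoop's code.
class OutputCheck {

  static void checkOutputDir(Path dir) throws IOException {
    if (!Files.isDirectory(dir)) {
      return; // nothing there yet, so the job may create it
    }
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      if (entries.iterator().hasNext()) {
        throw new IOException(
            "Output directory " + dir + " already exists and is not empty");
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path temp = Files.createTempDirectory("mahout-temp");
    Path itemIdIndex = temp.resolve("itemIDIndex");

    // First step: the directory does not exist yet, so the check passes.
    checkOutputDir(itemIdIndex);
    Files.createDirectories(itemIdIndex);
    Files.createFile(itemIdIndex.resolve("part-r-00000"));

    // Second step reusing the same intermediate path: the check fails.
    try {
      checkOutputDir(itemIdIndex);
      System.out.println("unexpected: check passed");
    } catch (IOException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```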

It looks like the second job is using the same temporary output
directories as the first job. How can I avoid this? Or, even better: if
some of the work was already done and cached by the first step, how
can I reuse it so that it doesn't have to be recomputed in the second
step?

Best regards,
Thomas

PS: This is the actual JobFlow definition in JSON:

[
  [......],
  {
    "Name": "MR Step 2: Find similar items",
    "HadoopJarStep": {
      "MainClass": "org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob",
      "Jar": "s3n://recommendertest/mahout-core/mahout-core-0.4-job.jar",
      "Args": [
        "--input", "s3n://recommendertest/data/<jobid>/aggregateWatched/",
        "--output", "s3n://recommendertest/data/<jobid>/similiarItems/",
        "--similarityClassname", "SIMILARITY_PEARSON_CORRELATION",
        "--maxSimilaritiesPerItem", "100"
      ]
    }
  },
  {
    "Name": "MR Step 3: Find items for user",
    "HadoopJarStep": {
      "MainClass": "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob",
      "Jar": "s3n://recommendertest/mahout-core/mahout-core-0.4-job.jar",
      "Args": [
        "--input", "s3n://recommendertest/data/<jobid>/aggregateWatched/",
        "--output", "s3n://recommendertest/data/<jobid>/userRecommendations/",
        "--similarityClassname", "SIMILARITY_PEARSON_CORRELATION",
        "--numRecommendations", "100"
      ]
    }
  }
]
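One way to keep the two steps from colliding is to give each step its own intermediate directory. The Java sketch below builds the two steps' argument lists with a distinct temp directory per step. It assumes the Mahout 0.4 jobs accept a --tempDir option (inherited from AbstractJob, whose default of "temp" would match the temp/itemIDIndex path in the exception); please confirm this with --help against the jar you actually use. The helper withTempDir and the tmp-similarity/ and tmp-recommender/ paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: appends a per-step intermediate directory to a Mahout
// job's arguments so that two steps in one JobFlow never share "temp/".
// ASSUMPTION: the job accepts --tempDir (check the jar's --help output).
class StepArgs {

  static List<String> withTempDir(String tempDir, String... jobArgs) {
    List<String> args = new ArrayList<>(Arrays.asList(jobArgs));
    args.add("--tempDir");
    args.add(tempDir);
    return args;
  }

  public static void main(String[] argv) {
    // "<jobid>" is kept as the placeholder from the original JobFlow.
    String base = "s3n://recommendertest/data/<jobid>/";

    // Step 2: ItemSimilarityJob with its own (hypothetical) temp directory.
    List<String> similarity = withTempDir(base + "tmp-similarity/",
        "--input", base + "aggregateWatched/",
        "--output", base + "similiarItems/",
        "--similarityClassname", "SIMILARITY_PEARSON_CORRELATION",
        "--maxSimilaritiesPerItem", "100");

    // Step 3: RecommenderJob with a different (hypothetical) temp directory.
    List<String> recommender = withTempDir(base + "tmp-recommender/",
        "--input", base + "aggregateWatched/",
        "--output", base + "userRecommendations/",
        "--similarityClassname", "SIMILARITY_PEARSON_CORRELATION",
        "--numRecommendations", "100");

    System.out.println(String.join(" ", similarity));
    System.out.println(String.join(" ", recommender));
  }
}
```

In the JSON JobFlow itself, the equivalent change would be appending the same two tokens ("--tempDir" plus a step-specific path) to each step's "Args" array, again assuming the option is supported by the jar in use.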
