Running the dataset creator on the full Wikipedia set:
bin/mahout wikipediaDataSetCreator -i wiki -o ../datasets/wikipediainput -c examples/src/test/resources/country.txt
After running for some time, I got this error and the job quit. It left no output files.
Is this a hiccup, a Hadoop error, or something wrong in Mahout?
----------------------------
11/02/01 01:44:52 INFO bayes.WikipediaDatasetCreatorMapper: Configure:
Input Categories size: 229 Exact Match: false Analyzer:
org.apache.mahout.analysis.WikipediaAnalyzer
11/02/01 01:44:52 INFO mapred.MapTask: Starting flush of map output
11/02/01 01:44:52 INFO mapred.MapTask: Finished spill 0
11/02/01 01:44:52 INFO mapred.TaskRunner:
Task:attempt_local_0001_m_028511_0 is done. And is in the process of
commiting
11/02/01 01:44:52 INFO mapred.LocalJobRunner:
11/02/01 01:44:52 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_028511_0' done.
11/02/01 01:45:18 WARN mapred.LocalJobRunner: job_local_0001
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/file.out
in any of the configured local directories
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
at org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:50)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:193)
11/02/01 01:45:19 INFO mapred.JobClient: Job complete: job_local_0001
11/02/01 01:45:19 INFO mapred.JobClient: Counters: 8
11/02/01 01:45:19 INFO mapred.JobClient: FileSystemCounters
11/02/01 01:45:19 INFO mapred.JobClient: FILE_BYTES_READ=435709583455348
11/02/01 01:45:19 INFO mapred.JobClient: FILE_BYTES_WRITTEN=72839164345155
11/02/01 01:45:19 INFO mapred.JobClient: Map-Reduce Framework
11/02/01 01:45:19 INFO mapred.JobClient: Combine output records=0
11/02/01 01:45:19 INFO mapred.JobClient: Map input records=10860674
11/02/01 01:45:19 INFO mapred.JobClient: Spilled Records=1164848
11/02/01 01:45:19 INFO mapred.JobClient: Map output bytes=4282654947
11/02/01 01:45:19 INFO mapred.JobClient: Combine input records=0
11/02/01 01:45:19 INFO mapred.JobClient: Map output records=1164848
11/02/01 01:45:19 INFO driver.MahoutDriver: Program took 12692646 ms
--
Lance Norskog
goksron@gmail.com