mahout-user mailing list archives

From Rob Podolski <robpodol...@yahoo.co.uk>
Subject CanopyClusterer makes output folder OK then crashes and tells me "Mkdirs Failed To Create (same) Output Folder"
Date Wed, 29 Jan 2014 22:07:36 GMT
Hi

I am trying out the canopy-clustering driver from Java using Mahout-0.8 and am getting a very
odd error.

java.io.IOException: Mkdirs failed to create /test_clustering_output/clusters-0-final
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:378)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:364)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:564)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:896)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:884)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:876)
        at org.apache.mahout.clustering.classify.ClusterClassifier.writePolicy(ClusterClassifier.java:234)
        at org.apache.mahout.clustering.canopy.CanopyDriver.clusterData(CanopyDriver.java:373)
        at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:157)
        at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:168)
        at service.clustering.algorithms.CanopyClusterer.cluster(Unknown Source)
        at service.clustering.ClusterRunner.doClustering(Unknown Source)
        at test.service.NonJunitClustererTest.testClustering(Unknown Source)
        at test.service.NonJunitClustererTest.main(Unknown Source)
Clustering failed: Mkdirs failed to create /test_clustering_output/clusters-0-final

Contrary to the message, the output folder /test_clustering_output/clusters-0-final HAS BEEN
CREATED. If I run

"hadoop fs -ls /test_clustering_output/clusters-0-final" I get:

Warning: $HADOOP_HOME is deprecated.
Found 3 items
-rw-r--r--   1 rob supergroup          0 2014-01-29 21:33 /test_clustering_output/clusters-0-final/_SUCCESS
drwxr-xr-x   - rob supergroup          0 2014-01-29 21:32 /test_clustering_output/clusters-0-final/_logs
-rw-r--r--   1 rob supergroup        106 2014-01-29 21:33 /test_clustering_output/clusters-0-final/part-r-00000

---
I am running on a single-node Hadoop cluster on AWS/Ubuntu, and I am trying to run the driver
from Java...

Configuration hfsConf = new Configuration();
hfsConf.addResource(new Path(PROPS.getHadoopHome() + "/conf/core-site.xml"));
hfsConf.addResource(new Path(PROPS.getHadoopHome() + "/conf/hdfs-site.xml"));
hfsConf.addResource(new Path(PROPS.getHadoopHome() + "/conf/mapred-site.xml"));
try {
    CanopyDriver.run(
        hfsConf,                            // Hadoop/HDFS configuration
        new Path(hadoopInputSequenceFile),  // input sequence file of geovectors
        new Path(hadoopOutputFile),         // output directory
        dm,                                 // distance measure
        t1,                                 // canopy T1 radius
        t2,                                 // canopy T2 radius
        true,                               // true to cluster the input vectors
        0.0,                                // vectors with a pdf below this value (between 0 and 1) are not clustered
        false);                             // execute sequentially if true
    return true;
} catch (Exception e) {
    e.printStackTrace();
}
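
One thing I notice is that the failing frames go through ChecksumFileSystem, which as far as
I understand is the local-filesystem wrapper rather than HDFS, so perhaps the sequential
writePolicy step is resolving the output path against the local disk. A quick diagnostic I
could drop in right after building hfsConf (just a sketch, reusing hfsConf and
hadoopOutputFile from the snippet above; it needs org.apache.hadoop.fs.FileSystem):

// Diagnostic sketch: print which filesystem hfsConf actually resolves to.
FileSystem fs = FileSystem.get(hfsConf);
System.out.println("fs.default.name   = " + hfsConf.get("fs.default.name"));
System.out.println("resolved fs       = " + fs.getUri());
System.out.println("output resolves to " + fs.makeQualified(new Path(hadoopOutputFile)));

If "resolved fs" prints file:/// rather than hdfs://..., then the addResource() calls are not
actually finding the XML files; if it prints hdfs://... and the error still points at the
local filesystem, then whatever configuration writePolicy uses internally is not the one I
am passing in.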

Any help would be most appreciated. I have tried almost everything I can think of, including
switching off permissions in the Hadoop config and making sure my hadoop.tmp folder has open
permissions. My only remaining hunches are (a) that the Configuration object does not carry
enough information (which the check above should confirm or rule out), or (b) that adding one
or two separate jars to the HADOOP_CLASSPATH, instead of packing everything into the
mahout-job jar, is causing this.
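
If hunch (a) turns out to be right and the default filesystem is falling back to file:///,
one thing I might try (only a sketch; "namenode-host:9000" is a placeholder for my real
fs.default.name, and it assumes hadoopInputSequenceFile / hadoopOutputFile are absolute
paths like /test_clustering_output) is pinning the filesystem explicitly and passing
fully-qualified paths:

// Placeholder host/port -- substitute the real namenode address here.
hfsConf.set("fs.default.name", "hdfs://namenode-host:9000");

Path input  = new Path("hdfs://namenode-host:9000" + hadoopInputSequenceFile);
Path output = new Path("hdfs://namenode-host:9000" + hadoopOutputFile);

CanopyDriver.run(hfsConf, input, output, dm, t1, t2, true, 0.0, false);

That should at least rule out the default-filesystem fallback as the cause.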

Rob