mahout-dev mailing list archives

From eric skinner <ericfrankskin...@gmail.com>
Subject Is this a bug or a setup issue for using NewsKMeansClustering.java
Date Tue, 09 Aug 2011 17:07:50 GMT
Hello,

I am working through NewsKMeansClustering.java, the example code given in
chapter 9 of Mahout in Action. I ran the program against a directory of
sequence files and got the following error:

Exception in thread "main" java.io.FileNotFoundException: File newsClusters/clustersclusteredPoints/part-m-00000 does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:676)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
    at mia.clustering.ch09.NewsKMeansClustering.main(NewsKMeansClustering.java:76)

For reference, here is the directory structure generated by running the
program:

~/workspaceMahout1/recommender/newsClusters% ls
canopy-centroids clusters df-count dictionary.file-0 frequency.file-0
tfidf-vectors tf-vectors tokenized-documents wordcount
~/workspaceMahout1/recommender/newsClusters/clusters/clusteredPoints% ls
part-m-00000

Afterwards, I changed the code from the original

new Path(clusterOutput + Cluster.CLUSTERED_POINTS_DIR + "/part-m-00000"), conf);


to

new Path(clusterOutput + "/clusteredPoints" + "/part-m-00000"), conf);
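
To illustrate the difference between the two expressions, here is a minimal
sketch using plain strings only (no Hadoop or Mahout classes). It assumes
that clusterOutput is "newsClusters/clusters" and that
Cluster.CLUSTERED_POINTS_DIR holds "clusteredPoints" with no leading slash;
both are guesses based on the path in the error message, and the class and
method names are hypothetical.

```java
class ClusteredPointsPath {
    // Assumption: stands in for Cluster.CLUSTERED_POINTS_DIR, taken here
    // to contain no leading slash.
    static final String CLUSTERED_POINTS_DIR = "clusteredPoints";

    // Mirrors the original expression: nothing separates the output
    // directory from the constant.
    static String original(String clusterOutput) {
        return clusterOutput + CLUSTERED_POINTS_DIR + "/part-m-00000";
    }

    // Mirrors the changed expression: the literal carries its own slash.
    static String changed(String clusterOutput) {
        return clusterOutput + "/clusteredPoints" + "/part-m-00000";
    }

    public static void main(String[] args) {
        // Assumption: the value of clusterOutput in the example program.
        String clusterOutput = "newsClusters/clusters";
        System.out.println(original(clusterOutput));
        // prints newsClusters/clustersclusteredPoints/part-m-00000
        System.out.println(changed(clusterOutput));
        // prints newsClusters/clusters/clusteredPoints/part-m-00000
    }
}
```

If those assumptions hold, Hadoop's two-argument Path(String parent, String
child) constructor might be another way to build the path without depending
on where the slash goes.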


With this change, the program runs without the error above. I would like to
know whether this is a bug in the original code, or whether there is some
other hidden issue.
