mahout-user mailing list archives

From Gokul Pillai <gokoolt...@gmail.com>
Subject Help with running clusterdump after running Dirichlet
Date Thu, 15 Jul 2010 21:19:09 GMT
I have Cloudera's CDH3 running on Ubuntu 10.04, and Apache Mahout
0.4-SNAPSHOT (from yesterday).

I was trying to get the clustering examples running by following the wiki page
https://cwiki.apache.org/confluence/display/MAHOUT/Synthetic+Control+Data.
At the bottom of that page, the section "Get the data out of HDFS and have a
look" describes how to get the data out and process it:

   - All example jobs use *testdata* as input and output to directory
   *output*
   - Use *bin/hadoop fs -lsr output* to view all outputs. Copy them all to
   your local machine and you can run the ClusterDumper on them.
      - Sequence files containing the original points in Vector form are in
      *output/data*
      - Computed clusters are contained in *output/clusters-i*
      - All result clustered points are placed into *output/clusteredPoints*
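The layout above writes one clusters-i directory per iteration. A minimal sketch, assuming a local copy of the output directory and assuming the numeric suffix counts iterations, of picking the last clusters-i directory to pass to clusterdump's --seqFileDir. LatestClusters and latestClustersDir are hypothetical names, not part of Mahout.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper, not part of Mahout: picks the last clusters-i
// directory from a local copy of a clustering job's output.
public class LatestClusters {

    // Iteration directories are assumed to be named "clusters-<i>".
    private static final Pattern ITERATION = Pattern.compile("clusters-(\\d+)");

    // Returns i for a directory named "clusters-i", or -1 for any other name.
    public static int iteration(Path dir) {
        Matcher m = ITERATION.matcher(dir.getFileName().toString());
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    // The clusters-i subdirectory with the largest i (compared numerically,
    // so clusters-10 comes after clusters-9), or empty if there is none.
    public static Optional<Path> latestClustersDir(Path outputDir) {
        try (Stream<Path> entries = Files.list(outputDir)) {
            return entries
                    .filter(Files::isDirectory)
                    .filter(p -> iteration(p) >= 0)
                    .max(Comparator.comparingInt(LatestClusters::iteration));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path output = Paths.get(args.length > 0 ? args[0] : "dirichlet/output");
        System.out.println(latestClustersDir(output)
                .map(Path::toString)
                .orElse("no clusters-i directory under " + output));
    }
}
```

Comparing the suffix as an integer rather than as a string matters once a run has ten or more iterations.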


So I copied the data out of HDFS onto my local machine, and it looks like this:

hadoop@ubuntu:~/mahoutOutputs$ ls -l dirichlet/output/
total 32
drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusteredPoints
drwxr-xr-x 2 hadoop hadoop 4096 2010-07-13 16:06 clusters-0
drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-1
drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-2
drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-3
drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-4
drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-5
drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 data


However, when I run clusterdump on this output, I get the following error. Can
anyone explain why clusterdump is complaining about a "_logs" directory?

hadoop@ubuntu:~/mahoutOutputs$ ../mahoutsvn/trunk/bin/mahout clusterdump \
    --seqFileDir dirichlet/output/clusters-1 \
    --pointsDir dirichlet/output/clusteredPoints/ \
    --output dumpOut
no HADOOP_CONF_DIR or HADOOP_HOME set, running locally
Exception in thread "main" java.io.FileNotFoundException: /home/hadoop/mahoutOutputs/dirichlet/output/clusteredPoints/_logs (Is a directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:106)
    at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:63)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:99)
    at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:169)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
    at org.apache.mahout.utils.clustering.ClusterDumper.readPoints(ClusterDumper.java:323)
    at org.apache.mahout.utils.clustering.ClusterDumper.init(ClusterDumper.java:93)
    at org.apache.mahout.utils.clustering.ClusterDumper.<init>(ClusterDumper.java:86)
    at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:272)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:175)
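The trace suggests that ClusterDumper.readPoints (ClusterDumper.java:323) hands the _logs directory under clusteredPoints to SequenceFile.Reader, which then fails because it is a directory. A minimal sketch of the kind of filtering that would avoid this, assuming Hadoop's usual convention (applied by FileInputFormat when it lists job input) that entries whose names begin with "_" or "." are not data files. It uses plain java.nio so it runs without Hadoop on the classpath; PartFiles, dataFiles and isHidden are hypothetical names, not Mahout or Hadoop API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not Mahout's ClusterDumper: lists the entries of a
// points directory that are worth opening as data files.
public class PartFiles {

    // Names starting with "_" or "." (such as _logs) are treated as hidden,
    // following the convention assumed above.
    public static boolean isHidden(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    // Regular, non-hidden files directly under dir, sorted by path.
    public static List<Path> dataFiles(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile) // drops directories such as _logs
                    .filter(p -> !isHidden(p.getFileName().toString()))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path points = Paths.get(args.length > 0 ? args[0] : "dirichlet/output/clusteredPoints");
        for (Path p : dataFiles(points)) {
            System.out.println(p); // each of these would be opened as a sequence file
        }
    }
}
```

For a one-off dump of a local copy, removing the _logs directory from clusteredPoints before running clusterdump would likely have the same effect.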

Regards
Gokul
