mahout-user mailing list archives

From Niko Gamulin <niko.gamu...@gmail.com>
Subject Problems with K-Means Spectral Clustering on EMR
Date Sun, 26 Oct 2014 21:19:21 GMT
Hi,

I tried to run the spectral clustering example from the Mahout website on EMR.

I uploaded the following files to the bucket:
affinity.txt (affinity matrix)
mahout-core-0.9-job.jar
mahout-core-0.9.jar
update-lucene.sh
lucene-4.3.0.tgz

update-lucene.sh contains the following:

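#!/bin/bash
# Bootstrap action: replace the Lucene jars in /home/hadoop/lib with Lucene 4.3.0.
cd /home/hadoop
wget https://s3.amazonaws.com/hellomahout/lucene-4.3.0.tgz
tar -xzf lucene-4.3.0.tgz
# Remove the Lucene jars already present in lib.
rm -f lib/lucene-*.jar
# Copy every jar from the Lucene 4.3.0 distribution into lib.
find lucene-4.3.0 -name 'lucene-*.jar' -exec cp {} lib/ \;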

The cluster configuration is as follows:

Hadoop Distribution: Amazon, AMI version: 3.2.1

EC2 instance types:
Master: m1.large, 1
Core: m1.large, 1
Task: None (m1.medium,1)

Bootstrap Actions:
Custom action, S3 location: s3://hellomahout/update-lucene.sh

Steps:

Custom JAR, JAR location: s3://hellomahout/mahout-core-0.9-job.jar,
Arguments: org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver
--input s3://hellomahout/testdata/affinity.txt --output
s3://hellomahout/testdata/results -d 3 -k 2 -x 10
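
As a sketch, the step above should be roughly equivalent to the following command line on the master node. This assumes that mahout-core-0.9-job.jar has been copied to the current directory, which the setup described here does not do; the RUN switch is only an illustrative convention so that the command can be inspected before it is executed.

```shell
# Assumption: the job jar is in the current directory on the master node.
JAR=mahout-core-0.9-job.jar
DRIVER=org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver

# Same arguments as the EMR step: input, output, -d 3, -k 2, -x 10.
CMD="hadoop jar $JAR $DRIVER --input s3://hellomahout/testdata/affinity.txt --output s3://hellomahout/testdata/results -d 3 -k 2 -x 10"

# Print the assembled command so it can be checked first.
echo "$CMD"

# Hypothetical switch: set RUN=1 in the environment to actually launch the job.
if [ "${RUN:-0}" = "1" ]; then
    $CMD
fi
```

With RUN unset, the script only prints the command, which makes it easy to compare against the arguments configured in the EMR console.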

When I try to run it, I get the following exception:

Exception in thread "main" java.io.FileNotFoundException: No such file or directory 'hdfs://172.31.1.27:9000/user/hadoop/temp/calculations/unitvectors'
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:759)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:507)
    at org.apache.mahout.clustering.kmeans.EigenSeedGenerator.buildFromEigens(EigenSeedGenerator.java:67)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:243)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:127)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:118)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.main(SpectralKMeansDriver.java:70)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)



Does anyone know what causes the exception?
Could anyone provide any suggestions about how to run spectral clustering
on EMR?

Thank you.

Niko
