mahout-user mailing list archives

From Paul Ingles <p...@oobaloo.co.uk>
Subject Re: Error with KMeans example in trunk (793894)
Date Tue, 14 Jul 2009 16:00:55 GMT
That was running fully distributed (albeit on a 5-node Mac Pro
cluster). I'm now running standalone and it works fine. When I looked
initially, the file was available and accessible to the user that was
submitting the job. I need to set up a more permanent cluster on 0.20
and will try again with that.
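
A possible reading of the first trace quoted below: the `_logs` entry under clusters-0 looks like the job-history directory that Hadoop writes into a job's output directory, not a cluster file, so opening it as a SequenceFile would fail. That is an assumption drawn from the path name, not a confirmed diagnosis. A minimal sketch in Python of skipping such bookkeeping entries when listing an output directory; the function names and the part-00000 path are hypothetical, and this is not Mahout's code:

```python
from pathlib import PurePosixPath


def is_data_file(path: str) -> bool:
    """Return True unless the last path component starts with '_' or '.'."""
    name = PurePosixPath(path).name
    return not name.startswith(("_", "."))


def cluster_part_files(paths):
    """Keep only the entries that look like real output files."""
    return [p for p in paths if is_data_file(p)]


# Hypothetical listing: the _logs path is from the trace, part-00000 is made up.
listing = [
    "/user/pair/output/clusters-0/_logs",
    "/user/pair/output/clusters-0/part-00000",
]
print(cluster_part_files(listing))
```

The same idea in Java would be a path filter applied when listing the directory; whether that matches how the driver should behave is an open question here.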

On 14 Jul 2009, at 16:38, Grant Ingersoll wrote:

> Are you running in standalone, pseudo-distributed or fully  
> distributed mode in Hadoop?
>
> It looks like a permission error in Hadoop, but maybe we need to  
> make sure we have appropriate access.  I'm not that familiar with  
> the Hadoop permission capabilities.
>
> On Jul 14, 2009, at 10:46 AM, Paul Ingles wrote:
>
>> I'm definitely scratching my head now, although I think it's most
>> likely some kind of dodgy configuration/setup on the cluster I'm
>> using: if I run some of the other examples I get class loading
>> errors for the example classes!
>>
>> I downloaded a fresh and unconfigured release of Hadoop 0.20 and a  
>> new checkout of Mahout trunk, and it compiled, tested, and ran  
>> through the kmeans example without trouble.
>>
>> If I find out what causes the problem I'll let the list know.
>>
>> Thanks,
>> Paul
>>
>> On 14 Jul 2009, at 15:01, Paul Ingles wrote:
>>
>>> Hi,
>>>
>>> The latest: I've updated to Subversion revision 793894 for trunk;
>>> the code compiles and runs all of its tests successfully (mvn
>>> install inside the project root/checkout dir).
>>>
>>> If I then run the kmeans example:
>>>
>>> $ hadoop jar ./examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>
>>> It finishes Iteration 0 but then fails with the following:
>>>
>>> 09/07/14 14:42:16 INFO mapred.JobClient:     Reduce input records=449
>>> 09/07/14 14:42:16 WARN kmeans.KMeansDriver: java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
>>> java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
>>> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1444)
>>> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1435)
>>> 	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:347)
>>> 	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
>>> 	at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
>>> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>> 	at org.apache.mahout.clustering.kmeans.KMeansDriver.isConverged(KMeansDriver.java:304)
>>> 	at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:241)
>>> 	at org.apache.mahout.clustering.kmeans.KMeansDriver.runJob(KMeansDriver.java:194)
>>> 	at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:100)
>>> 	at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:56)
>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> 	at java.lang.reflect.Method.invoke(Method.java:597)
>>> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering
>>>
>>> It then moves on to the Clustering phase and reports the following:
>>>
>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering
>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Running Clustering
>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-0 Out: output/points Distance: org.apache.mahout.utils.EuclideanDistanceMeasure
>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.matrix.SparseVector
>>> 09/07/14 14:42:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>> 09/07/14 14:42:16 INFO mapred.FileInputFormat: Total input paths to process : 271
>>> 09/07/14 14:42:16 INFO mapred.JobClient: Running job: job_200907141434_0004
>>> 09/07/14 14:42:17 INFO mapred.JobClient:  map 0% reduce 0%
>>> 09/07/14 14:42:28 INFO mapred.JobClient: Task Id : attempt_200907141434_0004_m_000000_0, Status : FAILED
>>> java.lang.NoClassDefFoundError: com/google/gson/reflect/TypeToken
>>> 	at java.lang.ClassLoader.defineClass1(Native Method)
>>> 	at java.lang.ClassLoader.defineClass(ClassLoader.java:675)
>>> 	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
>>> 	at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
>>> 	at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
>>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
>>> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
>>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>> 	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
>>> 	at org.apache.mahout.matrix.AbstractVector.asFormatString(AbstractVector.java:374)
>>> 	at org.apache.mahout.clustering.kmeans.Cluster.outputPointWithClusterInfo(Cluster.java:198)
>>> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:39)
>>> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:32)
>>> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>> Caused by: java.lang.ClassNotFoundException: com.google.gson.reflect.TypeToken
>>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
>>> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
>>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>> 	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
>>> 	... 20 more
>>>
>>> Again, I'm not sure why it's not able to load the gson jar file. It's
>>> definitely in the dependencies folder and is included in the built
>>> mahout-*.job inside the lib folder.
>>>
>>>
>>>
>>> On Tue Jul 14 13:31:53 UTC 2009, Paul Ingles <paul@oobaloo.co.uk> wrote:
>>>> I'm not sure, I'm afraid; they were whilst I was building at home.
>>>>
>>>> I've just updated trunk here and the current revision (793894) builds
>>>> successfully. I'm going to switch the cluster over to 0.20.0 and see
>>>> whether I can get the KMeans example to run without the Gson problem I
>>>> was having before.
>>>>
>>>> Thanks again,
>>>> Paul
>>>>
>>>>
>>>> On 14 Jul 2009, at 14:04, Grant Ingersoll wrote:
>>>>
>>>>>
>>>>> On Jul 13, 2009, at 7:02 PM, Paul Ingles wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've been going over the kmeans stuff the last few days to try and
>>>>>> understand how it works, and how I might extend it to work with the
>>>>>> data I'm looking to process. It's taken me a while to get a basic
>>>>>> understanding of things, and I really appreciate having lists like
>>>>>> this around for support.
>>>>>>
>>>>>> I need to be able to label the vectors: each vector holds (for a
>>>>>> document) a set of similarity scores across a number of attributes.
>>>>>> I did some searching around payloads (after coming across the term
>>>>>> in some comments) but couldn't see how to add a payload to the
>>>>>> Vector. I then stumbled on MAHOUT-65
>>>>>> (https://issues.apache.org/jira/browse/MAHOUT-65), which mentions the
>>>>>> addition of the setName method to Vector. I've tried building trunk,
>>>>>> and although there were a few test failures for other (seemingly
>>>>>> unrelated) examples, I continued and managed to get the
>>>>>> mahout-examples jar/job files built to give it a whirl.
>>>>>
>>>>> What were the errors?
>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
> using Solr/Lucene:
> http://www.lucidimagination.com/search
>
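
On the NoClassDefFoundError for com.google.gson.reflect.TypeToken quoted above: one way to double-check that the class really is packaged inside the built .job file is to look through the jars in its lib folder. A minimal sketch in Python using only the standard zipfile module; the function name and the tiny in-memory archive are hypothetical stand-ins, and against a real build you would pass the path to the .job file instead:

```python
import io
import zipfile


def jars_containing(job_file, class_name):
    """List lib/*.jar entries inside the archive that contain class_name."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    with zipfile.ZipFile(job_file) as job:
        for name in job.namelist():
            if name.startswith("lib/") and name.endswith(".jar"):
                # Nested jars are themselves zip archives; read them in memory.
                with zipfile.ZipFile(io.BytesIO(job.read(name))) as jar:
                    if entry in jar.namelist():
                        hits.append(name)
    return hits


def fake_job():
    """Build a stand-in archive in memory (hypothetical jar name, empty class)."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as jar:
        jar.writestr("com/google/gson/reflect/TypeToken.class", b"")
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w") as job:
        job.writestr("lib/gson-demo.jar", inner.getvalue())
    outer.seek(0)
    return outer


print(jars_containing(fake_job(), "com.google.gson.reflect.TypeToken"))
```

This only confirms packaging. It says nothing about the classpath the task JVM actually ends up with on each node, which is where the failure was reported.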

