mahout-user mailing list archives

From Paul Ingles <p...@oobaloo.co.uk>
Subject Re: Error with KMeans example in trunk (793689)
Date Tue, 14 Jul 2009 13:11:05 GMT
I've also tried r787776 on Hadoop 0.19.1, and I get a NoClassDefFoundError
for com/google/gson/reflect/TypeToken. I'm pretty sure this is the
same error I was seeing when trying 793689 against Hadoop 0.20.0.

I've checked the mahout-*-examples.job file, and its lib directory does
contain gson-1.3.jar, which in turn contains TypeToken.class at
com/google/gson/reflect, so I'm not sure what's happening.
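
The packaging check described above can also be scripted. What follows is a minimal sketch, not part of Mahout or Hadoop: the class name JobFileClassFinder and the method findClass are hypothetical, and it assumes the .job file is an ordinary zip archive with its dependency jars under lib/. It only confirms that the class is packaged; it does not explain why the class fails to load at runtime.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper (not a Mahout or Hadoop class): scans a .job archive
// for nested lib/*.jar entries that contain a given class file.
class JobFileClassFinder {

    /**
     * Returns the names of the nested lib/*.jar entries that contain classEntry,
     * e.g. "com/google/gson/reflect/TypeToken.class".
     */
    static List<String> findClass(InputStream jobFile, String classEntry) throws IOException {
        List<String> hits = new ArrayList<>();
        try (ZipInputStream outer = new ZipInputStream(jobFile)) {
            ZipEntry entry;
            while ((entry = outer.getNextEntry()) != null) {
                String name = entry.getName();
                if (!name.startsWith("lib/") || !name.endsWith(".jar")) {
                    continue;
                }
                // Read the nested jar straight from the outer stream. The inner
                // stream is deliberately left open: closing it would also close
                // the outer stream and end the scan.
                ZipInputStream inner = new ZipInputStream(outer);
                ZipEntry innerEntry;
                while ((innerEntry = inner.getNextEntry()) != null) {
                    if (innerEntry.getName().equals(classEntry)) {
                        hits.add(name);
                        break;
                    }
                }
            }
        }
        return hits;
    }
}
```

For the case in this thread, the call would pass an input stream opened on examples/target/mahout-examples-0.2-SNAPSHOT.job together with the entry name "com/google/gson/reflect/TypeToken.class"; a non-empty result means the class is packaged under lib/.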

On 14 Jul 2009, at 13:23, Paul Ingles wrote:

> I noticed it was using 0.20.0 this morning and gave it a go. I think  
> it failed at the Clustering phases with a NoClassDef error for the  
> GSon stuff, but I don't remember exactly.
>
> I'm running from an earlier revision against 0.19 at the moment, but  
> will try 0.20 again when it's finished and let you know how it goes.
>
> Thanks again,
> Paul
>
> On 14 Jul 2009, at 12:58, Grant Ingersoll wrote:
>
>> Try Hadoop 0.20.0, which is what trunk is now on.  I will update  
>> the docs.
>>
>>
>> On Jul 13, 2009, at 7:02 PM, Paul Ingles wrote:
>>
>>> Hi,
>>>
>>> I've been going over the kmeans stuff the last few days to try and  
>>> understand how it works, and how I might extend it to work with  
>>> the data I'm looking to process. It's taken me a while to get a  
>>> basic understanding of things, and I really appreciate having lists  
>>> like this around for support.
>>>
>>> I need to be able to label the vectors: each vector holds (for a  
>>> document) a set of similarity scores across a number of  
>>> attributes. I did some searching around payloads (after coming  
>>> across the term in some comments) but couldn't see how I add a  
>>> payload to the Vector. I then stumbled on MAHOUT-65
>>> (https://issues.apache.org/jira/browse/MAHOUT-65) that mentions
>>> the addition of the setName method to Vector. I've  
>>> tried building trunk, and although there were a few test failures  
>>> for other (seemingly unrelated) examples I continued and managed  
>>> to get the mahout-examples jar/job files built to give it a whirl.
>>>
>>> When I run the following:
>>>
>>> $ hadoop jar examples/target/mahout-examples-0.2-SNAPSHOT.job \
>>>     org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>
>>> I see it run the "Preparing Input", "Running Canopy to get initial  
>>> clusters", and then finally it starts "Running KMeans". But,  
>>> shortly after it breaks with the following trace:
>>>
>>> ---snip---
>>> Running KMeans
>>> 09/07/13 23:49:34 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/canopies Out: output Distance: org.apache.mahout.utils.EuclideanDistanceMeasure
>>> 09/07/13 23:49:34 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: 1 Input Vectors: org.apache.mahout.matrix.SparseVector
>>> 09/07/13 23:49:34 INFO kmeans.KMeansDriver: Iteration 0
>>> 09/07/13 23:49:34 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>> 09/07/13 23:49:34 INFO mapred.FileInputFormat: Total input paths to process : 2
>>> 09/07/13 23:49:34 INFO mapred.JobClient: Running job: job_200907132019_0040
>>> 09/07/13 23:49:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 09/07/13 23:49:42 INFO mapred.JobClient:  map 50% reduce 0%
>>> 09/07/13 23:49:43 INFO mapred.JobClient:  map 100% reduce 0%
>>> 09/07/13 23:49:49 INFO mapred.JobClient:  map 100% reduce 100%
>>> 09/07/13 23:49:50 INFO mapred.JobClient: Job complete: job_200907132019_0040
>>> 09/07/13 23:49:50 INFO mapred.JobClient: Counters: 16
>>> 09/07/13 23:49:50 INFO mapred.JobClient:   File Systems
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     HDFS bytes read=465629
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     HDFS bytes written=5631
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     Local bytes read=7806
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     Local bytes written=15674
>>> 09/07/13 23:49:50 INFO mapred.JobClient:   Job Counters
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     Launched map tasks=2
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     Data-local map tasks=2
>>> 09/07/13 23:49:50 INFO mapred.JobClient:   Map-Reduce Framework
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     Reduce input groups=7
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     Combine output records=10
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     Map input records=600
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     Reduce output records=7
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     Map output bytes=465600
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     Map input bytes=448580
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     Combine input records=600
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     Map output records=600
>>> 09/07/13 23:49:50 INFO mapred.JobClient:     Reduce input records=10
>>> 09/07/13 23:49:50 WARN kmeans.KMeansDriver: java.io.IOException: Cannot open filename /user/paul/output/clusters-0/_logs
>>> java.io.IOException: Cannot open filename /user/paul/output/clusters-0/_logs
>>> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1394)
>>> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1385)
>>> 	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:338)
>>> 	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:171)
>>> 	at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
>>> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>> 	at org.apache.mahout.clustering.kmeans.KMeansDriver.isConverged(KMeansDriver.java:304)
>>> 	at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:241)
>>> 	at org.apache.mahout.clustering.kmeans.KMeansDriver.runJob(KMeansDriver.java:194)
>>> 	at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:100)
>>> 	at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:56)
>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> 	at java.lang.reflect.Method.invoke(Method.java:597)
>>> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>>> 	at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>> 	at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>>> ---snip---
>>>
>>> This is against revision 793689, running on my development Mac Pro  
>>> (pseudo-distributed single node) with Hadoop 0.19.1.
>>>
>>> It's a bit late to be digging through what's going on, but I will  
>>> try to take a look tomorrow. I'm really excited about giving kmeans a  
>>> whirl on the document processing I'm playing with. In the  
>>> meantime, I was wondering whether anyone else had seen the same,  
>>> or knew a way to accomplish something similar with the released  
>>> version (or could perhaps point me to a known-good past revision).
>>>
>>> Thanks again,
>>> Paul
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>
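
One reading of the quoted trace is that KMeansDriver.isConverged tries to open every entry under output/clusters-0 as a SequenceFile, including the _logs directory that Hadoop writes into job output. Hadoop's own input formats skip names that start with "_" or ".", and the same convention can be applied when listing output by hand. The following is a minimal sketch of that filter on plain strings, assuming nothing about how Mahout itself handles this; HiddenPathFilter and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Mahout or Hadoop class) illustrating the
// "skip hidden entries" convention on plain file names.
class HiddenPathFilter {

    /** True for names conventionally treated as hidden: a leading '_' or '.'. */
    static boolean isHidden(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    /** Returns the visible entries of a directory listing, preserving order. */
    static List<String> visibleEntries(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (!isHidden(name)) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

Applied to a listing such as _logs, part-00000 and .part-00000.crc, only part-00000 would be kept for reading.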

