mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: Error with KMeans example in trunk (793894)
Date Tue, 14 Jul 2009 16:38:53 GMT
Why are log files being written to the clusters directories? That is not 
happening in my trunk checkout, and putting any other files into the 
clusters directories will break the isConverged() method and probably 
the mapper and reducer configure() methods as well.
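
A minimal sketch of one way to guard against this, assuming the driver 
lists the clusters directory by file name before opening each entry as a 
SequenceFile. The class and method names below are hypothetical and are 
not taken from KMeansDriver. The sketch relies only on the Hadoop 
convention that side files such as _logs or .crc checksums start with 
"_" or "."; in real driver code the same test could sit in a PathFilter 
passed to FileSystem.listStatus(Path, PathFilter).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClusterPartFilter {

    /** True for names that, by Hadoop convention, are side files (e.g. _logs, .crc). */
    public static boolean isSideFile(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    /** Keeps only the entries that should be read as cluster part files. */
    public static List<String> clusterParts(List<String> names) {
        List<String> parts = new ArrayList<>();
        for (String name : names) {
            if (!isSideFile(name)) {
                parts.add(name);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical listing of output/clusters-0 on the failing cluster.
        List<String> listing = Arrays.asList("part-00000", "_logs", ".part-00000.crc");
        System.out.println(clusterParts(listing)); // prints [part-00000]
    }
}
```

With a filter like this, a stray _logs directory would be skipped rather 
than opened as a SequenceFile.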


Grant Ingersoll wrote:
> Are you running in standalone, pseudo-distributed or fully distributed 
> mode in Hadoop?
>
> It looks like a permission error in Hadoop, but maybe we need to make 
> sure we have appropriate access.  I'm not that familiar with the 
> Hadoop permission capabilities.
>
> On Jul 14, 2009, at 10:46 AM, Paul Ingles wrote:
>
>> I'm definitely scratching my head now, although I think it's most 
>> likely some kind of dodgy configuration/setup on the cluster I'm 
>> using: if I run some of the other examples I get class loading errors 
>> for the example classes!
>>
>> I downloaded a fresh and unconfigured release of Hadoop 0.20 and a 
>> new checkout of Mahout trunk, and it compiled, tested, and ran 
>> through the kmeans example without trouble.
>>
>> If I find out what causes the problem I'll let the list know.
>>
>> Thanks,
>> Paul
>>
>> On 14 Jul 2009, at 15:01, Paul Ingles wrote:
>>
>>> Hi,
>>>
>>> The latest: I've updated to Subversion revision 793894 for trunk, 
>>> the code compiles and runs all of its tests successfully (mvn 
>>> install inside the project root/checkout dir).
>>>
>>> If I then run the kmeans example:
>>>
>>> $ hadoop jar ./examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>
>>> It finishes Iteration 0 but then fails with the following error:
>>>
>>> 09/07/14 14:42:16 INFO mapred.JobClient:     Reduce input records=449
>>> 09/07/14 14:42:16 WARN kmeans.KMeansDriver: java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
>>> java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
>>>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1444)
>>>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1435)
>>>     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:347)
>>>     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
>>>     at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
>>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>>     at org.apache.mahout.clustering.kmeans.KMeansDriver.isConverged(KMeansDriver.java:304)
>>>     at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:241)
>>>     at org.apache.mahout.clustering.kmeans.KMeansDriver.runJob(KMeansDriver.java:194)
>>>     at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:100)
>>>     at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:56)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering
>>>
>>> It then moves on to the Clustering phase and reports the following:
>>>
>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering
>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Running Clustering
>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-0 Out: output/points Distance: org.apache.mahout.utils.EuclideanDistanceMeasure
>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.matrix.SparseVector
>>> 09/07/14 14:42:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>> 09/07/14 14:42:16 INFO mapred.FileInputFormat: Total input paths to process : 271
>>> 09/07/14 14:42:16 INFO mapred.JobClient: Running job: job_200907141434_0004
>>> 09/07/14 14:42:17 INFO mapred.JobClient:  map 0% reduce 0%
>>> 09/07/14 14:42:28 INFO mapred.JobClient: Task Id : attempt_200907141434_0004_m_000000_0, Status : FAILED
>>> java.lang.NoClassDefFoundError: com/google/gson/reflect/TypeToken
>>>     at java.lang.ClassLoader.defineClass1(Native Method)
>>>     at java.lang.ClassLoader.defineClass(ClassLoader.java:675)
>>>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
>>>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
>>>     at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
>>>     at org.apache.mahout.matrix.AbstractVector.asFormatString(AbstractVector.java:374)
>>>     at org.apache.mahout.clustering.kmeans.Cluster.outputPointWithClusterInfo(Cluster.java:198)
>>>     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:39)
>>>     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:32)
>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>> Caused by: java.lang.ClassNotFoundException: com.google.gson.reflect.TypeToken
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
>>>     ... 20 more
>>>
>>> Again, I'm not sure why it's not able to load the gson jar file; it's 
>>> definitely in the dependencies folder and is included in the lib 
>>> folder inside the built mahout-*.job.
>>>
>>>
>>>
>>> On Tue Jul 14 13:31:53 UTC 2009, Paul Ingles <paul@oobaloo.co.uk> 
>>> wrote:
>>>> I'm not sure, I'm afraid; they occurred whilst I was building at home.
>>>>
>>>> I've just updated trunk here and the current revision (793894) builds
>>>> successfully. I'm going to switch the cluster over to 0.20.0 and see
>>>> whether I can get the KMeans example to run without the GSon problem I
>>>> was having before.
>>>>
>>>> Thanks again,
>>>> Paul
>>>>
>>>>
>>>> On 14 Jul 2009, at 14:04, Grant Ingersoll wrote:
>>>>
>>>>>
>>>>> On Jul 13, 2009, at 7:02 PM, Paul Ingles wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've been going over the kmeans stuff the last few days to try and
>>>>>> understand how it works, and how I might extend it to work with the
>>>>>> data I'm looking to process. It's taken me a while to get a basic
>>>>>> understanding of things, and I really appreciate having lists like
>>>>>> this around for support.
>>>>>>
>>>>>> I need to be able to label the vectors: each vector holds (for a
>>>>>> document) a set of similarity scores across a number of attributes.
>>>>>> I did some searching around payloads (after coming across the term
>>>>>> in some comments) but couldn't see how I add a payload to the
>>>>>> Vector. I then stumbled on MAHOUT-65
>>>>>> (https://issues.apache.org/jira/browse/MAHOUT-65) that mentions
>>>>>> the addition of the setName method to Vector. I've tried building
>>>>>> trunk, and although there were a few test failures for other
>>>>>> (seemingly unrelated) examples I continued and managed to get the
>>>>>> mahout-examples jar/job files built to give it a whirl.
>>>>>
>>>>> What were the errors?
>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) 
> using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
>

