mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: Error with KMeans example in trunk (793894)
Date Tue, 14 Jul 2009 16:59:57 GMT
r793974 adds another validity test to the file filter used by 
isConverged(). This will skip over any _logs entries that mysteriously 
get added to the clusters directories. Now, only files beginning with 
"part" and not ending with ".crc" will be processed.
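
A minimal sketch of that naming rule, for illustration only. This is not the r793974 code: the class and method names are hypothetical, and it tests plain file-name strings rather than Hadoop Path objects so that it runs without a Hadoop dependency.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper illustrating the rule: accept only names that
// begin with "part" and do not end with ".crc".
public class ClusterFileFilter {

    public static boolean isValidClusterFile(String name) {
        return name.startsWith("part") && !name.endsWith(".crc");
    }

    public static void main(String[] args) {
        // Example directory listing (made up for this sketch).
        List<String> names = Arrays.asList(
                "part-00000", ".part-00000.crc", "_logs", "part-00001.crc");

        List<String> accepted = names.stream()
                .filter(ClusterFileFilter::isValidClusterFile)
                .collect(Collectors.toList());

        System.out.println(accepted); // prints [part-00000]
    }
}
```

With a rule like this, a _logs entry in clusters-0 is never handed to the SequenceFile reader, which is where the "Cannot open filename" error quoted below originates.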



Jeff Eastman wrote:
> Why are log files being written to the clusters directories? That is 
> not happening in my trunk checkout and putting any other files into 
> the clusters directories will break the isConverged() method and 
> probably also the mapper & reducer configure() methods.
>
>
> Grant Ingersoll wrote:
>> Are you running in standalone, pseudo-distributed or fully 
>> distributed mode in Hadoop?
>>
>> It looks like a permission error in Hadoop, but maybe we need to make 
>> sure we have appropriate access.  I'm not that familiar with the 
>> Hadoop permission capabilities.
>>
>> On Jul 14, 2009, at 10:46 AM, Paul Ingles wrote:
>>
>>> I'm definitely scratching my head now, although I think it's most 
>>> likely some kind of dodgy configuration/setup on the cluster I'm 
>>> using: if I run some of the other examples I get class loading 
>>> errors for the example classes!
>>>
>>> I downloaded a fresh and unconfigured release of Hadoop 0.20 and a 
>>> new checkout of Mahout trunk, and it compiled, tested, and ran 
>>> through the kmeans example without trouble.
>>>
>>> If I find out what causes the problem I'll let the list know.
>>>
>>> Thanks,
>>> Paul
>>>
>>> On 14 Jul 2009, at 15:01, Paul Ingles wrote:
>>>
>>>> Hi,
>>>>
>>>> The latest: I've updated to Subversion revision 793894 for trunk, 
>>>> the code compiles and runs all of its tests successfully (mvn 
>>>> install inside the project root/checkout dir).
>>>>
>>>> If I then run the kmeans example:
>>>>
>>>> $ hadoop jar ./examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>>
>>>> It finishes the Iteration 0 but then errors with the following:
>>>>
>>>> 09/07/14 14:42:16 INFO mapred.JobClient:     Reduce input records=449
>>>> 09/07/14 14:42:16 WARN kmeans.KMeansDriver: java.io.IOException: 
>>>> Cannot open filename /user/pair/output/clusters-0/_logs
>>>> java.io.IOException: Cannot open filename 
>>>> /user/pair/output/clusters-0/_logs
>>>>     at 
>>>> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1444)

>>>>
>>>>     at 
>>>> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1435)

>>>>
>>>>     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:347)
>>>>     at 
>>>> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)

>>>>
>>>>     at 
>>>> org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)

>>>>
>>>>     at 
>>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)

>>>>
>>>>     at 
>>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)

>>>>
>>>>     at 
>>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)

>>>>
>>>>     at 
>>>> org.apache.mahout.clustering.kmeans.KMeansDriver.isConverged(KMeansDriver.java:304)

>>>>
>>>>     at 
>>>> org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:241)

>>>>
>>>>     at 
>>>> org.apache.mahout.clustering.kmeans.KMeansDriver.runJob(KMeansDriver.java:194)

>>>>
>>>>     at 
>>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:100)

>>>>
>>>>     at 
>>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:56)

>>>>
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at 
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

>>>>
>>>>     at 
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

>>>>
>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering
>>>>
>>>> It then moves onto the Clustering phase and reports the following:
>>>>
>>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering
>>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Running Clustering
>>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-0 Out: output/points Distance: org.apache.mahout.utils.EuclideanDistanceMeasure
>>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.matrix.SparseVector
>>>> 09/07/14 14:42:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 09/07/14 14:42:16 INFO mapred.FileInputFormat: Total input paths to process : 271
>>>> 09/07/14 14:42:16 INFO mapred.JobClient: Running job: job_200907141434_0004
>>>> 09/07/14 14:42:17 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 09/07/14 14:42:28 INFO mapred.JobClient: Task Id : attempt_200907141434_0004_m_000000_0, Status : FAILED
>>>> java.lang.NoClassDefFoundError: com/google/gson/reflect/TypeToken
>>>>     at java.lang.ClassLoader.defineClass1(Native Method)
>>>>     at java.lang.ClassLoader.defineClass(ClassLoader.java:675)
>>>>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
>>>>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
>>>>     at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>>>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
>>>>     at org.apache.mahout.matrix.AbstractVector.asFormatString(AbstractVector.java:374)
>>>>     at org.apache.mahout.clustering.kmeans.Cluster.outputPointWithClusterInfo(Cluster.java:198)
>>>>     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:39)
>>>>     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:32)
>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>> Caused by: java.lang.ClassNotFoundException: com.google.gson.reflect.TypeToken
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>>>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
>>>>     ... 20 more
>>>>
>>>> Again, I'm not sure why it's not able to load the gson jar file. It's 
>>>> definitely in the dependencies folder and is included in the built 
>>>> mahout-*.job inside the lib folder.
>>>>
>>>>
>>>>
>>>> On Tue Jul 14 13:31:53 UTC 2009, Paul Ingles <paul@oobaloo.co.uk> 
>>>> wrote:
>>>>> I'm not sure, I'm afraid; the failures occurred whilst I was building
>>>>> at home.
>>>>>
>>>>> I've just updated trunk here and the current revision (793894) builds
>>>>> successfully. I'm going to switch the cluster over to 0.20.0 and see
>>>>> whether I can get the KMeans example to run without the Gson problem
>>>>> I was having before.
>>>>>
>>>>> Thanks again,
>>>>> Paul
>>>>>
>>>>>
>>>>> On 14 Jul 2009, at 14:04, Grant Ingersoll wrote:
>>>>>
>>>>>>
>>>>>> On Jul 13, 2009, at 7:02 PM, Paul Ingles wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've been going over the kmeans stuff the last few days to try and
>>>>>>> understand how it works, and how I might extend it to work with the
>>>>>>> data I'm looking to process. It's taken me a while to get a basic
>>>>>>> understanding of things, and really appreciate having lists like
>>>>>>> this around for support.
>>>>>>>
>>>>>>> I need to be able to label the vectors: each vector holds (for a
>>>>>>> document) a set of similarity scores across a number of attributes.
>>>>>>> I did some searching around payloads (after coming across the term
>>>>>>> in some comments) but couldn't see how I add a payload to the
>>>>>>> Vector. I then stumbled on MAHOUT-65
>>>>>>> (https://issues.apache.org/jira/browse/MAHOUT-65) that mentions the
>>>>>>> addition of the setName method to Vector. I've tried building trunk,
>>>>>>> and although there were a few test failures for other (seemingly
>>>>>>> unrelated) examples I continued and managed to get the
>>>>>>> mahout-examples jar/job files built to give it a whirl.
>>>>>>
>>>>>> What were the errors?
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) 
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>>
>
>
>

