Message-ID: <4A5CB98D.5010304@windwardsolutions.com>
Date: Tue, 14 Jul 2009 09:59:57 -0700
From: Jeff Eastman
To: mahout-user@lucene.apache.org
Subject: Re: Error with KMeans example in trunk (793894)

r793974 adds another validity test to the isConverged() valid file
filter. This will skip over any _log files that mysteriously get added
to the clusters directories. Now, only files beginning with "part" and
not ending with ".crc" will be processed.

Jeff Eastman wrote:
> Why are log files being written to the clusters directories? That is
> not happening in my trunk checkout and putting any other files into
> the clusters directories will break the isConverged() method and
> probably also the mapper & reducer configure() methods.
>
>
> Grant Ingersoll wrote:
>> Are you running in standalone, pseudo-distributed or fully
>> distributed mode in Hadoop?
>>
>> It looks like a permission error in Hadoop, but maybe we need to make
>> sure we have appropriate access. I'm not that familiar with the
>> Hadoop permission capabilities.
>>
>> On Jul 14, 2009, at 10:46 AM, Paul Ingles wrote:
>>
>>> I'm definitely scratching my head now, although I think it's most
>>> likely some kind of dodgy configuration/setup on the cluster I'm
>>> using- if I run some of the other examples I get class loading
>>> errors for the example classes!
>>>
>>> I downloaded a fresh and unconfigured release of Hadoop 0.20 and a
>>> new checkout of Mahout trunk, and it compiled, tested, and ran
>>> through the kmeans example without trouble.
>>>
>>> If I find out what causes the problem I'll let the list know.
>>>
>>> Thanks,
>>> Paul
>>>
>>> On 14 Jul 2009, at 15:01, Paul Ingles wrote:
>>>
>>>> Hi,
>>>>
>>>> The latest: I've updated to Subversion revision 793894 for trunk,
>>>> the code compiles and runs all of its tests successfully (mvn
>>>> install inside the project root/checkout dir).
>>>>
>>>> If I then run the kmeans example:
>>>>
>>>> $ hadoop jar ./examples/target/mahout-examples-0.2-SNAPSHOT.job \
>>>>     org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>>
>>>> It finishes the Iteration 0 but then errors with the following:
>>>>
>>>> 09/07/14 14:42:16 INFO mapred.JobClient: Reduce input records=449
>>>> 09/07/14 14:42:16 WARN kmeans.KMeansDriver: java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
>>>> java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
>>>>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1444)
>>>>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1435)
>>>>     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:347)
>>>>     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
>>>>     at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
>>>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>>>     at org.apache.mahout.clustering.kmeans.KMeansDriver.isConverged(KMeansDriver.java:304)
>>>>     at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:241)
>>>>     at org.apache.mahout.clustering.kmeans.KMeansDriver.runJob(KMeansDriver.java:194)
>>>>     at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:100)
>>>>     at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:56)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering
>>>>
>>>> It then moves onto the Clustering phase and reports the following:
>>>>
>>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering
>>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Running Clustering
>>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-0 Out: output/points Distance: org.apache.mahout.utils.EuclideanDistanceMeasure
>>>> 09/07/14 14:42:16 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.matrix.SparseVector
>>>> 09/07/14 14:42:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 09/07/14 14:42:16 INFO mapred.FileInputFormat: Total input paths to process : 271
>>>> 09/07/14 14:42:16 INFO mapred.JobClient: Running job: job_200907141434_0004
>>>> 09/07/14 14:42:17 INFO mapred.JobClient: map 0% reduce 0%
>>>> 09/07/14 14:42:28 INFO mapred.JobClient: Task Id : attempt_200907141434_0004_m_000000_0, Status : FAILED
>>>> java.lang.NoClassDefFoundError: com/google/gson/reflect/TypeToken
>>>>     at java.lang.ClassLoader.defineClass1(Native Method)
>>>>     at java.lang.ClassLoader.defineClass(ClassLoader.java:675)
>>>>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
>>>>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
>>>>     at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>>>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
>>>>     at org.apache.mahout.matrix.AbstractVector.asFormatString(AbstractVector.java:374)
>>>>     at org.apache.mahout.clustering.kmeans.Cluster.outputPointWithClusterInfo(Cluster.java:198)
>>>>     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:39)
>>>>     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:32)
>>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>> Caused by: java.lang.ClassNotFoundException: com.google.gson.reflect.TypeToken
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>>>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
>>>>     ... 20 more
>>>>
>>>> Again, not sure why it's not able to load the gson jar file, it's
>>>> definitely in the dependencies folder and is included in the built
>>>> mahout-*.job inside the lib folder.
>>>>
>>>>
>>>>
>>>> On Tue Jul 14 13:31:53 UTC 2009, Paul Ingles wrote:
>>>>> I'm not sure I'm afraid, they were whilst I was building at home.
>>>>>
>>>>> I've just updated trunk here and the current revision (793894) builds
>>>>> successfully. I'm going to switch the cluster over to 0.20.0 and see
>>>>> whether I can get the KMeans example to run without the GSon
>>>>> problem I was having before.
>>>>>
>>>>> Thanks again,
>>>>> Paul
>>>>>
>>>>>
>>>>> On 14 Jul 2009, at 14:04, Grant Ingersoll wrote:
>>>>>
>>>>>>
>>>>>> On Jul 13, 2009, at 7:02 PM, Paul Ingles wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've been going over the kmeans stuff the last few days to try and
>>>>>>> understand how it works, and how I might extend it to work with the
>>>>>>> data I'm looking to process. It's taken me a while to get a basic
>>>>>>> understanding of things, and really appreciate having lists like
>>>>>>> this around for support.
>>>>>>>
>>>>>>> I need to be able to label the vectors: each vector holds (for a
>>>>>>> document) a set of similarity scores across a number of attributes.
>>>>>>> I did some searching around payloads (after coming across the term
>>>>>>> in some comments) but couldn't see how I add a payload to the
>>>>>>> Vector.
>>>>>>> I then stumbled on MAHOUT-65
>>>>>>> (https://issues.apache.org/jira/browse/MAHOUT-65)
>>>>>>> that mentions the addition of the setName method to Vector. I've
>>>>>>> tried building trunk, and although there were a few test failures
>>>>>>> for other (seemingly unrelated) examples I continued and managed to
>>>>>>> get the mahout-examples jar/job files built to give it a whirl.
>>>>>>
>>>>>> What were the errors?
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search
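
As a rough illustration of the file-name rule described at the top of this message (process only names beginning with "part" and not ending with ".crc"), here is a minimal, self-contained Java sketch. It is not the Mahout code from r793974: the class name ClusterPartFilter, its two methods, and the sample directory listing are all hypothetical, and it works on plain file-name strings rather than Hadoop paths.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not the actual Mahout change.
class ClusterPartFilter {

    // Accept a name only if it begins with "part" and does not end with ".crc".
    static boolean accept(String fileName) {
        return fileName.startsWith("part") && !fileName.endsWith(".crc");
    }

    // Keep only the accepted names from a directory listing.
    static List<String> filter(List<String> fileNames) {
        return fileNames.stream()
                .filter(ClusterPartFilter::accept)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative listing of a clusters-0 directory (names are assumptions).
        List<String> listing = Arrays.asList(
                "_logs", "part-00000", ".part-00000.crc", "part-00000.crc");
        System.out.println(filter(listing)); // prints [part-00000]
    }
}
```

With a rule like this, a stray _logs entry such as the one in the IOException above would be skipped instead of being opened as a SequenceFile.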