From: Grant Ingersoll
To: mahout-user@lucene.apache.org
Reply-To: mahout-user@lucene.apache.org
Subject: Re: Problems with KMeans clustering
Date: Mon, 27 Oct 2008 21:31:23 -0400
References: <7da7efbf0810251623y77f4fc55i8b2771edf46b7291@mail.gmail.com> <7da7efbf0810260746h2562936bx29e971a2fa08880f@mail.gmail.com> <5C10F359-277F-4C40-8DAC-148E88E08909@apache.org>
 <7da7efbf0810271326w3af8a3bdlcef452889e4b3684@mail.gmail.com> <14097E34-84C9-4AFB-B41E-21D1B45D879F@apache.org> <7da7efbf0810271714y5802e8cbya97e9c0522b0223d@mail.gmail.com>

That is, I can reproduce the original problem.

On Oct 27, 2008, at 9:22 PM, Grant Ingersoll wrote:

> OK, I can reproduce this.
>
> On Oct 27, 2008, at 8:14 PM, Philippe Lamarche wrote:
>
>> I removed the apache-mahout-core-0.1-dev.jar file from {hadoop-home}/lib and
>> added apache-mahout-examples-0.1-dev.job
>>
>> My lib folder now contains:
>> -rw-r--r-- 1 hadoop hadoop 4506592 2008-10-27 19:59 apache-mahout-examples-0.1-dev.job
>> -rw-r--r-- 1 hadoop root 258337 2008-10-27 14:37 commons-cli-2.0-SNAPSHOT.jar
>> -rw-r--r-- 1 hadoop root 46725 2008-10-27 14:37 commons-codec-1.3.jar
>> -rw-r--r-- 1 hadoop root 279781 2008-10-27 14:37 commons-httpclient-3.0.1.jar
>> -rw-r--r-- 1 hadoop root 38015 2008-10-27 14:37 commons-logging-1.0.4.jar
>> -rw-r--r-- 1 hadoop root 26202 2008-10-27 14:37 commons-logging-api-1.0.4.jar
>> -rw-r--r-- 1 hadoop root 180792 2008-10-27 14:37 commons-net-1.4.1.jar
>> -rw-r--r-- 1 hadoop root 288534 2008-10-27 14:37 jets3t-0.6.0.jar
>> -rw-r--r-- 1 hadoop root 665638 2008-10-27 14:37 jetty-5.1.4.jar
>> -rw-r--r-- 1 hadoop root 11358 2008-10-27 14:37 jetty-5.1.4.LICENSE.txt
>> drwxr-xr-x 2 hadoop root 4096 2008-10-27 14:37 jetty-ext
>> -rw-r--r-- 1 hadoop root 121070 2008-10-27 14:37 junit-3.8.1.jar
>> -rw-r--r-- 1 hadoop root 14999 2008-10-27 14:37 junit-3.8.1.LICENSE.txt
>> -rw-r--r-- 1 hadoop root 9484 2008-10-27 14:37 kfs-0.1.3.jar
>> -rw-r--r-- 1 hadoop root 11358 2008-10-27 14:37 kfs-0.1.LICENSE.txt
>> -rw-r--r-- 1 hadoop root 391834 2008-10-27 14:37 log4j-1.2.15.jar
>> drwxr-xr-x 4 hadoop root 4096 2008-10-27 14:37 native
>> -rw-r--r-- 1 hadoop root 65261 2008-10-27 14:37 oro-2.0.8.jar
>> -rw-r--r-- 1 hadoop root 97689 2008-10-27 14:37 servlet-api.jar
>> -rw-r--r-- 1 hadoop root 15345 2008-10-27 14:37 slf4j-api-1.4.3.jar
>> -rw-r--r-- 1 hadoop root 1159 2008-10-27 14:37 slf4j-LICENSE.txt
>> -rw-r--r-- 1 hadoop root 8601 2008-10-27 14:37 slf4j-log4j12-1.4.3.jar
>> -rw-r--r-- 1 hadoop root 15010 2008-10-27 14:37 xmlenc-0.52.jar
>>
>> When I try to run the synthetic example I get:
>>
>> hadoop@phil:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>> java.lang.NoClassDefFoundError: org/apache/mahout/matrix/Vector
>>     at org.apache.mahout.clustering.syntheticcontrol.canopy.InputDriver.runJob(InputDriver.java:42)
>>     at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:77)
>>     at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:44)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>     at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>     at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>> Caused by: java.lang.ClassNotFoundException: org.apache.mahout.matrix.Vector
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
>>     ... 12 more
>>
>> Right now, Hadoop doesn't have any additional classpath elements that I know
>> of, from conf/hadoop-env.sh or elsewhere.
>>
>> Did I understand correctly what you were saying?
>>
>> On Mon, Oct 27, 2008 at 7:29 PM, Grant Ingersoll wrote:
>>
>>> On Oct 27, 2008, at 4:26 PM, Philippe Lamarche wrote:
>>>
>>>> Hi,
>>>>
>>>> My goal is to run the example KMeans. I must download the synthetic control
>>>> data and put it on the dfs in "testdata".
>>>>
>>>> To be sure that everything is ok, I started from a clean state on my laptop.
>>>>
>>>> I downloaded hadoop 0.18.1.
>>>>
>>>> I changed the conf/hadoop-site.xml to this:
>>>>
>>>> <configuration>
>>>>   <property>
>>>>     <name>hadoop.tmp.dir</name>
>>>>     <value>/usr/local/hadoop-datastore/hadoop-${user.name}</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>fs.default.name</name>
>>>>     <value>hdfs://localhost:9000</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>mapred.job.tracker</name>
>>>>     <value>hdfs://localhost:9001</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>dfs.replication</name>
>>>>     <value>1</value>
>>>>   </property>
>>>> </configuration>
>>>>
>>>> I changed JAVA_HOME in hadoop-env.sh.
>>>>
>>>> I downloaded mahout from SVN, at revision 708282.
>>>>
>>>> I built both core and examples from the ant script.
>>>>
>>>> I copied apache-mahout-core-0.1-dev.jar to {hadoop-home}/lib.
>>>>
>>>
>>> What happens if you don't do this but use the "job" file instead (ant job
>>> in the examples dir)? I'm trying to replicate this, but am stuck at the
>>> moment.
>>> >>> >>> >>>> >>>> I downloaded >>>> >>>> http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data >>>> >>>> I added the file to the dfs: >>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop dfs -put >>>> /home/philippe/synthetic_control.data testdata >>>> >>>> I ran the example jar, but it failed : >>>> >>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop jar >>>> >>>> /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout- >>>> examples-0.1-dev.jar >>>> >>> >>> >>> >>> >>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job >>>> 08/10/27 15:34:55 WARN mapred.JobClient: Use GenericOptionsParser >>>> for >>>> parsing the arguments. Applications should implement Tool for the >>>> same. >>>> 08/10/27 15:34:55 INFO mapred.FileInputFormat: Total input paths to >>>> process >>>> : 1 >>>> 08/10/27 15:34:55 INFO mapred.FileInputFormat: Total input paths to >>>> process >>>> : 1 >>>> 08/10/27 15:34:55 INFO mapred.JobClient: Running job: >>>> job_200810271532_0001 >>>> 08/10/27 15:34:56 INFO mapred.JobClient: map 0% reduce 0% >>>> 08/10/27 15:34:59 INFO mapred.JobClient: Job complete: >>>> job_200810271532_0001 >>>> 08/10/27 15:34:59 INFO mapred.JobClient: Counters: 7 >>>> 08/10/27 15:34:59 INFO mapred.JobClient: File Systems >>>> 08/10/27 15:34:59 INFO mapred.JobClient: HDFS bytes read=291644 >>>> 08/10/27 15:34:59 INFO mapred.JobClient: HDFS bytes >>>> written=323660 >>>> 08/10/27 15:34:59 INFO mapred.JobClient: Job Counters >>>> 08/10/27 15:34:59 INFO mapred.JobClient: Launched map tasks=2 >>>> 08/10/27 15:34:59 INFO mapred.JobClient: Data-local map tasks=2 >>>> 08/10/27 15:34:59 INFO mapred.JobClient: Map-Reduce Framework >>>> 08/10/27 15:34:59 INFO mapred.JobClient: Map input records=600 >>>> 08/10/27 15:34:59 INFO mapred.JobClient: Map input bytes=288374 >>>> 08/10/27 15:34:59 INFO mapred.JobClient: Map output records=600 >>>> 08/10/27 15:34:59 WARN mapred.JobClient: Use GenericOptionsParser >>>> for >>>> parsing the arguments. 
Applications should implement Tool for the >>>> same. >>>> 08/10/27 15:35:00 INFO mapred.FileInputFormat: Total input paths to >>>> process >>>> : 2 >>>> 08/10/27 15:35:00 INFO mapred.FileInputFormat: Total input paths to >>>> process >>>> : 2 >>>> 08/10/27 15:35:00 INFO mapred.JobClient: Running job: >>>> job_200810271532_0002 >>>> 08/10/27 15:35:01 INFO mapred.JobClient: map 0% reduce 0% >>>> 08/10/27 15:35:10 INFO mapred.JobClient: map 100% reduce 0% >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Job complete: >>>> job_200810271532_0002 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Counters: 16 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: File Systems >>>> 08/10/27 15:35:16 INFO mapred.JobClient: HDFS bytes read=323660 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: HDFS bytes >>>> written=1447 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Local bytes read=1389 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Local bytes >>>> written=37878 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Job Counters >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Launched reduce >>>> tasks=1 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Launched map tasks=2 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Data-local map tasks=2 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Map-Reduce Framework >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Reduce input groups=1 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Combine output >>>> records=29 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Map input records=600 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Reduce output >>>> records=1 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Map output >>>> bytes=943020 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Map input bytes=323660 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Combine input >>>> records=1760 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Map output >>>> records=1732 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Reduce input records=1 >>>> 08/10/27 15:35:16 WARN mapred.JobClient: Use 
GenericOptionsParser >>>> for >>>> parsing the arguments. Applications should implement Tool for the >>>> same. >>>> 08/10/27 15:35:16 INFO mapred.FileInputFormat: Total input paths to >>>> process >>>> : 2 >>>> 08/10/27 15:35:16 INFO mapred.FileInputFormat: Total input paths to >>>> process >>>> : 2 >>>> 08/10/27 15:35:16 INFO mapred.JobClient: Running job: >>>> job_200810271532_0003 >>>> 08/10/27 15:35:17 INFO mapred.JobClient: map 0% reduce 0% >>>> 08/10/27 15:35:24 INFO mapred.JobClient: map 100% reduce 0% >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Job complete: >>>> job_200810271532_0003 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Counters: 16 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: File Systems >>>> 08/10/27 15:35:28 INFO mapred.JobClient: HDFS bytes read=326554 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: HDFS bytes >>>> written=1137260 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Local bytes >>>> read=1147358 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Local bytes >>>> written=2304490 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Job Counters >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Launched reduce >>>> tasks=1 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Launched map tasks=2 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Data-local map tasks=2 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Map-Reduce Framework >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Reduce input groups=1 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Combine output >>>> records=0 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Map input records=600 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Reduce output >>>> records=600 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Map output >>>> bytes=1139660 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Map input bytes=323660 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Combine input >>>> records=0 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Map output records=600 >>>> 08/10/27 15:35:28 INFO mapred.JobClient: Reduce input 
>>>> records=600 >>>> 08/10/27 15:35:28 INFO kmeans.KMeansDriver: Iteration 0 >>>> 08/10/27 15:35:29 WARN mapred.JobClient: Use GenericOptionsParser >>>> for >>>> parsing the arguments. Applications should implement Tool for the >>>> same. >>>> 08/10/27 15:35:29 INFO mapred.FileInputFormat: Total input paths to >>>> process >>>> : 2 >>>> 08/10/27 15:35:29 INFO mapred.FileInputFormat: Total input paths to >>>> process >>>> : 2 >>>> 08/10/27 15:35:29 INFO mapred.JobClient: Running job: >>>> job_200810271532_0004 >>>> 08/10/27 15:35:30 INFO mapred.JobClient: map 0% reduce 0% >>>> 08/10/27 15:35:37 INFO mapred.JobClient: map 100% reduce 0% >>>> 08/10/27 15:35:45 INFO mapred.JobClient: Task Id : >>>> attempt_200810271532_0004_r_000000_0, Status : FAILED >>>> java.io.IOException: attempt_200810271532_0004_r_000000_0The >>>> reduce copier >>>> failed >>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255) >>>> at >>>> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java: >>>> 2207) >>>> >>>> >>>> The failed attempts logs contain this: >>>> >>>> 008-10-27 15:35:40,133 INFO org.apache.hadoop.mapred.ReduceTask: >>>> Shuffling 2524 bytes (2524 raw bytes) into RAM from >>>> attempt_200810271532_0004_m_000000_0 >>>> 2008-10-27 15:35:40,134 INFO org.apache.hadoop.mapred.ReduceTask: >>>> Read >>>> 2524 bytes from map-output for attempt_200810271532_0004_m_000000_0 >>>> 2008-10-27 15:35:40,134 INFO org.apache.hadoop.mapred.ReduceTask: >>>> Rec >>>> #1 from attempt_200810271532_0004_m_000000_0 -> (1358, 1158) from >>>> phil >>>> 2008-10-27 15:35:41,110 INFO org.apache.hadoop.mapred.ReduceTask: >>>> Closed ram manager >>>> 2008-10-27 15:35:41,125 INFO org.apache.hadoop.mapred.ReduceTask: >>>> Interleaved on-disk merge complete: 0 files left. >>>> 2008-10-27 15:35:41,173 INFO org.apache.hadoop.mapred.ReduceTask: >>>> Initiating in-memory merge with 2 segments... 
>>>> 2008-10-27 15:35:41,177 INFO org.apache.hadoop.mapred.Merger: >>>> Merging >>>> 2 sorted segments >>>> 2008-10-27 15:35:41,178 INFO org.apache.hadoop.mapred.Merger: >>>> Down to >>>> the last merge-pass, with 2 segments left of total size: 5011 bytes >>>> 2008-10-27 15:35:41,197 WARN org.apache.hadoop.mapred.ReduceTask: >>>> attempt_200810271532_0004_r_000000_0 Merge of the inmemory files >>>> threw >>>> an exception: java.io.IOException: Intermedate merge failed >>>> at >>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier >>>> $InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147) >>>> at >>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier >>>> $InMemFSMergeThread.run(ReduceTask.java:2078) >>>> Caused by: java.lang.NumberFormatException: For input string: "[" >>>> at >>>> sun >>>> .misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java: >>>> 1224) >>>> at java.lang.Double.parseDouble(Double.java:510) >>>> at >>>> org >>>> .apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60) >>>> at >>>> org >>>> .apache >>>> .mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256) >>>> at >>>> org >>>> .apache >>>> .mahout >>>> .clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38) >>>> at >>>> org >>>> .apache >>>> .mahout >>>> .clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31) >>>> at >>>> org.apache.hadoop.mapred.ReduceTask >>>> $ReduceCopier.combineAndSpill(ReduceTask.java:2174) >>>> at >>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access >>>> $3100(ReduceTask.java:341) >>>> at >>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier >>>> $InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134) >>>> ... 1 more >>>> >>>> 2008-10-27 15:35:41,197 INFO org.apache.hadoop.mapred.ReduceTask: >>>> In-memory merge complete: 0 files left. 
>>>> 2008-10-27 15:35:41,198 WARN org.apache.hadoop.mapred.TaskTracker: >>>> Error running child >>>> java.io.IOException: attempt_200810271532_0004_r_000000_0The reduce >>>> copier failed >>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java: >>>> 255) >>>> at >>>> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java: >>>> 2207) >>>> >>>> >>>> >>>> However, I can run the org.apache.mahout.clustering.kmeans unit >>>> tests >>>> without problems. >>>> >>>> I truly do not understand where the problems lies. >>>> Thanks for the help. >>>> >>>> >>>> On Sun, Oct 26, 2008 at 8:24 PM, Grant Ingersoll >>>> wrote: >>>> >>>> Same Mahout code, though, right? >>>>> >>>>> Can you provide details on how you were running it? >>>>> >>>>> >>>>> On Oct 26, 2008, at 10:46 AM, Philippe Lamarche wrote: >>>>> >>>>> Unfortunately, I went straight from 0.17.2 to 0.18.1. It was >>>>> working on >>>>> >>>>>> 0.17.2. >>>>>> >>>>>> >>>>>> >>>>>> On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll >>>>> >>>>>>> wrote: >>>>>>> >>>>>> >>>>>> Did this work with 0.18.0 or other prior versions for you? >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> >>>>>>>> I just updated to hadoop 0.18.1 and got a clean version of >>>>>>>> mahout from >>>>>>>> svn. 
>>>>>>>> However, I am having problems with KMeans, that can be traced >>>>>>>> down to >>>>>>>> : >>>>>>>> >>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: >>>>>>>> Merging >>>>>>>> 2 sorted segments >>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: >>>>>>>> Down to >>>>>>>> the last merge-pass, with 2 segments left of total size: 5011 >>>>>>>> bytes >>>>>>>> 2008-10-25 19:10:16,999 WARN >>>>>>>> org.apache.hadoop.mapred.ReduceTask: >>>>>>>> attempt_200810251826_0013_r_000000_0 Merge of the inmemory >>>>>>>> files threw >>>>>>>> an exception: java.io.IOException: Intermedate merge failed >>>>>>>> at >>>>>>>> >>>>>>>> >>>>>>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier >>>>>>>> $InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147) >>>>>>>> at >>>>>>>> >>>>>>>> >>>>>>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier >>>>>>>> $InMemFSMergeThread.run(ReduceTask.java:2078) >>>>>>>> Caused by: java.lang.NumberFormatException: For input string: >>>>>>>> "[" >>>>>>>> at >>>>>>>> >>>>>>>> sun >>>>>>>> .misc >>>>>>>> .FloatingDecimal.readJavaFormatString(FloatingDecimal.java: >>>>>>>> 1224) >>>>>>>> at java.lang.Double.parseDouble(Double.java:510) >>>>>>>> at >>>>>>>> org >>>>>>>> .apache >>>>>>>> .mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60) >>>>>>>> at >>>>>>>> >>>>>>>> >>>>>>>> org >>>>>>>> .apache >>>>>>>> .mahout >>>>>>>> .matrix.AbstractVector.decodeVector(AbstractVector.java:256) >>>>>>>> at >>>>>>>> >>>>>>>> >>>>>>>> org >>>>>>>> .apache >>>>>>>> .mahout >>>>>>>> .clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java: >>>>>>>> 38) >>>>>>>> at >>>>>>>> >>>>>>>> >>>>>>>> org >>>>>>>> .apache >>>>>>>> .mahout >>>>>>>> .clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java: >>>>>>>> 31) >>>>>>>> at >>>>>>>> >>>>>>>> >>>>>>>> org.apache.hadoop.mapred.ReduceTask >>>>>>>> $ReduceCopier.combineAndSpill(ReduceTask.java:2174) >>>>>>>> at >>>>>>>> >>>>>>>> >>>>>>>> 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access >>>>>>>> $3100(ReduceTask.java:341) >>>>>>>> at >>>>>>>> >>>>>>>> >>>>>>>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier >>>>>>>> $InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134) >>>>>>>> ... 1 more >>>>>>>> >>>>>>>> 2008-10-25 19:10:16,999 INFO >>>>>>>> org.apache.hadoop.mapred.ReduceTask: >>>>>>>> In-memory merge complete: 0 files left. >>>>>>>> 2008-10-25 19:10:17,000 WARN >>>>>>>> org.apache.hadoop.mapred.TaskTracker: >>>>>>>> Error running child >>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The >>>>>>>> reduce >>>>>>>> copier failed >>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java: >>>>>>>> 255) >>>>>>>> at >>>>>>>> org.apache.hadoop.mapred.TaskTracker >>>>>>>> $Child.main(TaskTracker.java:2207) >>>>>>>> >>>>>>>> >>>>>>>> This is while running the synthetic_control.data example, but >>>>>>>> I have >>>>>>>> the >>>>>>>> same problems with any other input data. >>>>>>>> >>>>>>>> I am able to do other map-reduce job without problems. >>>>>>>> >>>>>>>> Here is the output of the jar task: >>>>>>>> >>>>>>>> hadoop@philippe-vaio:/usr/local/hadoop$ bin/hadoop jar >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> /home/philippe/workspace/MahoutJava/examples/dist/apache- >>>>>>>> mahout-examples-0.1-dev.jar >>>>>>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job >>>>>>>> 08/10/25 19:09:27 WARN mapred.JobClient: Use >>>>>>>> GenericOptionsParser for >>>>>>>> parsing the arguments. Applications should implement Tool for >>>>>>>> the >>>>>>>> same. 
>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input >>>>>>>> paths to >>>>>>>> process >>>>>>>> : 1 >>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input >>>>>>>> paths to >>>>>>>> process >>>>>>>> : 1 >>>>>>>> 08/10/25 19:09:28 INFO mapred.JobClient: Running job: >>>>>>>> job_200810251826_0010 >>>>>>>> 08/10/25 19:09:29 INFO mapred.JobClient: map 0% reduce 0% >>>>>>>> 08/10/25 19:09:31 INFO mapred.JobClient: map 50% reduce 0% >>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Job complete: >>>>>>>> job_200810251826_0010 >>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7 >>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: File Systems >>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: HDFS bytes >>>>>>>> read=291644 >>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: HDFS bytes >>>>>>>> written=323660 >>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Job Counters >>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Launched map >>>>>>>> tasks=2 >>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Data-local map >>>>>>>> tasks=2 >>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Map-Reduce Framework >>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Map input >>>>>>>> records=600 >>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Map input >>>>>>>> bytes=288374 >>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Map output >>>>>>>> records=600 >>>>>>>> 08/10/25 19:09:32 WARN mapred.JobClient: Use >>>>>>>> GenericOptionsParser for >>>>>>>> parsing the arguments. Applications should implement Tool for >>>>>>>> the >>>>>>>> same. 
>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input >>>>>>>> paths to >>>>>>>> process >>>>>>>> : 2 >>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input >>>>>>>> paths to >>>>>>>> process >>>>>>>> : 2 >>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Running job: >>>>>>>> job_200810251826_0011 >>>>>>>> 08/10/25 19:09:33 INFO mapred.JobClient: map 0% reduce 0% >>>>>>>> 08/10/25 19:09:37 INFO mapred.JobClient: map 50% reduce 0% >>>>>>>> 08/10/25 19:09:39 INFO mapred.JobClient: map 100% reduce 0% >>>>>>>> 08/10/25 19:09:44 INFO mapred.JobClient: map 100% reduce 16% >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Job complete: >>>>>>>> job_200810251826_0011 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: File Systems >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: HDFS bytes >>>>>>>> read=323660 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: HDFS bytes >>>>>>>> written=1447 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Local bytes >>>>>>>> read=1389 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Local bytes >>>>>>>> written=37878 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Job Counters >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Launched reduce >>>>>>>> tasks=1 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Launched map >>>>>>>> tasks=2 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Data-local map >>>>>>>> tasks=2 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Map-Reduce Framework >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Reduce input >>>>>>>> groups=1 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Combine output >>>>>>>> records=29 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Map input >>>>>>>> records=600 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Reduce output >>>>>>>> records=1 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Map output >>>>>>>> bytes=943020 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Map 
input >>>>>>>> bytes=323660 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Combine input >>>>>>>> records=1760 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Map output >>>>>>>> records=1732 >>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Reduce input >>>>>>>> records=1 >>>>>>>> 08/10/25 19:09:53 WARN mapred.JobClient: Use >>>>>>>> GenericOptionsParser for >>>>>>>> parsing the arguments. Applications should implement Tool for >>>>>>>> the >>>>>>>> same. >>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input >>>>>>>> paths to >>>>>>>> process >>>>>>>> : 2 >>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input >>>>>>>> paths to >>>>>>>> process >>>>>>>> : 2 >>>>>>>> 08/10/25 19:09:53 INFO mapred.JobClient: Running job: >>>>>>>> job_200810251826_0012 >>>>>>>> 08/10/25 19:09:54 INFO mapred.JobClient: map 0% reduce 0% >>>>>>>> 08/10/25 19:09:56 INFO mapred.JobClient: map 50% reduce 0% >>>>>>>> 08/10/25 19:09:58 INFO mapred.JobClient: map 100% reduce 0% >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Job complete: >>>>>>>> job_200810251826_0012 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: File Systems >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: HDFS bytes >>>>>>>> read=326554 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: HDFS bytes >>>>>>>> written=1137260 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Local bytes >>>>>>>> read=1147358 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Local bytes >>>>>>>> written=2304490 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Job Counters >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Launched reduce >>>>>>>> tasks=1 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Launched map >>>>>>>> tasks=2 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Data-local map >>>>>>>> tasks=2 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Map-Reduce Framework >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Reduce input 
>>>>>>>> groups=1 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Combine output >>>>>>>> records=0 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Map input >>>>>>>> records=600 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Reduce output >>>>>>>> records=600 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Map output >>>>>>>> bytes=1139660 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Map input >>>>>>>> bytes=323660 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Combine input >>>>>>>> records=0 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Map output >>>>>>>> records=600 >>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Reduce input >>>>>>>> records=600 >>>>>>>> 08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0 >>>>>>>> 08/10/25 19:10:02 WARN mapred.JobClient: Use >>>>>>>> GenericOptionsParser for >>>>>>>> parsing the arguments. Applications should implement Tool for >>>>>>>> the >>>>>>>> same. >>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input >>>>>>>> paths to >>>>>>>> process >>>>>>>> : 2 >>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input >>>>>>>> paths to >>>>>>>> process >>>>>>>> : 2 >>>>>>>> 08/10/25 19:10:03 INFO mapred.JobClient: Running job: >>>>>>>> job_200810251826_0013 >>>>>>>> 08/10/25 19:10:04 INFO mapred.JobClient: map 0% reduce 0% >>>>>>>> 08/10/25 19:10:08 INFO mapred.JobClient: map 50% reduce 0% >>>>>>>> 08/10/25 19:10:09 INFO mapred.JobClient: map 100% reduce 0% >>>>>>>> 08/10/25 19:10:21 INFO mapred.JobClient: Task Id : >>>>>>>> attempt_200810251826_0013_r_000000_0, Status : FAILED >>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The >>>>>>>> reduce >>>>>>>> copier >>>>>>>> failed >>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255) >>>>>>>> at >>>>>>>> org.apache.hadoop.mapred.TaskTracker >>>>>>>> $Child.main(TaskTracker.java:2207) >>>>>>>> >>>>>>>> >>>>>>>> I am not sure if I am doing something wrong here. 
>>>>>>>>
>>>>>>>> Thanks for the help,
>>>>>>>>
>>>>>>>> Philippe.

--------------------------
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
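
A note on the trace quoted above: KMeansCombiner.reduce is reached from ReduceTask$ReduceCopier.combineAndSpill, so the combiner is running on the reduce side during the in-memory merge. One possible reading, which is only an assumption and is not confirmed anywhere in this thread, is that a combiner whose output encoding differs from its input encoding fails as soon as it is handed its own output. The Java sketch below illustrates that general failure mode and nothing more. The class CombinerReapplySketch, its decode and combine methods, and both text encodings are hypothetical; none of this is Mahout's DenseVector or KMeansCombiner code.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerReapplySketch {

    // Hypothetical INPUT encoding: comma-separated doubles, e.g. "1.0,2.0".
    static double[] decode(String encoded) {
        String[] parts = encoded.split(",");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Throws NumberFormatException on any token that is not a number.
            values[i] = Double.parseDouble(parts[i].trim());
        }
        return values;
    }

    // Hypothetical combiner: sums the input vectors element-wise, then emits
    // a DIFFERENT encoding: "[", the number of inputs, the sums, and "]".
    // Assumes a non-empty list of equal-length vectors.
    static String combine(List<String> encodedVectors) {
        double[] sum = decode(encodedVectors.get(0));
        for (String encoded : encodedVectors.subList(1, encodedVectors.size())) {
            double[] v = decode(encoded);
            for (int i = 0; i < sum.length; i++) {
                sum[i] += v[i];
            }
        }
        StringBuilder out = new StringBuilder("[,").append(encodedVectors.size());
        for (double d : sum) {
            out.append(',').append(d);
        }
        return out.append(",]").toString();
    }

    public static void main(String[] args) {
        // First pass: inputs are in the encoding the combiner expects.
        String once = combine(Arrays.asList("1.0,2.0", "3.0,4.0"));
        System.out.println("first pass:  " + once);

        // Second pass: the combiner is fed its own output and cannot parse it.
        try {
            combine(Arrays.asList(once));
        } catch (NumberFormatException e) {
            System.out.println("second pass: " + e.getMessage());
        }
    }
}
```

In this toy, the second pass fails with the same message as in the trace (For input string: "["), because the first token of the combiner's own output is not a number. If something similar is happening here, a combiner whose output can be fed back in as input would avoid it, but that is a guess and would need to be checked against the actual KMeansCombiner source.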