mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Problems with KMeans clustering
Date Tue, 28 Oct 2008 01:31:23 GMT
That is, I can reproduce the original problem.
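
The root cause in the quoted task logs below is `java.lang.NumberFormatException: For input string: "["`, raised from `Double.parseDouble` inside `DenseVector.decodeFormat`. As a minimal sketch of just that failure mode (this is not Mahout's decoding code; the class name `ParseDoubleSketch` and the sample tokens are made up for illustration), `Double.parseDouble` rejects any token that is not a numeric literal, such as a stray opening bracket:

```java
// Hypothetical illustration only: not taken from Mahout or Hadoop.
class ParseDoubleSketch {
    // Returns true when Double.parseDouble accepts the token,
    // false when it throws NumberFormatException.
    public static boolean parses(String token) {
        try {
            Double.parseDouble(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A plain numeric token is accepted.
        System.out.println("1.5 parses: " + parses("1.5"));
        // A lone bracket is rejected, matching the exception in the logs.
        System.out.println("[ parses: " + parses("["));
    }
}
```

This only shows why the parser throws; it does not say where the "[" in the combiner input comes from.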

On Oct 27, 2008, at 9:22 PM, Grant Ingersoll wrote:

> OK, I can reproduce this.
>
> On Oct 27, 2008, at 8:14 PM, Philippe Lamarche wrote:
>
>> I removed the apache-mahout-core-0.1-dev.jar file from {hadoop-home}/lib and
>> added apache-mahout-examples-0.1-dev.job
>>
>> My lib folder now contains:
>> -rw-r--r-- 1 hadoop hadoop 4506592 2008-10-27 19:59 apache-mahout-examples-0.1-dev.job
>> -rw-r--r-- 1 hadoop root    258337 2008-10-27 14:37 commons-cli-2.0-SNAPSHOT.jar
>> -rw-r--r-- 1 hadoop root     46725 2008-10-27 14:37 commons-codec-1.3.jar
>> -rw-r--r-- 1 hadoop root    279781 2008-10-27 14:37 commons-httpclient-3.0.1.jar
>> -rw-r--r-- 1 hadoop root     38015 2008-10-27 14:37 commons-logging-1.0.4.jar
>> -rw-r--r-- 1 hadoop root     26202 2008-10-27 14:37 commons-logging-api-1.0.4.jar
>> -rw-r--r-- 1 hadoop root    180792 2008-10-27 14:37 commons-net-1.4.1.jar
>> -rw-r--r-- 1 hadoop root    288534 2008-10-27 14:37 jets3t-0.6.0.jar
>> -rw-r--r-- 1 hadoop root    665638 2008-10-27 14:37 jetty-5.1.4.jar
>> -rw-r--r-- 1 hadoop root     11358 2008-10-27 14:37 jetty-5.1.4.LICENSE.txt
>> drwxr-xr-x 2 hadoop root      4096 2008-10-27 14:37 jetty-ext
>> -rw-r--r-- 1 hadoop root    121070 2008-10-27 14:37 junit-3.8.1.jar
>> -rw-r--r-- 1 hadoop root     14999 2008-10-27 14:37 junit-3.8.1.LICENSE.txt
>> -rw-r--r-- 1 hadoop root      9484 2008-10-27 14:37 kfs-0.1.3.jar
>> -rw-r--r-- 1 hadoop root     11358 2008-10-27 14:37 kfs-0.1.LICENSE.txt
>> -rw-r--r-- 1 hadoop root    391834 2008-10-27 14:37 log4j-1.2.15.jar
>> drwxr-xr-x 4 hadoop root      4096 2008-10-27 14:37 native
>> -rw-r--r-- 1 hadoop root     65261 2008-10-27 14:37 oro-2.0.8.jar
>> -rw-r--r-- 1 hadoop root     97689 2008-10-27 14:37 servlet-api.jar
>> -rw-r--r-- 1 hadoop root     15345 2008-10-27 14:37 slf4j-api-1.4.3.jar
>> -rw-r--r-- 1 hadoop root      1159 2008-10-27 14:37 slf4j-LICENSE.txt
>> -rw-r--r-- 1 hadoop root      8601 2008-10-27 14:37 slf4j-log4j12-1.4.3.jar
>> -rw-r--r-- 1 hadoop root     15010 2008-10-27 14:37 xmlenc-0.52.jar
>>
>> when I try to run the synthetic example I get:
>>
>> hadoop@phil:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>> java.lang.NoClassDefFoundError: org/apache/mahout/matrix/Vector
>>   at org.apache.mahout.clustering.syntheticcontrol.canopy.InputDriver.runJob(InputDriver.java:42)
>>   at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:77)
>>   at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:44)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>   at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>   at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>> Caused by: java.lang.ClassNotFoundException: org.apache.mahout.matrix.Vector
>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>   at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
>>   ... 12 more
>>
>> Right now, Hadoop doesn't have any additional classpath elements  
>> that I know
>> of, from conf/hadoop-env.sh or elsewhere.
>>
>> Did I understand correctly what you were saying?
>>
>> On Mon, Oct 27, 2008 at 7:29 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>>
>>>
>>> On Oct 27, 2008, at 4:26 PM, Philippe Lamarche wrote:
>>>
>>>> Hi,
>>>>
>>>> My goal is to run the KMeans example. I must download the synthetic
>>>> control data and put it on the dfs in "testdata".
>>>>
>>>> To be sure that everything is OK, I started from a clean state on my
>>>> laptop.
>>>>
>>>> I downloaded hadoop 0.18.1.
>>>>
>>>> I changed the conf/hadoop-site.xml to this:
>>>>
>>>> <?xml version="1.0"?>
>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>> <!-- Put site-specific property overrides in this file. -->
>>>> <configuration>
>>>> <property>
>>>> <name>hadoop.tmp.dir</name>
>>>> <value>/usr/local/hadoop-datastore/hadoop-${user.name}</value>
>>>> </property>
>>>> <property>
>>>> <name>fs.default.name</name>
>>>> <value>hdfs://localhost:9000</value>
>>>> </property>
>>>> <property>
>>>> <name>mapred.job.tracker</name>
>>>> <value>hdfs://localhost:9001</value>
>>>> </property>
>>>> <property>
>>>> <name>dfs.replication</name>
>>>> <value>1</value>
>>>> </property>
>>>> </configuration>
>>>>
>>>> I changed JAVA_HOME in hadoop-env.sh.
>>>>
>>>> I downloaded mahout from SVN, at revision 708282.
>>>>
>>>> I built both core and examples from the ant script.
>>>>
>>>> I copied apache-mahout-core-0.1-dev.jar to {hadoop-home}/lib.
>>>>
>>>
>>> What happens if you don't do this but use the "job" file instead  
>>> (ant job
>>> in the examples dir)?  I'm trying to replicate this, but am stuck  
>>> at the
>>> moment.
>>>
>>>
>>>
>>>>
>>>> I downloaded
>>>>
>>>> http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
>>>>
>>>> I added the file to the dfs:
>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop dfs -put
>>>> /home/philippe/synthetic_control.data testdata
>>>>
>>>> I ran the example jar, but it failed :
>>>>
>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>> 08/10/27 15:34:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/27 15:34:55 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>> 08/10/27 15:34:55 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>> 08/10/27 15:34:55 INFO mapred.JobClient: Running job: job_200810271532_0001
>>>> 08/10/27 15:34:56 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/27 15:34:59 INFO mapred.JobClient: Job complete: job_200810271532_0001
>>>> 08/10/27 15:34:59 INFO mapred.JobClient: Counters: 7
>>>> 08/10/27 15:34:59 INFO mapred.JobClient:   File Systems
>>>> 08/10/27 15:34:59 INFO mapred.JobClient:     HDFS bytes read=291644
>>>> 08/10/27 15:34:59 INFO mapred.JobClient:     HDFS bytes written=323660
>>>> 08/10/27 15:34:59 INFO mapred.JobClient:   Job Counters
>>>> 08/10/27 15:34:59 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/27 15:34:59 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/27 15:34:59 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/27 15:34:59 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/27 15:34:59 INFO mapred.JobClient:     Map input bytes=288374
>>>> 08/10/27 15:34:59 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/27 15:34:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/27 15:35:00 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/27 15:35:00 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/27 15:35:00 INFO mapred.JobClient: Running job: job_200810271532_0002
>>>> 08/10/27 15:35:01 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/27 15:35:10 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/27 15:35:16 INFO mapred.JobClient: Job complete: job_200810271532_0002
>>>> 08/10/27 15:35:16 INFO mapred.JobClient: Counters: 16
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:   File Systems
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     HDFS bytes read=323660
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     HDFS bytes written=1447
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     Local bytes read=1389
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     Local bytes written=37878
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:   Job Counters
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     Reduce input groups=1
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     Combine output records=29
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     Reduce output records=1
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     Map output bytes=943020
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     Combine input records=1760
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     Map output records=1732
>>>> 08/10/27 15:35:16 INFO mapred.JobClient:     Reduce input records=1
>>>> 08/10/27 15:35:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/27 15:35:16 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/27 15:35:16 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/27 15:35:16 INFO mapred.JobClient: Running job: job_200810271532_0003
>>>> 08/10/27 15:35:17 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/27 15:35:24 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/27 15:35:28 INFO mapred.JobClient: Job complete: job_200810271532_0003
>>>> 08/10/27 15:35:28 INFO mapred.JobClient: Counters: 16
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:   File Systems
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     HDFS bytes read=326554
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     HDFS bytes written=1137260
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     Local bytes read=1147358
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     Local bytes written=2304490
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:   Job Counters
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     Reduce input groups=1
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     Combine output records=0
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     Reduce output records=600
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     Map output bytes=1139660
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     Combine input records=0
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/27 15:35:28 INFO mapred.JobClient:     Reduce input records=600
>>>> 08/10/27 15:35:28 INFO kmeans.KMeansDriver: Iteration 0
>>>> 08/10/27 15:35:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/27 15:35:29 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/27 15:35:29 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/27 15:35:29 INFO mapred.JobClient: Running job: job_200810271532_0004
>>>> 08/10/27 15:35:30 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/27 15:35:37 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/27 15:35:45 INFO mapred.JobClient: Task Id : attempt_200810271532_0004_r_000000_0, Status : FAILED
>>>> java.io.IOException: attempt_200810271532_0004_r_000000_0The reduce copier failed
>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>
>>>>
>>>> The failed attempts logs contain this:
>>>>
>>>> 2008-10-27 15:35:40,133 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 2524 bytes (2524 raw bytes) into RAM from attempt_200810271532_0004_m_000000_0
>>>> 2008-10-27 15:35:40,134 INFO org.apache.hadoop.mapred.ReduceTask: Read 2524 bytes from map-output for attempt_200810271532_0004_m_000000_0
>>>> 2008-10-27 15:35:40,134 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_200810271532_0004_m_000000_0 -> (1358, 1158) from phil
>>>> 2008-10-27 15:35:41,110 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
>>>> 2008-10-27 15:35:41,125 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 0 files left.
>>>> 2008-10-27 15:35:41,173 INFO org.apache.hadoop.mapred.ReduceTask: Initiating in-memory merge with 2 segments...
>>>> 2008-10-27 15:35:41,177 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
>>>> 2008-10-27 15:35:41,178 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
>>>> 2008-10-27 15:35:41,197 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810271532_0004_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
>>>>      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
>>>>      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
>>>> Caused by: java.lang.NumberFormatException: For input string: "["
>>>>      at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
>>>>      at java.lang.Double.parseDouble(Double.java:510)
>>>>      at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
>>>>      at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
>>>>      at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
>>>>      at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
>>>>      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
>>>>      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
>>>>      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
>>>>      ... 1 more
>>>>
>>>> 2008-10-27 15:35:41,197 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
>>>> 2008-10-27 15:35:41,198 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
>>>> java.io.IOException: attempt_200810271532_0004_r_000000_0The reduce copier failed
>>>>      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>
>>>>
>>>>
>>>> However, I can run the org.apache.mahout.clustering.kmeans unit tests
>>>> without problems.
>>>>
>>>> I truly do not understand where the problem lies.
>>>> Thanks for the help.
>>>>
>>>>
>>>> On Sun, Oct 26, 2008 at 8:24 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>
>>>>> Same Mahout code, though, right?
>>>>>
>>>>> Can you provide details on how you were running it?
>>>>>
>>>>> On Oct 26, 2008, at 10:46 AM, Philippe Lamarche wrote:
>>>>>
>>>>>> Unfortunately, I went straight from 0.17.2 to 0.18.1.  It was working on
>>>>>> 0.17.2.
>>>>>>
>>>>>> On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>>>
>>>>>>> Did this work with 0.18.0 or other prior versions for you?
>>>>>>>
>>>>>>> On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I just updated to hadoop 0.18.1 and got a clean version of mahout from
>>>>>>>> svn.
>>>>>>>> However, I am having problems with KMeans, that can be traced down to:
>>>>>>>>
>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
>>>>>>>> 2008-10-25 19:10:16,999 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810251826_0013_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
>>>>>>>>  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
>>>>>>>>  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
>>>>>>>> Caused by: java.lang.NumberFormatException: For input string: "["
>>>>>>>>  at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
>>>>>>>>  at java.lang.Double.parseDouble(Double.java:510)
>>>>>>>>  at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
>>>>>>>>  at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
>>>>>>>>  at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
>>>>>>>>  at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
>>>>>>>>  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
>>>>>>>>  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
>>>>>>>>  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
>>>>>>>>  ... 1 more
>>>>>>>>
>>>>>>>> 2008-10-25 19:10:16,999 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
>>>>>>>> 2008-10-25 19:10:17,000 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>
>>>>>>>>
>>>>>>>> This is while running the synthetic_control.data example, but I have the
>>>>>>>> same problems with any other input data.
>>>>>>>>
>>>>>>>> I am able to run other map-reduce jobs without problems.
>>>>>>>>
>>>>>>>> Here is the output of the jar task:
>>>>>>>>
>>>>>>>> hadoop@philippe-vaio:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>>>>>> 08/10/25 19:09:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>> 08/10/25 19:09:28 INFO mapred.JobClient: Running job: job_200810251826_0010
>>>>>>>> 08/10/25 19:09:29 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>> 08/10/25 19:09:31 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Job complete: job_200810251826_0010
>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7
>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   File Systems
>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes read=291644
>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes written=323660
>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Job Counters
>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input records=600
>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input bytes=288374
>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map output records=600
>>>>>>>> 08/10/25 19:09:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Running job: job_200810251826_0011
>>>>>>>> 08/10/25 19:09:33 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>> 08/10/25 19:09:37 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>> 08/10/25 19:09:39 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>> 08/10/25 19:09:44 INFO mapred.JobClient:  map 100% reduce 16%
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Job complete: job_200810251826_0011
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   File Systems
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes read=323660
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes written=1447
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes read=1389
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes written=37878
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Job Counters
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine output records=29
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input records=600
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce output records=1
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output bytes=943020
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input bytes=323660
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine input records=1760
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output records=1732
>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input records=1
>>>>>>>> 08/10/25 19:09:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>> 08/10/25 19:09:53 INFO mapred.JobClient: Running job: job_200810251826_0012
>>>>>>>> 08/10/25 19:09:54 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>> 08/10/25 19:09:56 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>> 08/10/25 19:09:58 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Job complete: job_200810251826_0012
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   File Systems
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes read=326554
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes written=1137260
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes read=1147358
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes written=2304490
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Job Counters
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine output records=0
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input records=600
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce output records=600
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output bytes=1139660
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input bytes=323660
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine input records=0
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output records=600
>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input records=600
>>>>>>>> 08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0
>>>>>>>> 08/10/25 19:10:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>> 08/10/25 19:10:03 INFO mapred.JobClient: Running job: job_200810251826_0013
>>>>>>>> 08/10/25 19:10:04 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>> 08/10/25 19:10:08 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>> 08/10/25 19:10:09 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>> 08/10/25 19:10:21 INFO mapred.JobClient: Task Id : attempt_200810251826_0013_r_000000_0, Status : FAILED
>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>
>>>>>>>>
>>>>>>>> I am not sure if I am doing something wrong here.
>>>>>>>>
>>>>>>>> Thanks for the help,
>>>>>>>>
>>>>>>>> Philippe.
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------
>>>>>>> Grant Ingersoll
>>>>>>> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New  
>>>>>>> Orleans.
>>>>>>> http://www.lucenebootcamp.com
>>>>>>>
>>>>>>>
>>>>>>> Lucene Helpful Hints:
>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>>>>>
>
