mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Problems with KMeans clustering
Date Thu, 06 Nov 2008 13:03:30 GMT
I think that would be good.  I'm going to be working on MAHOUT today  
and tomorrow, hopefully.  Finally have some free time at ApacheCon...

On Nov 5, 2008, at 11:08 PM, Palleti, Pallavi wrote:

> Hi all,
>
> The same is discussed here:
> https://issues.apache.org/jira/browse/MAHOUT-79
>
> I have a patch ready that fixes this issue. If no one is working on it, I
> can open an issue in JIRA and commit it.
>
> Thanks
> Pallavi
>
> -----Original Message-----
> From: Jeff Eastman [mailto:jdog@windwardsolutions.com]
> Sent: Thursday, November 06, 2008 5:50 AM
> To: mahout-user@lucene.apache.org
> Subject: Re: Problems with KMeans clustering
>
> Thanks Steve,
>
> That was a subtle change that was evidently made after KMeans was
> implemented and did not show up until later, when people such as Philippe
> and yourself ran it with real problems on real clusters. While the type
> signatures of the reducer and combiner are in fact the same, the values
> provided by the mapper and combiner are different and could indeed
> create the odd behavior that was reported.
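
A minimal sketch of how two stages can share a type signature (both
produce a string) and still disagree on the value encoding. The two
formats and the class name EncodingMismatch are hypothetical, chosen only
to reproduce the kind of NumberFormatException reported in this thread;
they are not Mahout's actual vector formats.

```java
// Hypothetical encodings for illustration; not Mahout's real vector formats.
final class EncodingMismatch {

    // Stage A (standing in for a mapper) writes space-separated numbers, e.g. "1.0 2.0".
    static String encodePlain(double[] v) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < v.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(v[i]);
        }
        return sb.toString();
    }

    // Stage B (standing in for a combiner) wraps the same numbers in brackets, e.g. "[ 1.0 2.0 ]".
    static String encodeBracketed(double[] v) {
        return "[ " + encodePlain(v) + " ]";
    }

    // A parser written with only stage A's output in mind.
    static double[] decodePlain(String s) {
        String[] tokens = s.trim().split("\\s+");
        double[] v = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            v[i] = Double.parseDouble(tokens[i]); // throws NumberFormatException on "["
        }
        return v;
    }

    public static void main(String[] args) {
        double[] point = {1.0, 2.0};
        // First pass: the parser sees stage A's output and succeeds.
        double[] ok = decodePlain(encodePlain(point));
        System.out.println("first pass parsed " + ok.length + " values");
        // Second pass: the parser is fed stage B's output and fails.
        try {
            decodePlain(encodeBracketed(point));
        } catch (NumberFormatException e) {
            System.out.println("second pass failed: " + e.getMessage());
        }
    }
}
```

Both encoders return a String, so nothing at the type level flags the
mismatch; it only surfaces when the parser is fed its own stage's output.
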
>
> The algorithm's dependence upon run-once behavior is pretty fundamental,
> since summing of cluster centroids is done in the combiner and the
> reducer does a merge of those clusters. I'd be interested in exactly how
> you resolved this.
>
> It likely applies to some of the other clustering implementations too.
>
> Finally, can you explain why this problem no longer seems to occur with
> Hadoop trunk?
>
> Jeff
>
>
> Steve Schlosser wrote:
>> Hi folks
>>
>> A while back we upgraded our Hadoop cluster from 0.15 to 0.18.0, and I
>> found that Mahout Kmeans quit working.  I finally tracked it down to
>> the fact that the semantics of the combiner changed between 0.16,
>> 0.17, and 0.18 from run exactly once to run zero or more times (which
>> is in line with how Map/Reduce was originally specified).  See:
>> https://issues.apache.org/jira/browse/HADOOP-3586.
>>
>> The Kmeans combiner depended on running exactly once, but on our new
>> cluster it was running multiple times, causing hard-to-discern errors.
>> Basically, the second time through the Combiner, it would throw an
>> exception because parsing of the vector (serialized into a Text)
>> was failing.  In the end, I had to make some formatting changes to the
>> data output by the Mapper and the Combiner to match what the Reducer
>> expects, as well as changes to the Combiner input.  I ended up
>> having to hack the Mapper to output vectors that either the Combiner
>> or Reducer could take as input, and make the Combiner take in the same
>> input that it outputs and calculate convergence at each step.
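
One common way to make a combiner safe to run zero or more times is to
have the mapper, combiner and reducer all exchange the same partial
aggregate (a count plus a running sum), so the combiner's output is valid
combiner input. The sketch below illustrates that idea in plain Java with
a hypothetical PartialSum class. It is not the patch described above and
not Mahout code, and it leaves out the convergence calculation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a re-runnable combiner; not Mahout's KMeansCombiner.
final class PartialSum {
    final long count;
    final double[] sum;

    PartialSum(long count, double[] sum) {
        this.count = count;
        this.sum = sum.clone();
    }

    // What a mapper would emit for one point: a partial covering one observation.
    static PartialSum ofPoint(double[] point) {
        return new PartialSum(1, point);
    }

    // Combiner step: partials in, one partial out (same type in both directions).
    static PartialSum combine(List<PartialSum> parts) {
        long n = 0;
        double[] total = new double[parts.get(0).sum.length];
        for (PartialSum p : parts) {
            n += p.count;
            for (int i = 0; i < total.length; i++) {
                total[i] += p.sum[i];
            }
        }
        return new PartialSum(n, total);
    }

    // Reducer step: merge whatever partials arrive, then divide exactly once.
    static double[] centroid(List<PartialSum> parts) {
        PartialSum all = combine(parts);
        double[] c = new double[all.sum.length];
        for (int i = 0; i < c.length; i++) {
            c[i] = all.sum[i] / all.count;
        }
        return c;
    }

    public static void main(String[] args) {
        List<PartialSum> raw = Arrays.asList(
                ofPoint(new double[] {1, 2}), ofPoint(new double[] {3, 4}),
                ofPoint(new double[] {5, 6}), ofPoint(new double[] {7, 8}));
        // Combiner never runs: the reducer sees the mapper output directly.
        System.out.println(Arrays.toString(centroid(raw)));
        // Combiner runs once on each half of the mapper output.
        List<PartialSum> once = Arrays.asList(
                combine(raw.subList(0, 2)), combine(raw.subList(2, 4)));
        System.out.println(Arrays.toString(centroid(once)));
        // Combiner runs a second time, on its own output.
        List<PartialSum> twice = Arrays.asList(combine(once));
        System.out.println(Arrays.toString(centroid(twice)));
    }
}
```

All three lines print the same centroid, because summing counts and sums
is associative and the division happens only in the reducer step.
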
>>
>> My apologies if this has already been covered and put to rest - I just
>> happened upon this thread this afternoon.
>>
>> -steve
>>
>> On Sun, Nov 2, 2008 at 10:29 AM, Philippe Lamarche
>> <philippe.lamarche@gmail.com> wrote:
>>
>>> Hi there,
>>> It also works on 0.19.0-dev, that is on hadoop/branches/branch-0.19.
>>>
>>> I intend in the next few days to find out exactly what the problem is,
>>> to make sure that it won't come back in a few revisions.
>>>
>>> Thanks!
>>>
>>> On Thu, Oct 30, 2008 at 9:20 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>
>>>
>>>> Hmm, I believe that patch has been applied in 18.2 (whatever that is) but
>>>> it also looks like it has been applied to 0.17.3 branch as well. So, it
>>>> might be something else that "fixed" it.
>>>>
>>>> At any rate, glad to hear it works on trunk.
>>>>
>>>>
>>>> On Oct 29, 2008, at 6:38 PM, Philippe Lamarche wrote:
>>>>
>>>>> I am not sure I understand the hadoop svn structure, however I was able to
>>>>> make it work with hadoop trunk, or 0.20.0-dev.
>>>>> It didn't work with hadoop/branch-0.18, with or without patch 4277.
>>>>>
>>>>>
>>>>> Here is a copy-paste of the steps, once Hadoop is built and installed.  I am
>>>>> using the same exact "apache-mahout-examples-0.1-dev.job", not rebuilt with
>>>>> the 0.20.0-dev jars.
>>>>>
>>>>> It works!
>>>>>
>>>>> That would mean that the bug/feature is not related to
>>>>> HADOOP-4277 <http://issues.apache.org/jira/browse/HADOOP-4277>,
>>>>> and was reintroduced (or never taken away) in hadoop/trunk.
>>>>>
>>>>>
>>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop namenode -format
>>>>> 08/10/29 18:27:59 INFO namenode.NameNode: STARTUP_MSG:
>>>>> /************************************************************
>>>>> STARTUP_MSG: Starting NameNode
>>>>> STARTUP_MSG:   host = phil/127.0.1.1
>>>>> STARTUP_MSG:   args = [-format]
>>>>> STARTUP_MSG:   version = 0.20.0-dev
>>>>> STARTUP_MSG:   build =  -r ; compiled by 'philippe' on Wed Oct 29 18:25:08 EDT 2008
>>>>> ************************************************************/
>>>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: supergroup=supergroup
>>>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: isPermissionEnabled=true
>>>>> 08/10/29 18:28:00 INFO common.Storage: Image file of size 96 saved in 0 seconds.
>>>>> 08/10/29 18:28:00 INFO common.Storage: Storage directory /usr/local/hadoop-datastore/hadoop-hadoop/dfs/name has been successfully formatted.
>>>>> 08/10/29 18:28:00 INFO namenode.NameNode: SHUTDOWN_MSG:
>>>>> /************************************************************
>>>>> SHUTDOWN_MSG: Shutting down NameNode at phil/127.0.1.1
>>>>> ************************************************************/
>>>>>
>>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop dfs -put /home/philippe/synthetic_control.data testdata
>>>>>
>>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/build/apache-mahout-examples-0.1-dev.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>>> 08/10/29 18:28:45 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>> 08/10/29 18:28:46 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>> 08/10/29 18:28:47 INFO mapred.JobClient: Running job: job_200810291828_0002
>>>>> 08/10/29 18:28:48 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:28:54 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:28:55 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient: Job complete: job_200810291828_0002
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient: Counters: 7
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes read=291644
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes written=323660
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Data-local map tasks=2
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input bytes=288374
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map output records=600
>>>>> 08/10/29 18:28:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>> 08/10/29 18:28:56 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient: Running job: job_200810291828_0003
>>>>> 08/10/29 18:28:57 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:29:03 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:29:05 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:29:10 INFO mapred.JobClient:  map 100% reduce 100%
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient: Job complete: job_200810291828_0003
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient: Counters: 16
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes read=323660
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes written=9657
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes read=36119
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes written=72300
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Data-local map tasks=2
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input groups=1
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine output records=28
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce output records=7
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output bytes=943020
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input bytes=323660
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine input records=1732
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output records=1732
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input records=28
>>>>> 08/10/29 18:29:11 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>> 08/10/29 18:29:11 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>> 08/10/29 18:29:12 INFO mapred.JobClient: Running job: job_200810291828_0004
>>>>> 08/10/29 18:29:13 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:29:20 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:29:22 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:29:27 INFO mapred.JobClient:  map 100% reduce 100%
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient: Job complete: job_200810291828_0004
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient: Counters: 16
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes read=342974
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes written=3002539
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes read=3018455
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes written=6036972
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Data-local map tasks=2
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input groups=7
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine output records=0
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce output records=1591
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output bytes=3008903
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input bytes=323660
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine input records=0
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output records=1591
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input records=1591
>>>>> 08/10/29 18:29:28 INFO kmeans.KMeansDriver: Iteration 0
>>>>> 08/10/29 18:29:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>> 08/10/29 18:29:28 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient: Running job: job_200810291828_0005
>>>>> 08/10/29 18:29:29 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:29:35 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:29:37 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient: Job complete: job_200810291828_0005
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient: Counters: 16
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes read=342974
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes written=8205
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes read=23227
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes written=46516
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Data-local map tasks=2
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input groups=7
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine output records=10
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce output records=7
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output bytes=1136504
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input bytes=323660
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine input records=600
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output records=600
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input records=10
>>>>> 08/10/29 18:29:41 INFO kmeans.KMeansDriver: Iteration 1
>>>>> 08/10/29 18:29:41 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>> 08/10/29 18:29:41 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>> 08/10/29 18:29:42 INFO mapred.JobClient: Running job: job_200810291828_0006
>>>>> 08/10/29 18:29:43 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:29:50 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:29:51 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:29:55 INFO mapred.JobClient:  map 100% reduce 100%
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient: Job complete: job_200810291828_0006
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient: Counters: 16
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes read=340070
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes written=8242
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes read=21265
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes written=42592
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Data-local map tasks=2
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input groups=7
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine output records=10
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce output records=7
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output bytes=1023966
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input bytes=323660
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine input records=600
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output records=600
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input records=10
>>>>> 08/10/29 18:29:56 INFO kmeans.KMeansDriver: Iteration 2
>>>>> 08/10/29 18:29:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>> 08/10/29 18:29:56 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient: Running job: job_200810291828_0007
>>>>> 08/10/29 18:29:57 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:30:03 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:30:05 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient: Job complete: job_200810291828_0007
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient: Counters: 16
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes read=340144
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes written=8280
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes read=21085
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes written=42232
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Data-local map tasks=2
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input groups=7
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine output records=10
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce output records=7
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output bytes=1023681
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input bytes=323660
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine input records=600
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output records=600
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input records=10
>>>>> 08/10/29 18:30:09 INFO kmeans.KMeansDriver: Iteration 3
>>>>> 08/10/29 18:30:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>> 08/10/29 18:30:09 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient: Running job: job_200810291828_0008
>>>>> 08/10/29 18:30:10 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:30:17 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:30:18 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:30:22 INFO mapred.JobClient:  map 100% reduce 100%
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient: Job complete: job_200810291828_0008
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient: Counters: 16
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes read=340220
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes written=8250
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes read=21339
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes written=42740
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Data-local map tasks=2
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input groups=7
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine output records=10
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce output records=7
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output bytes=1028419
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input bytes=323660
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine input records=600
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output records=600
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input records=10
>>>>> 08/10/29 18:30:23 INFO kmeans.KMeansDriver: Iteration 4
>>>>> 08/10/29 18:30:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>> 08/10/29 18:30:23 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>> 08/10/29 18:30:24 INFO mapred.JobClient: Running job: job_200810291828_0009
>>>>> 08/10/29 18:30:25 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:30:31 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:30:33 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:30:37 INFO mapred.JobClient:  map 100% reduce 100%
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient: Job complete: job_200810291828_0009
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient: Counters: 16
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes read=340160
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes written=8200
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes read=21219
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes written=42500
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Data-local map tasks=2
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input groups=7
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine output records=10
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce output records=7
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output bytes=1024899
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input bytes=323660
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine input records=600
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output records=600
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input records=10
>>>>> 08/10/29 18:30:38 INFO kmeans.KMeansDriver: Clustering
>>>>> 08/10/29 18:30:38 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>> 08/10/29 18:30:38 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient: Running job: job_200810291828_0010
>>>>> 08/10/29 18:30:39 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:30:45 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient: Job complete: job_200810291828_0010
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient: Counters: 7
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes read=340060
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes written=1020535
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Data-local map tasks=2
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input bytes=323660
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map output records=600
>>>>> 08/10/29 18:30:47 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>> 08/10/29 18:30:47 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>> 08/10/29 18:30:48 INFO mapred.JobClient: Running job: job_200810291828_0011
>>>>> 08/10/29 18:30:49 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:30:56 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient: Job complete: job_200810291828_0011
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient: Counters: 7
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes read=1020535
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes written=325460
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Data-local map tasks=2
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input bytes=1020535
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map output records=600
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 29, 2008 at 11:10 AM, Philippe Lamarche <philippe.lamarche@gmail.com> wrote:
>>>>>
>>>>> I will!
>>>>>
>>>>>> On 10/29/08, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>>>
>>>>>>
>>>>>>> Philippe, can you try the patch suggested by Arun Murthy on
>>>>>>> core-user@hadoop.a.o?  See
>>>>>>> http://issues.apache.org/jira/browse/HADOOP-4277
>>>>>>>
>>>>>>> I'm pretty swamped at the moment w/ ApacheCon coming up next week, but if
>>>>>>> it does fix the issue, then maybe we should move forward to the 18.2
>>>>>>> candidate (I don't think it has been released yet, those guys have a pretty
>>>>>>> sophisticated build process going)
>>>>>>>
>>>>>>> -Grant
>>>>>>>
>>>>>>> On Oct 28, 2008, at 7:19 AM, Philippe Lamarche wrote:
>>>>>>>
>>>>>>> Ubuntu linux 2.6.24 <http://2.6.24.21>, with java-6-sun-1.6.0.07.
>>>>>>>
>>>>>>>
>>>>>>>> On Tue, Oct 28, 2008 at 7:03 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Just a single machine.  I didn't think we were using features either.  Are
>>>>>>>>> you saying you can run the example using 0.18.1?
>>>>>>>>>
>>>>>>>>> BTW, Philippe, what JVM, O/S, etc. are you using?
>>>>>>>>>
>>>>>>>>> -Grant
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Oct 27, 2008, at 11:55 PM, Jeff Eastman wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Are you guys running on real Hadoop arrays? I can run the synthetic
>>>>>>>>>> control example just fine on a single machine. That code is just trying to
>>>>>>>>>> read a vector from a string. I'd be surprised if we were using any
>>>>>>>>>> "features" but will watch the threads.
>>>>>>>>>>
>>>>>>>>>> Jeff
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Grant Ingersoll wrote:
>>>>>>>>>>
>>>>>>>>>> I started a thread on core-user@hadoop.a.o:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> http://hadoop.markmail.org/message/cczunzfhpcqz6pis
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Oct 27, 2008, at 9:49 PM, Grant Ingersoll wrote:
>>>>>>>>>>>
>>>>>>>>>>> OK, I can confirm that the exact same code works with 0.17.2 and not w/
>>>>>>>>>>> 0.18.1.  So, it sounds like a bug in Hadoop, or we are relying on
>>>>>>>>>>> incorrect behavior in Hadoop.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 27, 2008, at 9:33 PM, Grant Ingersoll wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 26, 2008, at 10:46 AM, Philippe Lamarche wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, I went straight from 0.17.2 to 0.18.1.  It was working on
>>>>>>>>>>>>> 0.17.2.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> BTW, are you saying the same exact code was working on 0.17.2 or are
>>>>>>>>>>>>>> you referring to some older Mahout code that worked on 17.2?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Did this work with 0.18.0 or other prior versions for you?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I just updated to hadoop 0.18.1 and got a clean version of mahout from svn.
>>>>>>>>>>>>>>> However, I am having problems with KMeans that can be traced down to:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO
> org.apache.hadoop.mapred.Merger:
>>>>>>>>>>>>>>>> Merging
>>>>>>>>>>>>>>>> 2 sorted segments
>>>>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO
> org.apache.hadoop.mapred.Merger:
>>>>>>>>>>>>>>>> Down
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> the last merge-pass, with 2 segments left of total  
>>>>>>>>>>>>>>>> size:
> 5011
>>>>>>>>>>>>>>>> bytes
>>>>>>>>>>>>>>>> 2008-10-25 19:10:16,999 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810251826_0013_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
>>>>>>>>>>>>>>>> Caused by: java.lang.NumberFormatException: For input string: "["
>>>>>>>>>>>>>>>> at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
>>>>>>>>>>>>>>>> at java.lang.Double.parseDouble(Double.java:510)
>>>>>>>>>>>>>>>> at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
>>>>>>>>>>>>>>>> at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
>>>>>>>>>>>>>>>> at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
>>>>>>>>>>>>>>>> at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
>>>>>>>>>>>>>>>> ... 1 more
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2008-10-25 19:10:16,999 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
>>>>>>>>>>>>>>>> 2008-10-25 19:10:17,000 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
>>>>>>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is while running the synthetic_control.data example, but I have
>>>>>>>>>>>>>>>> the same problems with any other input data.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am able to run other map-reduce jobs without problems.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here is the output of the jar task:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> hadoop@philippe-vaio:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>>>>>>>>>>>>>> 08/10/25 19:09:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.JobClient: Running job: job_200810251826_0010
>>>>>>>>>>>>>>>> 08/10/25 19:09:29 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>>>> 08/10/25 19:09:31 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Job complete: job_200810251826_0010
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes read=291644
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes written=323660
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input bytes=288374
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map output records=600
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Running job: job_200810251826_0011
>>>>>>>>>>>>>>>> 08/10/25 19:09:33 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>>>> 08/10/25 19:09:37 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>>>> 08/10/25 19:09:39 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>>>> 08/10/25 19:09:44 INFO mapred.JobClient:  map 100% reduce 16%
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Job complete: job_200810251826_0011
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes read=323660
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes written=1447
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes read=1389
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes written=37878
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine output records=29
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce output records=1
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output bytes=943020
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input bytes=323660
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine input records=1760
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output records=1732
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input records=1
>>>>>>>>>>>>>>>> 08/10/25 19:09:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.JobClient: Running job: job_200810251826_0012
>>>>>>>>>>>>>>>> 08/10/25 19:09:54 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>>>> 08/10/25 19:09:56 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>>>> 08/10/25 19:09:58 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Job complete: job_200810251826_0012
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes read=326554
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes written=1137260
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes read=1147358
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes written=2304490
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine output records=0
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce output records=600
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output bytes=1139660
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input bytes=323660
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine input records=0
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output records=600
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input records=600
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>>> 08/10/25 19:10:03 INFO mapred.JobClient: Running job: job_200810251826_0013
>>>>>>>>>>>>>>>> 08/10/25 19:10:04 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>>>> 08/10/25 19:10:08 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>>>> 08/10/25 19:10:09 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>>>> 08/10/25 19:10:21 INFO mapred.JobClient: Task Id : attempt_200810251826_0013_r_000000_0, Status : FAILED
>>>>>>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am not sure if I am doing something wrong here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Philippe.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --------------------------
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Grant Ingersoll
>>>>>>>>>>>>>>> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
>>>>>>>>>>>>>>> http://www.lucenebootcamp.com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Lucene Helpful Hints:
>>>>>>>>>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>>>>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
