mahout-user mailing list archives

From "Palleti, Pallavi" <pallavi.pall...@corp.aol.com>
Subject RE: Problems with KMeans clustering
Date Thu, 06 Nov 2008 04:08:15 GMT
Hi all,

The same issue is discussed here:
https://issues.apache.org/jira/browse/MAHOUT-79

I have a patch ready that fixes this issue. If no one else is working on it,
I can open an issue in JIRA and commit it.

Thanks
Pallavi

-----Original Message-----
From: Jeff Eastman [mailto:jdog@windwardsolutions.com] 
Sent: Thursday, November 06, 2008 5:50 AM
To: mahout-user@lucene.apache.org
Subject: Re: Problems with KMeans clustering

Thanks Steve,

That was a subtle change that was evidently made after Kmeans was
implemented and did not show up until later when people such as Philippe
and yourself ran it with real problems on real clusters. While the type
signatures of the reducer and combiner are in fact the same, the values
provided by the mapper and combiner are different and could indeed
create the odd behavior that was reported.
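
[Editor's illustration] The paragraph above can be made concrete with a
small sketch. It does not use the Mahout or Hadoop classes; the two text
formats and the names decodePoint and combine are assumptions made up for
this example. It only shows how a combiner that reads one format and writes
another fails as soon as it is handed its own output.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerFormatMismatch {

    // Hypothetical mapper value format: a bare comma-separated point, e.g. "1.0, 2.0".
    static double[] decodePoint(String text) {
        String[] parts = text.split(",");
        double[] point = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            point[i] = Double.parseDouble(parts[i].trim());
        }
        return point;
    }

    // Hypothetical combiner: sums the points it is given and emits
    // "<count> [<sum>]", which is NOT the format it knows how to read.
    static String combine(List<String> values) {
        double[] sum = null;
        int count = 0;
        for (String value : values) {
            double[] point = decodePoint(value);
            if (sum == null) {
                sum = new double[point.length];
            }
            for (int i = 0; i < point.length; i++) {
                sum[i] += point[i];
            }
            count++;
        }
        return count + " " + Arrays.toString(sum);
    }

    public static void main(String[] args) {
        // First pass: the combiner sees mapper output and succeeds.
        String first = combine(Arrays.asList("1.0, 2.0", "3.0, 4.0"));
        System.out.println("first pass:  " + first);
        try {
            // Second pass: the combiner is handed its own output.
            combine(Arrays.asList(first));
        } catch (NumberFormatException e) {
            System.out.println("second pass: " + e);
        }
    }
}
```

The first pass succeeds; the second throws a NumberFormatException from
Double.parseDouble, the same kind of failure that appears in the stack
trace quoted further down this thread.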

The algorithm's dependence upon run-once behavior is pretty fundamental,
since summing of cluster centroids is done in the combiner and the
reducer does a merge of those clusters. I'd be interested in exactly how
you resolved this.
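
[Editor's illustration] One common way to remove a dependence on run-once
behavior is to have the mapper, combiner and reducer all exchange the same
kind of value: a partial result holding a point count and a component-wise
sum. The sketch below is plain Java under that assumption, not the fix that
was applied to Mahout; Partial, fromPoint, combine and centroid are
hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class RerunnableCombiner {

    // A partial result: how many points were summed, and their component-wise sum.
    // Mapper output, combiner input/output and reducer input all share this type.
    static final class Partial {
        final long count;
        final double[] sum;

        Partial(long count, double[] sum) {
            this.count = count;
            this.sum = sum;
        }
    }

    // Mapper side: wrap a single point as a partial with count 1.
    static Partial fromPoint(double... point) {
        return new Partial(1, point.clone());
    }

    // Combiner: adds counts and sums. Because its output has the same shape
    // as its input, it can be skipped or applied repeatedly.
    static Partial combine(List<Partial> partials) {
        long count = 0;
        double[] sum = new double[partials.get(0).sum.length];
        for (Partial p : partials) {
            count += p.count;
            for (int i = 0; i < sum.length; i++) {
                sum[i] += p.sum[i];
            }
        }
        return new Partial(count, sum);
    }

    // Reducer: merge whatever arrives, then divide once to get the centroid.
    static double[] centroid(List<Partial> partials) {
        Partial total = combine(partials);
        double[] c = new double[total.sum.length];
        for (int i = 0; i < c.length; i++) {
            c[i] = total.sum[i] / total.count;
        }
        return c;
    }

    public static void main(String[] args) {
        List<Partial> mapped = Arrays.asList(
                fromPoint(1.0, 2.0), fromPoint(3.0, 4.0), fromPoint(5.0, 6.0));

        // Combiner skipped entirely.
        double[] none = centroid(mapped);

        // Combiner run on the first two points, then again on its own output.
        Partial once = combine(mapped.subList(0, 2));
        Partial twice = combine(Arrays.asList(once));
        double[] repeated = centroid(Arrays.asList(twice, mapped.get(2)));

        System.out.println(Arrays.toString(none));
        System.out.println(Arrays.toString(repeated));
    }
}
```

Both lines print [3.0, 4.0]. In general the results agree only up to
floating-point rounding, since the order of additions can differ.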

It likely applies to some of the other clustering implementations too.

Finally, can you explain why this problem no longer seems to occur with 
Hadoop trunk?

Jeff


Steve Schlosser wrote:
> Hi folks
>
> A while back we upgraded our Hadoop cluster from 0.15 to 0.18.0, and I
> found that Mahout Kmeans quit working.  I finally tracked it down to
> the fact that the semantics of the combiner changed between 0.16,
> 0.17, and 0.18 from run exactly once to run zero or more times (which
> is in line with how Map/Reduce was originally specified).  See:
> https://issues.apache.org/jira/browse/HADOOP-3586.
>
> The Kmeans combiner depended on running exactly once, but on our new
> cluster it was running multiple times, causing hard-to-discern errors.
>  Basically, the second time through the Combiner, it would throw an
> exception that the formatting of the vector (serialized into a Text)
> was failing.  In the end, I had to make some formatting changes to the
> data output by the Mapper and the Combiner to match what the Reducer
> expects, as well as changes to the Combiner input.  I ended up
> having to hack the Mapper to output vectors that either the Combiner
> or Reducer could take as input, and make the Combiner take in the same
> input that it outputs and to calculate convergence at each step.
>
> My apologies if this has already been covered and put to rest - I just
> happened upon this thread this afternoon.
>
> -steve
>
> On Sun, Nov 2, 2008 at 10:29 AM, Philippe Lamarche
> <philippe.lamarche@gmail.com> wrote:
>   
>> Hi there,
>> It also works on 0.19.0-dev, that is on hadoop/branches/branch-0.19.
>>
>> In the next few days I intend to find out exactly what the problem is,
>> to make sure that it won't come back in a few revisions.
>>
>> Thanks!
>>
>> On Thu, Oct 30, 2008 at 9:20 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>>
>>     
>>> Hmm, I believe that patch has been applied in 18.2 (whatever that is) but
>>> it also looks like it has been applied to 0.17.3 branch as well.  So, it
>>> might be something else that "fixed" it.
>>>
>>> At any rate, glad to hear it works on trunk.
>>>
>>>
>>> On Oct 29, 2008, at 6:38 PM, Philippe Lamarche wrote:
>>>
>>>> I am not sure I understand the hadoop svn structure, however I was able
>>>> to make it work with hadoop trunk, or 0.20.0-dev.
>>>> It didn't work with hadoop/branch-0.18, with or without patch 4277.
>>>>
>>>>
>>>> Here is a copy-paste of the steps, once Hadoop is built and installed.
>>>> I am using the same exact "apache-mahout-examples-0.1-dev.job", not
>>>> rebuilt with the 0.20.0-dev jars.
>>>>
>>>> It works!
>>>>
>>>> That would mean that the bug/feature is not related to
>>>> HADOOP-4277 <http://issues.apache.org/jira/browse/HADOOP-4277>,
>>>> and was reintroduced (or never taken away) in hadoop/trunk.
>>>>
>>>>
>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop namenode -format
>>>> 08/10/29 18:27:59 INFO namenode.NameNode: STARTUP_MSG:
>>>> /************************************************************
>>>> STARTUP_MSG: Starting NameNode
>>>> STARTUP_MSG:   host = phil/127.0.1.1
>>>> STARTUP_MSG:   args = [-format]
>>>> STARTUP_MSG:   version = 0.20.0-dev
>>>> STARTUP_MSG:   build =  -r ; compiled by 'philippe' on Wed Oct 29 18:25:08 EDT 2008
>>>> ************************************************************/
>>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: supergroup=supergroup
>>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: isPermissionEnabled=true
>>>> 08/10/29 18:28:00 INFO common.Storage: Image file of size 96 saved in 0 seconds.
>>>> 08/10/29 18:28:00 INFO common.Storage: Storage directory /usr/local/hadoop-datastore/hadoop-hadoop/dfs/name has been successfully formatted.
>>>> 08/10/29 18:28:00 INFO namenode.NameNode: SHUTDOWN_MSG:
>>>> /************************************************************
>>>> SHUTDOWN_MSG: Shutting down NameNode at phil/127.0.1.1
>>>> ************************************************************/
>>>>
>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop dfs -put /home/philippe/synthetic_control.data testdata
>>>>
>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/build/apache-mahout-examples-0.1-dev.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>> 08/10/29 18:28:45 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:28:46 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>> 08/10/29 18:28:47 INFO mapred.JobClient: Running job: job_200810291828_0002
>>>> 08/10/29 18:28:48 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:28:54 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:28:55 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:28:56 INFO mapred.JobClient: Job complete: job_200810291828_0002
>>>> 08/10/29 18:28:56 INFO mapred.JobClient: Counters: 7
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes read=291644
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes written=323660
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input bytes=288374
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:28:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:28:56 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:28:56 INFO mapred.JobClient: Running job: job_200810291828_0003
>>>> 08/10/29 18:28:57 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:29:03 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:29:05 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:29:10 INFO mapred.JobClient:  map 100% reduce 100%
>>>> 08/10/29 18:29:11 INFO mapred.JobClient: Job complete: job_200810291828_0003
>>>> 08/10/29 18:29:11 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes read=323660
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes written=9657
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes read=36119
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes written=72300
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input groups=1
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine output records=28
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output bytes=943020
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine input records=1732
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output records=1732
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input records=28
>>>> 08/10/29 18:29:11 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:29:11 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:29:12 INFO mapred.JobClient: Running job: job_200810291828_0004
>>>> 08/10/29 18:29:13 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:29:20 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:29:22 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:29:27 INFO mapred.JobClient:  map 100% reduce 100%
>>>> 08/10/29 18:29:28 INFO mapred.JobClient: Job complete: job_200810291828_0004
>>>> 08/10/29 18:29:28 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes read=342974
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes written=3002539
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes read=3018455
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes written=6036972
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine output records=0
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce output records=1591
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output bytes=3008903
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine input records=0
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output records=1591
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input records=1591
>>>> 08/10/29 18:29:28 INFO kmeans.KMeansDriver: Iteration 0
>>>> 08/10/29 18:29:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:29:28 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:29:28 INFO mapred.JobClient: Running job: job_200810291828_0005
>>>> 08/10/29 18:29:29 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:29:35 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:29:37 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:29:41 INFO mapred.JobClient: Job complete: job_200810291828_0005
>>>> 08/10/29 18:29:41 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes read=342974
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes written=8205
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes read=23227
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes written=46516
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine output records=10
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output bytes=1136504
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine input records=600
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input records=10
>>>> 08/10/29 18:29:41 INFO kmeans.KMeansDriver: Iteration 1
>>>> 08/10/29 18:29:41 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:29:41 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:29:42 INFO mapred.JobClient: Running job: job_200810291828_0006
>>>> 08/10/29 18:29:43 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:29:50 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:29:51 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:29:55 INFO mapred.JobClient:  map 100% reduce 100%
>>>> 08/10/29 18:29:56 INFO mapred.JobClient: Job complete: job_200810291828_0006
>>>> 08/10/29 18:29:56 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes read=340070
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes written=8242
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes read=21265
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes written=42592
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine output records=10
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output bytes=1023966
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine input records=600
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input records=10
>>>> 08/10/29 18:29:56 INFO kmeans.KMeansDriver: Iteration 2
>>>> 08/10/29 18:29:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:29:56 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:29:56 INFO mapred.JobClient: Running job: job_200810291828_0007
>>>> 08/10/29 18:29:57 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:30:03 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:30:05 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:30:09 INFO mapred.JobClient: Job complete: job_200810291828_0007
>>>> 08/10/29 18:30:09 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes read=340144
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes written=8280
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes read=21085
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes written=42232
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine output records=10
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output bytes=1023681
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine input records=600
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input records=10
>>>> 08/10/29 18:30:09 INFO kmeans.KMeansDriver: Iteration 3
>>>> 08/10/29 18:30:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:30:09 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:30:09 INFO mapred.JobClient: Running job: job_200810291828_0008
>>>> 08/10/29 18:30:10 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:30:17 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:30:18 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:30:22 INFO mapred.JobClient:  map 100% reduce 100%
>>>> 08/10/29 18:30:23 INFO mapred.JobClient: Job complete: job_200810291828_0008
>>>> 08/10/29 18:30:23 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes read=340220
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes written=8250
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes read=21339
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes written=42740
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine output records=10
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output bytes=1028419
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine input records=600
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input records=10
>>>> 08/10/29 18:30:23 INFO kmeans.KMeansDriver: Iteration 4
>>>> 08/10/29 18:30:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:30:23 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:30:24 INFO mapred.JobClient: Running job: job_200810291828_0009
>>>> 08/10/29 18:30:25 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:30:31 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:30:33 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:30:37 INFO mapred.JobClient:  map 100% reduce 100%
>>>> 08/10/29 18:30:38 INFO mapred.JobClient: Job complete: job_200810291828_0009
>>>> 08/10/29 18:30:38 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes read=340160
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes written=8200
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes read=21219
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes written=42500
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine output records=10
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output bytes=1024899
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine input records=600
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input records=10
>>>> 08/10/29 18:30:38 INFO kmeans.KMeansDriver: Clustering
>>>> 08/10/29 18:30:38 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:30:38 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:30:38 INFO mapred.JobClient: Running job: job_200810291828_0010
>>>> 08/10/29 18:30:39 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:30:45 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:30:47 INFO mapred.JobClient: Job complete: job_200810291828_0010
>>>> 08/10/29 18:30:47 INFO mapred.JobClient: Counters: 7
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes read=340060
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes written=1020535
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:30:47 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:30:47 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:30:48 INFO mapred.JobClient: Running job: job_200810291828_0011
>>>> 08/10/29 18:30:49 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:30:56 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:30:57 INFO mapred.JobClient: Job complete: job_200810291828_0011
>>>> 08/10/29 18:30:57 INFO mapred.JobClient: Counters: 7
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes read=1020535
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes written=325460
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input bytes=1020535
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map output records=600
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Oct 29, 2008 at 11:10 AM, Philippe Lamarche <
>>>> philippe.lamarche@gmail.com> wrote:
>>>>
>>>>  I will!
>>>>         
>>>>> On 10/29/08, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>>
>>>>>           
>>>>>> Philippe, can you try the patch suggested by Arun Murthy on
>>>>>> core-user@hadoop.a.o?  See
>>>>>> http://issues.apache.org/jira/browse/HADOOP-4277
>>>>>>
>>>>>> I'm pretty swamped at the moment w/ ApacheCon coming up next week, but
>>>>>> if it does fix the issue, then maybe we should move forward to the 18.2
>>>>>> candidate (I don't think it has been released yet, those guys have a
>>>>>> pretty sophisticated build process going)
>>>>>>
>>>>>> -Grant
>>>>>>
>>>>>> On Oct 28, 2008, at 7:19 AM, Philippe Lamarche wrote:
>>>>>>
>>>>>> Ubuntu linux 2.6.24 <http://2.6.24.21>, with java-6-sun-1.6.0.07.
>>>>>>
>>>>>>             
>>>>>>> On Tue, Oct 28, 2008 at 7:03 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>>>>
>>>>>>>> Just a single machine.  I didn't think we were using features either.
>>>>>>>> Are you saying you can run the example using 0.18.1?
>>>>>>>>
>>>>>>>> BTW, Philippe, what JVM, O/S, etc. are you using?
>>>>>>>>
>>>>>>>> -Grant
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 27, 2008, at 11:55 PM, Jeff Eastman wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Are you guys running on real Hadoop arrays? I can run the synthetic
>>>>>>>>> control example just fine on a single machine. That code is just
>>>>>>>>> trying to read a vector from a string. I'd be surprised if we were
>>>>>>>>> using any "features" but will watch the threads.
>>>>>>>>>
>>>>>>>>> Jeff
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Grant Ingersoll wrote:
>>>>>>>>>
>>>>>>>>> I started a thread on core-user@hadoop.a.o:
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> http://hadoop.markmail.org/message/cczunzfhpcqz6pis
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Oct 27, 2008, at 9:49 PM, Grant Ingersoll wrote:
>>>>>>>>>>
>>>>>>>>>>> OK, I can confirm that the exact same code works with 0.17.2 and not
>>>>>>>>>>> w/ 0.18.1.  So, it sounds like a bug in Hadoop, or we are relying on
>>>>>>>>>>> incorrect behavior in Hadoop.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Oct 27, 2008, at 9:33 PM, Grant Ingersoll wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Oct 26, 2008, at 10:46 AM, Philippe Lamarche wrote:
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>>>> Unfortunately, I went straight from 0.17.2 to 0.18.1.  It was
>>>>>>>>>>>> working on 0.17.2.
>>>>>>>>>>>
>>>>>>>>>>> BTW, are you saying the same exact code was working on 0.17.2 or
>>>>>>>>>>> are you referring to some older Mahout code that worked on 17.2?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Did this work with 0.18.0 or other prior versions for you?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>                           
>>>>>>>>>>>>>> On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I just updated to hadoop 0.18.1 and got a clean version of mahout
>>>>>>>>>>>>>>> from svn.
>>>>>>>>>>>>>>> However, I am having problems with KMeans that can be traced
>>>>>>>>>>>>>>> down to:
>>>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
>>>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
>>>>>>>>>>>>>>> 2008-10-25 19:10:16,999 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810251826_0013_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
>>>>>>>>>>>>>>> Caused by: java.lang.NumberFormatException: For input string: "["
>>>>>>>>>>>>>>> at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
>>>>>>>>>>>>>>> at java.lang.Double.parseDouble(Double.java:510)
>>>>>>>>>>>>>>> at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
>>>>>>>>>>>>>>> at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
>>>>>>>>>>>>>>> at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
>>>>>>>>>>>>>>> at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
>>>>>>>>>>>>>>> ... 1 more
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2008-10-25 19:10:16,999 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
>>>>>>>>>>>>>>> 2008-10-25 19:10:17,000 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
>>>>>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is while running the synthetic_control.data example, but I
>>>>>>>>>>>>>>> have the same problems with any other input data.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am able to run other map-reduce jobs without problems.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is the output of the jar task:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> hadoop@philippe-vaio:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>>>>>>>>>>>>> 08/10/25 19:09:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.JobClient: Running job: job_200810251826_0010
>>>>>>>>>>>>>>> 08/10/25 19:09:29 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:31 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Job complete: job_200810251826_0010
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes read=291644
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes written=323660
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input bytes=288374
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map output records=600
>>>>>>>>>>>>>>> 08/10/25 19:09:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Running job: job_200810251826_0011
>>>>>>>>>>>>>>> 08/10/25 19:09:33 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:37 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:39 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:44 INFO mapred.JobClient:  map 100% reduce 16%
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Job complete: job_200810251826_0011
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes read=323660
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes written=1447
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes read=1389
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes written=37878
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine output records=29
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce output records=1
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output bytes=943020
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input bytes=323660
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine input records=1760
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output records=1732
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input records=1
>>>>>>>>>>>>>>> 08/10/25 19:09:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.JobClient: Running job: job_200810251826_0012
>>>>>>>>>>>>>>> 08/10/25 19:09:54 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:56 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:58 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Job complete: job_200810251826_0012
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes read=326554
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes written=1137260
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes read=1147358
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes written=2304490
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine output records=0
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce output records=600
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output bytes=1139660
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input bytes=323660
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine input records=0
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output records=600
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input records=600
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0
>>>>>>>>>>>>>>> 08/10/25 19:10:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>> 08/10/25 19:10:03 INFO mapred.JobClient: Running job: job_200810251826_0013
>>>>>>>>>>>>>>> 08/10/25 19:10:04 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:10:08 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:10:09 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:10:21 INFO mapred.JobClient: Task Id : attempt_200810251826_0013_r_000000_0, Status : FAILED
>>>>>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am not sure if I am doing something wrong here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Philippe.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --------------------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>                               
>>>>>>>>>>>>>> Grant Ingersoll
>>>>>>>>>>>>>> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
>>>>>>>>>>>>>> http://www.lucenebootcamp.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Lucene Helpful Hints:
>>>>>>>>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>>>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ

