mahout-dev mailing list archives

From Dhruv Kumar <dku...@ecs.umass.edu>
Subject Re: Why does KMeansDriver set the map output key and value types explicitly?
Date Tue, 24 May 2011 00:46:55 GMT
On Mon, May 23, 2011 at 6:24 PM, Jeff Eastman <jeastman@narus.com> wrote:

> That's the way it has always been done. Kmeans was one of the first Mahout
> algorithms and that driver code has been around for maybe 3 years. Is there
> a better way?
>

Sorry, I wasn't questioning the rigor or design of the code, but rather
seeking help regarding my confusion about Hadoop's API.

I was wondering why the framework has to be told the mapper's output types
explicitly. I assumed the call to context.write would pick them up as long
as they implement Writable, which they do in the K-Means case.

After a little searching, I found the answer in the Yahoo Hadoop tutorial
(http://developer.yahoo.com/hadoop/tutorial/module5.html):

"If your Mapper emits different types than the Reducer, you can set the
types emitted by the mapper with the JobConf's setMapOutputKeyClass() and
setMapOutputValueClass() methods."
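
The reason the explicit call matters can be sketched without Hadoop at all.
As far as I understand the API, when the map output classes are not set the
framework falls back to the job's final output classes, and it checks every
record the mapper emits against them at run time. The class below is only a
simplified model of that behaviour: MapOutputTypeSketch and its fields are
hypothetical names, String and Integer stand in for the real Writable types,
and Hadoop itself reports the mismatch as an IOException rather than the
IllegalStateException used here.

```java
// Simplified model of how a job resolves the mapper's output value type.
// All names are hypothetical stand-ins, not Hadoop classes.
class MapOutputTypeSketch {
    private Class<?> outputValueClass;     // type the reducer writes
    private Class<?> mapOutputValueClass;  // type the mapper writes; may be unset

    void setOutputValueClass(Class<?> c) { outputValueClass = c; }

    void setMapOutputValueClass(Class<?> c) { mapOutputValueClass = c; }

    Class<?> getOutputValueClass() { return outputValueClass; }

    // Falls back to the job's final output value class when nothing was set.
    Class<?> getMapOutputValueClass() {
        return mapOutputValueClass != null ? mapOutputValueClass : getOutputValueClass();
    }

    // Models the run-time check applied to each value the mapper emits.
    void collect(Object value) {
        Class<?> expected = getMapOutputValueClass();
        if (value.getClass() != expected) {
            throw new IllegalStateException("Type mismatch in value from map: expected "
                + expected.getName() + ", received " + value.getClass().getName());
        }
    }

    public static void main(String[] args) {
        MapOutputTypeSketch job = new MapOutputTypeSketch();
        job.setOutputValueClass(Integer.class);   // reducer's output type
        try {
            job.collect("an observation");        // mapper emits a different type
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        job.setMapOutputValueClass(String.class); // declare the mapper's type
        job.collect("an observation");            // now passes the check
        System.out.println("accepted after setMapOutputValueClass");
    }
}
```

In the K-Means case the mapper's value type (ClusterObservations) differs
from whatever the reducer writes, so without the explicit call the fallback
would pick the wrong class. The types cannot be read off the Mapper's generic
parameters by context.write because the job needs them up front to set up
serialization of the intermediate data.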

Thank you for your response; my driver code has moved forward now!



> -----Original Message-----
> From: dhruv21@gmail.com [mailto:dhruv21@gmail.com] On Behalf Of Dhruv
> Kumar
> Sent: Monday, May 23, 2011 1:39 PM
> To: dev@mahout.apache.org
> Subject: Why does KMeansDriver set the map output key and value types
> explicitly?
>
> To get ideas for my BaumWelch Driver class for Mahout-627, I have been
> studying the K-Means implementation carefully.
>
> In KMeansDriver.java, the function runIteration is responsible for
> dispatching a single MapReduce job. It contains the following calls,
> which set the mapper's output key and value types:
>
> job.setMapOutputKeyClass(Text.class);
> job.setMapOutputValueClass(ClusterObservations.class);
>
> However, in the core mapping operation, performed by the
> emitPointToNearestCluster function in KMeansClusterer.java, I find that
> the output key is of type Text and the output values are of type
> ClusterObservations, which implements Writable:
>
> context.write(new Text(nearestCluster.getIdentifier()), new
> ClusterObservations(1, point, point.times(point)));
>
> Why is the KMeansDriver setting the mapper's output key and value types
> explicitly?
>
