hadoop-common-user mailing list archives

From "Devaraj Das" <d...@yahoo-inc.com>
Subject RE: DecreasingComparator
Date Mon, 02 Jul 2007 04:53:20 GMT
Please ignore my response below.
I misread your mail and assumed that you wanted to sort in
decreasing order of the counts of the URL words.

-----Original Message-----
From: Devaraj Das [mailto:ddas@yahoo-inc.com] 
Sent: Monday, July 02, 2007 10:09 AM
To: hadoop-user@lucene.apache.org
Subject: RE: DecreasingComparator

Your first MapReduce phase is very similar to the WordCount example. The
only difference is that you need to create LongWritable objects for the
values. The output format should be SequenceFileOutputFormat.class.

Run a subsequent MapReduce phase with the input format set to
SequenceFileInputFormat.class, the map class set to InverseMapper.class,
and the OutputKeyComparator set to LongWritable.DecreasingComparator.class.
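
As a rough illustration of what the second phase is meant to produce, the
following plain-Java sketch swaps each (url, count) pair to (count, url) and
orders the pairs by decreasing key. It does not use the Hadoop API;
DecreasingSortSketch and invertAndSort are hypothetical names, and the counts
are taken from the example in the original question.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DecreasingSortSketch {
    // Swap each (url, count) record to (count, url), then sort by decreasing count.
    // This mimics an inverse map step followed by a descending key comparator.
    static List<Map.Entry<Long, String>> invertAndSort(Map<String, Long> counts) {
        List<Map.Entry<Long, String>> inverted = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            inverted.add(Map.entry(e.getValue(), e.getKey()));
        }
        // Highest count first.
        inverted.sort(Map.Entry.<Long, String>comparingByKey(Comparator.reverseOrder()));
        return inverted;
    }

    public static void main(String[] args) {
        // Output of the first (counting) phase, as given in the question.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("urla.com", 3L);
        counts.put("urlb.com", 2L);
        counts.put("urlc.com", 4L);
        counts.put("urld.com", 1L);

        for (Map.Entry<Long, String> e : invertAndSort(counts)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

With the example counts, the pairs come out as urlc.com (4), urla.com (3),
urlb.com (2), urld.com (1).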


By the way, the second MapReduce phase won't work unless you patch your
version of Hadoop with
https://issues.apache.org/jira/secure/attachment/12360717/1535_01.patch .
This patch hasn't been committed yet.

-----Original Message-----
From: Peter W. [mailto:peter@marketingbrokers.com]
Sent: Monday, July 02, 2007 6:08 AM
To: hadoop-user@lucene.apache.org
Subject: DecreasingComparator

Hello,

I have a modified WordCount program with the following characteristics:

input file:
urla.com,urlb.com
urla.com,urlc.com
urlb.com,urlc.com
urlc.com,urla.com
urld.com,urlc.com

mapreduce output:
urla.com 3
urlb.com 2
urlc.com 4
urld.com 1
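
The counts above can be reproduced outside Hadoop with a minimal plain-Java
sketch. It assumes that every comma-separated token on a line is counted once,
which matches the output shown. The class and method names (UrlCountSketch,
countTokens) are hypothetical, and this is not the actual job from this mail.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UrlCountSketch {
    // Count every comma-separated token, as a WordCount-style map and reduce would.
    // A TreeMap keeps the URLs in sorted order, like the job output shown above.
    static Map<String, Long> countTokens(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String url : line.split(",")) {
                counts.merge(url.trim(), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "urla.com,urlb.com",
            "urla.com,urlc.com",
            "urlb.com,urlc.com",
            "urlc.com,urla.com",
            "urld.com,urlc.com");
        countTokens(input).forEach((url, n) -> System.out.println(url + " " + n));
    }
}
```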

Next, I tried using a comparator in a second MapReduce job with a different JobConf:

jc.setOutputKeyComparatorClass(LongWritable.DecreasingComparator.class);

but it didn't work because the values are IntWritable and my OutputCollector
wasn't picking up the right things...

What do I need to collect in both the map and the reduce for the final result
to be sorted in descending order (high to low)?

Thanks,

Peter W.


