hadoop-mapreduce-user mailing list archives

From David_ca <davidsupe...@gmail.com>
Subject help with two column sort
Date Fri, 17 Jul 2009 03:18:53 GMT
Hi,

I am very new to Hadoop and I'm hoping someone can help me do a two-column
sort.
For my input, I have lines with 3 columns. I would like to sort the first
column as a string, ascending,
and the second column as an integer, descending.
The listing below shows an example input and expected output.

The approach I have taken is to use JobConf.setKeyFieldComparatorOptions.
From reading various resources, these settings:
conf.setKeyFieldComparatorOptions("-k1 -k2nr");
conf.set("map.output.key.field.separator", " ");

should do what I want: sort the first column as a string, and the second
column as a number, descending. I use a space character to separate the
2 key pieces.

But it doesn't seem to work. The actual output I get is also shown below.
Any ideas on what I am doing wrong? The first column seems to be sorted
correctly,
but some of the second column's values are out of order.
For example, these two rows should be reversed:
carrot<adog 1     value_c1
carrot<adog 3     value_c3

Any help is greatly appreciated.

David



/*sample input*/
apple<adog 3     value_a3
apple<adog 1     value_a1
apple<acat 2     value_a2
apple<abird 12     value_a2
carrot<adog 1     value_c1
carrot<adog 3     value_c3
carrot<abird 2     value_c2
banana<acat 1     value_b1
banana<abird 3     value_b3
banana<adog 2     value_b2
banana<adog 11     value_b11
banana<abird 17     value_b17
banana<acat 4     value_b4

/*expected output*/
apple<abird 12     value_a2
apple<acat 2     value_a2
apple<adog 3     value_a3
apple<adog 1     value_a1
banana<abird 17     value_b17
banana<abird 3     value_b3
banana<acat 4     value_b4
banana<acat 1     value_b1
banana<adog 11     value_b11
banana<adog 2     value_b2
carrot<abird 2     value_c2
carrot<adog 3     value_c3
carrot<adog 1     value_c1
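
For reference, the expected ordering above can be sketched as a plain Java comparator. This is only an illustration of the intended sort, not Hadoop code: the class name TwoColumnSort and its methods are hypothetical, and it assumes columns are separated by runs of whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Hadoop: orders lines by column 1
// (string, ascending) and then by column 2 (integer, descending).
final class TwoColumnSort {
    // Assumption: columns are separated by one or more whitespace characters.
    private static String[] columns(String line) {
        return line.trim().split("\\s+");
    }

    static final Comparator<String> BY_NAME_THEN_NUMBER_DESC = (a, b) -> {
        String[] x = columns(a);
        String[] y = columns(b);
        int byName = x[0].compareTo(y[0]);
        if (byName != 0) {
            return byName;
        }
        // Arguments are swapped to get descending numeric order.
        return Integer.compare(Integer.parseInt(y[1]), Integer.parseInt(x[1]));
    };

    // Returns a sorted copy and leaves the input list untouched.
    static List<String> sortLines(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(BY_NAME_THEN_NUMBER_DESC);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "carrot<adog 1     value_c1",
            "carrot<adog 3     value_c3",
            "banana<abird 3     value_b3",
            "banana<abird 17     value_b17");
        for (String line : sortLines(input)) {
            System.out.println(line);
        }
    }
}
```

Run on the full sample input, this comparator should give the order listed under the expected output.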

/*actual output*/
apple<abird 12     value_a2
apple<acat 2     value_a2
apple<adog 1     value_a1
apple<adog 3     value_a3
banana<abird 17     value_b17
banana<abird 3     value_b3
banana<acat 1     value_b1
banana<acat 4     value_b4
banana<adog 11     value_b11
banana<adog 2     value_b2
carrot<abird 2     value_c2
carrot<adog 1     value_c1
carrot<adog 3     value_c3
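
One observation about the actual output: it is exactly the order a plain text sort of the whole key ("name number") produces, with "17" before "3" and "11" before "2". A possible reading, not confirmed in this message, is that a "-k1" spec with no end field runs to the end of the key, as in Unix sort, so the numeric "-k2nr" part never gets to decide. The sketch below does not call Hadoop; the class name WholeKeySort is hypothetical and it only shows the plain text ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration, not Hadoop code: sorts whole keys as plain text.
final class WholeKeySort {
    // Returns a copy sorted by String natural order (character by character).
    static List<String> sortKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
            "banana<abird 3",
            "banana<abird 17",
            "carrot<adog 3",
            "carrot<adog 1");
        // Prints "banana<abird 17" before "banana<abird 3",
        // and "carrot<adog 1" before "carrot<adog 3".
        for (String key : sortKeys(keys)) {
            System.out.println(key);
        }
    }
}
```

If that reading is right, bounding each key to a single field, for example "-k1,1 -k2,2nr", would be the natural thing to try.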
