hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: help with two column sort
Date Fri, 17 Jul 2009 06:05:58 GMT
that let you log what is going on in the field comparator or field
partitioner.
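
As a rough illustration of what such a shim does, here is a plain-Java sketch. It is not the Pro Hadoop example code and does not touch the Hadoop API; `LoggingComparatorSketch` and `LoggingComparator` are made-up names, and the natural string order stands in for whatever comparator the job really uses. The idea is only that the wrapper records every pair it is asked to compare, then delegates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LoggingComparatorSketch {

    // Hypothetical shim: wraps any comparator and records each call it receives.
    static final class LoggingComparator<T> implements Comparator<T> {
        private final Comparator<T> delegate;
        private final List<String> log = new ArrayList<>();

        LoggingComparator(Comparator<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public int compare(T left, T right) {
            int result = delegate.compare(left, right);
            // Keep the operands and the result so the sort can be replayed by eye.
            log.add("compare(" + left + ", " + right + ") = " + result);
            return result;
        }

        List<String> log() {
            return log;
        }
    }

    public static void main(String[] args) {
        // Assumption: natural string order stands in for the real key comparator.
        LoggingComparator<String> shim =
                new LoggingComparator<>(Comparator.<String>naturalOrder());
        List<String> keys = new ArrayList<>(Arrays.asList("carrot<adog 3", "carrot<adog 1"));
        keys.sort(shim);
        shim.log().forEach(System.out::println);
        System.out.println(keys);
    }
}
```

Reading the logged pairs next to the final order shows which part of the key actually decided each comparison.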

On Thu, Jul 16, 2009 at 11:05 PM, jason hadoop <jason.hadoop@gmail.com> wrote:

> In the example code for Pro Hadoop there are some shims for the
> fieldcomparator classes, that let you log what is going on in the
> partitioner.
>
> Also, it is very useful, if cumbersome, to step through that in the debugger.
>
>
> On Thu, Jul 16, 2009 at 10:59 PM, David_ca <davidsuperca@gmail.com> wrote:
>
>> Hi,
>>
>> I am very new to Hadoop and I'm hoping someone can help me do a two-column
>> sort.
>> For my input, I have lines with 3 columns. I would like to sort the first
>> column by string ascending and the second column by integer descending.
>> The listing below shows an example input and expected output.
>>
>> The approach I have taken is to use
>> JobConf.setKeyFieldComparatorOptions.
>> From reading various resources, these settings:
>> conf.setKeyFieldComparatorOptions("-k1 -k2nr");
>> conf.set("map.output.key.field.separator", " ");
>>
>> should do what I want: sort the first column by string, and the second
>> column by number descending. I use a space character to separate the 2 key
>> pieces.
>>
>> But it doesn't seem to work. The actual output I get is also shown below.
>> Any ideas on what I am doing wrong? The first column seems to be sorted
>> correctly, but some of the second column's values are not in the right order.
>> For example, these two rows should be reversed.
>> carrot<adog 1     value_c1
>> carrot<adog 3     value_c3
>>
>> Any help is greatly appreciated.
>>
>> David
>>
>>
>>
>> /*sample input*/
>> apple<adog 3     value_a3
>> apple<adog 1     value_a1
>> apple<acat 2     value_a2
>> apple<abird 12     value_a2
>> carrot<adog 1     value_c1
>> carrot<adog 3     value_c3
>> carrot<abird 2     value_c2
>> banana<acat 1     value_b1
>> banana<abird 3     value_b3
>> banana<adog 2     value_b2
>> banana<adog 11     value_b11
>> banana<abird 17     value_b17
>> banana<acat 4     value_b4
>>
>> /*expected output*/
>> apple<abird 12     value_a2
>> apple<acat 2     value_a2
>> apple<adog 3     value_a3
>> apple<adog 1     value_a1
>> banana<abird 17     value_b17
>> banana<abird 3     value_b3
>> banana<acat 4     value_b4
>> banana<acat 1     value_b1
>> banana<adog 11     value_b11
>> banana<adog 2     value_b2
>> carrot<abird 2     value_c2
>> carrot<adog 3     value_c3
>> carrot<adog 1     value_c1
>>
>> /*actual output*/
>> apple<abird 12     value_a2
>> apple<acat 2     value_a2
>> apple<adog 1     value_a1
>> apple<adog 3     value_a3
>> banana<abird 17     value_b17
>> banana<abird 3     value_b3
>> banana<acat 1     value_b1
>> banana<acat 4     value_b4
>> banana<adog 11     value_b11
>> banana<adog 2     value_b2
>> carrot<abird 2     value_c2
>> carrot<adog 1     value_c1
>> carrot<adog 3     value_c3
>>
>
>
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
>
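
One possible reading of the listings above, not confirmed anywhere in this thread: the "actual output" is exactly what a plain string comparison of the whole key produces ("11" sorts before "2", "1" before "3"). That would fit `-k1` running to the end of the key when no end position is given, as with Unix sort, in which case bounding each field (`-k1,1 -k2,2nr`) would be the thing to try. The plain-Java sketch below does not use Hadoop at all. It only contrasts the two orderings on a few of the sample keys; `TwoColumnSortSketch`, `fields`, `WHOLE_KEY`, and `BOUNDED_FIELDS` are hypothetical names, and Java's `String.compareTo` is assumed to agree with a byte-wise comparison because the keys are ASCII.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TwoColumnSortSketch {

    // Hypothetical helper: split "name number" at the first space.
    static String[] fields(String key) {
        return key.split(" ", 2);
    }

    // The whole key compared as one string.
    static final Comparator<String> WHOLE_KEY = Comparator.<String>naturalOrder();

    // Field 1 ascending as a string, then field 2 descending as an integer.
    static final Comparator<String> BOUNDED_FIELDS = (a, b) -> {
        String[] fa = fields(a);
        String[] fb = fields(b);
        int byName = fa[0].compareTo(fb[0]);
        if (byName != 0) {
            return byName;
        }
        // Operands swapped to get descending numeric order.
        return Integer.compare(Integer.parseInt(fb[1]), Integer.parseInt(fa[1]));
    };

    // Returns a sorted copy so the input list is left untouched.
    static List<String> sorted(List<String> keys, Comparator<String> order) {
        List<String> copy = new ArrayList<>(keys);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "carrot<adog 1", "carrot<adog 3",
                "apple<adog 3", "apple<adog 1",
                "banana<adog 2", "banana<adog 11");
        // Same relative order as the "actual output" listing.
        System.out.println(sorted(keys, WHOLE_KEY));
        // Same relative order as the "expected output" listing.
        System.out.println(sorted(keys, BOUNDED_FIELDS));
    }
}
```

If a logged comparator in the real job shows the second field never being consulted on its own, that would support this reading.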
