hadoop-mapreduce-user mailing list archives

From Bejoy KS <bejoy.had...@gmail.com>
Subject Re: How to sort key,value pair by value(In ascending)
Date Wed, 14 Sep 2011 12:38:28 GMT
Shashi
      You would definitely need one MapReduce job to aggregate the values
in the reducer. To sort that output, in very simple terms, use a second
MapReduce job whose map output key is the value from the first job's output
and whose map output value is the key from the first job's output. The
framework sorts map output by key, so the records reach the reducer ordered
by the original value. An additional MapReduce job is certainly expensive,
though.
Keep an eye on this thread; the experts may comment if there are better
solutions to your problem.
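
The two-job idea above can be sketched in plain Java, without the Hadoop API, just to show the data flow. This is a minimal illustration under assumptions, not a Hadoop implementation: the class and method names (`RatingSort`, `averageByBook`, `sortByAverage`) are hypothetical, the input is assumed to be `bookid,eid,rating` lines, and a `TreeMap` stands in for the sort-by-key that the framework performs between map and reduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical names; plain-Java stand-in for the two MapReduce jobs.
class RatingSort {

    // "Job 1": group ratings by book id and compute the average per book.
    static Map<String, Double> averageByBook(List<String> lines) {
        Map<String, double[]> acc = new TreeMap<>(); // bookId -> {sum, count}
        for (String line : lines) {
            String[] fields = line.trim().split(","); // bookid,eid,rating
            double[] sumCount = acc.computeIfAbsent(fields[0], k -> new double[2]);
            sumCount[0] += Double.parseDouble(fields[2]);
            sumCount[1] += 1;
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    // "Job 2": swap to (average, bookId) so that sorting by key orders the
    // records by average, then emit them as "bookId,average" again.
    static List<String> sortByAverage(List<String> lines) {
        // The TreeMap plays the role of the framework's sort on map output keys.
        // Books with the same average are grouped under one key, as they would
        // be in a single reduce call.
        TreeMap<Double, List<String>> byAverage = new TreeMap<>();
        for (Map.Entry<String, Double> e : averageByBook(lines).entrySet()) {
            byAverage.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Double, List<String>> e : byAverage.entrySet()) {
            for (String bookId : e.getValue()) {
                out.add(bookId + "," + e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A few records taken from the sample data in this thread.
        List<String> input = Arrays.asList(
            "0000012742,3244,1", "0062059017,2344,5", "0075546701,2344,1",
            "0000012742,2346,2", "0062059017,2346,3", "0075546701,2345,2",
            "0000012742,6798,3", "0062059017,9754,4", "0075546701,7890,6");
        for (String row : sortByAverage(input)) {
            System.out.println(row);
        }
        // Prints:
        // 0000012742,2.0
        // 0075546701,3.0
        // 0062059017,4.0
    }
}
```

In a real cluster the second job would only produce one globally ordered result if all records go through a single reducer or the partitioning preserves key order; the sketch sidesteps that by sorting everything in one in-memory map.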

Regards
Bejoy.K.S



On Wed, Sep 14, 2011 at 12:04 PM, ksgupta misc <ksgupta.misc@gmail.com> wrote:

> Hi Guys,
>  Thank you for your valuable suggestions.
> I see this works fine in cases where the key values are unique.
>
> In my use cases the values are as follows:
> *<bookid>,<eid>,<rating>*
> 0000012742,3244,1
> 0028604164,2344,3
> 0062059017,2344,5
> 0075546701,2344,1
> 0130213268,2344,8
> 0140105425,5675,3
> 0141304286,5677,6
> 0195052668,3453,8
> 0198775024,2342,9
> 0000012742,2346,2
> 0028604164,9789,4
> 0062059017,2346,3
> 0075546701,2345,2
> 0130213268,8907,4
> 0140105425,5675,5
> 0141304286,3457,6
> 0195052668,5678,7
> 0198775024,8975,8
> 0000012742,6798,3
> 0028604164,5434,7
> 0062059017,9754,4
> 0075546701,7890,6
> 0130213268,7655,7
> 0140105425,7564,8
> 0141304286,8433,3
> 0195052668,3252,6
> 0198775024,7765,7
>
> My goal is to write a program that outputs the book IDs sorted
> (ascending) by average rating.
> I have completed the following steps:
> 1. Map: create (key, value) pairs and call context.write(key, value).
> 2. Reducer: for each key, compute (sum of ratings) / (number of entries for
> that book) and call context.write(key, avg_rating).
>
> Example output will be like:
> 0075546701,4.6
> 0062059017,2.1
> 0195052668,6.1
> 0198775024,2.7
>
> My next step is to sort the book IDs in ascending order of the
> average rating.
> How do I write the program to get output like the following?
>
> 0062059017,2.1
> 0198775024,2.7
> 0075546701,4.6
> 0195052668,6.1
>
>
> Please let me know if my approach is wrong, as I am new to Hadoop.
>
> Thanks in advance,
> --Shashi.
>
>
>
>
>
> On Wed, Sep 14, 2011 at 11:32 AM, Sudharsan Sampath <sudhan65@gmail.com> wrote:
>
>> One way is to reverse the <key,value> output in the mapper to emit
>> <1, 10050>, and in the reducer use a TreeSet to order your values. For
>> each value, output <value, key> in the reducer.
>>
>> With this, the output will be sorted as you need within each reducer. If
>> you need totally sorted output, you can use a single reducer or design your
>> partitioning logic accordingly.
>>
>> Thanks
>> Sudhan S
>>
>>
>> On Wed, Sep 14, 2011 at 6:14 AM, ksgupta misc <ksgupta.misc@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have the content like
>>> *10103*,1042279,*4*
>>> *10070*,1001089,*5*
>>> *10102*,1015504,*7*
>>> *10080*,1024369,*7*
>>> *10050*,1025671,*1*
>>> ...
>>> from which I extracted the key,value pairs and got the following output
>>> after a single map and reduce:
>>>
>>> 10050  1
>>> 10070  5
>>> 10080  7
>>> 10102  7
>>> 10103  4
>>> ...
>>>
>>> I need to sort the output <key,value> pairs by value (in ascending
>>> order).
>>> Please let me know how I can go ahead.
>>>
>>> Required output:
>>> 10050  1
>>> 10103  4
>>> 10070  5
>>> 10080  7
>>> 10102  7
>>>
>>> Thanks in advance,
>>> --Shashi
>>>
>>>
>>>
>>>
>>>
>>
>
