hadoop-common-user mailing list archives

From Thejas Nair <te...@yahoo-inc.com>
Subject Re: Sorting
Date Sat, 06 Mar 2010 01:06:29 GMT
If you don't want to implement all that, then just use 3 lines of Pig.
 l = load 'file';
 o = order l by $1;
 store o into 'file.sorted';

-Thejas



On 3/4/10 2:17 PM, "Alex Kozlov" <alexvk@cloudera.com> wrote:

> Hi Aayush,
> 
> In short, you write a special partitioner that partitions the data into
> non-overlapping intervals.
> 
> There are a few articles on this with a lot more detail:
> 
> http://sortbenchmark.org/YahooHadoop.pdf
> http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html
> 
> Alex K
> 
> On Wed, Mar 3, 2010 at 9:21 AM, Aayush Garg <aayush.garg@gmail.com> wrote:
> 
>> Hi,
>> 
>> Suppose I need to sort a big file (several GB). How would I accomplish
>> this task using Hadoop?
>> My main problem is how to merge the output of the individual reduce phases.
>> 
>> thanks
>> 
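
The partitioner idea quoted above (split the key space into non-overlapping intervals, one per reducer) can be sketched in plain Python. This is only an illustration of the technique, not Hadoop code: the function names, the sampling approach, and the default of four partitions are all assumptions made for the example. The point is that once every key in partition i is no larger than every key in partition i+1, the sorted partitions only need to be concatenated, so no merge step is required.

```python
import bisect
import random


def choose_split_points(sample, num_partitions):
    """Pick num_partitions - 1 boundary keys from a sample of the input.

    Hypothetical helper: the boundaries are evenly spaced quantiles of the sample.
    """
    ordered = sorted(sample)
    if not ordered:
        return []
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def partition_for(key, split_points):
    """Return the index of the interval that holds key.

    Keys below the first boundary go to partition 0, keys at or above the
    last boundary go to the last partition.
    """
    return bisect.bisect_right(split_points, key)


def total_order_sort(records, num_partitions=4, sample_size=100, seed=0):
    """Simulate the job: partition by range, sort each part, concatenate."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    split_points = choose_split_points(sample, num_partitions)

    # "Map" side: route every key to the partition that owns its interval.
    buckets = [[] for _ in range(num_partitions)]
    for key in records:
        buckets[partition_for(key, split_points)].append(key)

    # "Reduce" side: each partition is sorted independently.
    sorted_parts = [sorted(bucket) for bucket in buckets]

    # The intervals do not overlap, so concatenating in partition order
    # yields a globally sorted result.
    return [key for part in sorted_parts for key in part]
```

In a real job the per-partition sort happens inside each reducer, and the output files, read in partition order, form the sorted result. A skewed sample gives uneven partition sizes, but the output is still correctly ordered.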

