hadoop-common-user mailing list archives

From Alex Kozlov <ale...@cloudera.com>
Subject Re: Sorting
Date Thu, 04 Mar 2010 22:17:33 GMT
Hi Aayush,

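In short, you write a special partitioner that splits the key space into
non-overlapping, ordered intervals, one per reducer. Each reducer's output
is sorted, and every key in part N is smaller than every key in part N+1,
so concatenating the output files in partition order gives a globally
sorted result. No separate merge step is needed.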
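
A minimal sketch of the interval lookup, in plain Java with no Hadoop
dependency. The class name RangePartitioner and the hard-coded split points
are hypothetical, for illustration only. In a real job this logic would sit
inside your partitioner's getPartition method, and the split points would
typically come from sampling the input so that the reducers get roughly
equal shares.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not a Hadoop class.
final class RangePartitioner {
    // Sorted boundaries: n split points define n + 1 partitions.
    private final String[] splitPoints;

    RangePartitioner(String[] sortedSplitPoints) {
        // Assumes the caller passes the split points already sorted.
        this.splitPoints = sortedSplitPoints.clone();
    }

    // Returns the index of the interval that contains key.
    // Partition i holds keys k with splitPoints[i-1] <= k < splitPoints[i];
    // partition 0 holds everything below the first split point.
    int getPartition(String key) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

With split points {"g", "n"} there are three partitions: "apple" goes to
partition 0, "grape" to partition 1, and "zebra" to partition 2.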

There are a few articles on this with a lot more detail:

http://sortbenchmark.org/YahooHadoop.pdf
http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html

Alex K

On Wed, Mar 3, 2010 at 9:21 AM, Aayush Garg <aayush.garg@gmail.com> wrote:

> Hi,
>
> Suppose I need to sort a big file (in GB). How would I accomplish
> this task using Hadoop?
> My main problem is how to merge the output of individual reduce phases?
>
> thanks
>
