hadoop-common-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: About MapTask.java
Date Thu, 24 Feb 2011 14:10:16 GMT
Hey,

On Thu, Feb 24, 2011 at 6:26 PM, Dongwon Kim <eastcirclek@postech.ac.kr> wrote:
> I've been trying to read "MapTask.java" after reading some references such
> as "Hadoop: The Definitive Guide" and
> "http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html", but
> it's quite tough to directly read the code without detailed comments.

Perhaps you can add some once things are cleared up ;-)

> Q2)
>
> Is it efficient to partition data first and then sort records inside each
> partition?
>
> Is this done to avoid expensive pair-wise key comparisons?

Typically you only need sorting within each partition, since the
different partitions are sent off to different reducers and each
reducer only needs its own input in key order. Total-order
partitioning may be an exception here, perhaps.
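
To make the idea concrete, here is a minimal sketch in plain Java. It
is not the MapTask code: the class and method names (PartitionThenSort,
partitionFor, partitionAndSort) are made up for illustration, and it
assumes String keys and a hash-style partitioner. It assigns each key
to a partition first and then sorts each partition independently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: a toy model of "partition first, sort inside each
// partition". All names here are hypothetical, not Hadoop classes.
public class PartitionThenSort {

    // Hash-style partitioning: equal keys always map to the same partition.
    // Masking with Integer.MAX_VALUE keeps the result non-negative.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Bucket the keys by partition, then sort every bucket on its own.
    static List<List<String>> partitionAndSort(List<String> keys, int numPartitions) {
        List<List<String>> partitions = new ArrayList<List<String>>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<String>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, numPartitions)).add(key);
        }
        // Keys in different partitions are never compared with each other.
        for (List<String> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("pear", "apple", "fig", "kiwi", "apple", "plum");
        List<List<String>> out = partitionAndSort(keys, 3);
        for (int p = 0; p < out.size(); p++) {
            System.out.println("partition " + p + ": " + out.get(p));
        }
    }
}
```

On the comparison-cost part of the question: sorting k partitions of
roughly n/k records each takes on the order of n log(n/k) comparisons,
against n log n for one global sort, so there is some saving, but the
main point is that an order across partitions is simply not needed.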

> Q3)
>
> Are there any documents containing explanations about how such internal
> classes are implemented?

There's a very good presentation you may want to see, covering the
spill/shuffle/sort portions of the framework that your questions are about:
http://www.slideshare.net/hadoopusergroup/ordered-record-collection

HTH :)

-- 
Harsh J
www.harshj.com
