hadoop-common-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: About MapTask.java
Date Thu, 24 Feb 2011 14:10:16 GMT

On Thu, Feb 24, 2011 at 6:26 PM, Dongwon Kim <eastcirclek@postech.ac.kr> wrote:
> I've been trying to read "MapTask.java" after reading some references such
> as "Hadoop definitive guide" and
> "http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html", but
> it's quite tough to directly read the code without detailed comments.

Perhaps you could add some comments yourself once things are clear ;-)

> Q2)
> Is it efficient to partition data first and then sort records inside each
> partition?
> Is it done to avoid expensive pair-wise key comparisons?

Typically you would only want sorting done inside a partitioned set,
since each partition is sent off to a different reducer, and so keys
in different partitions never need to be ordered relative to each
other. Total-order partitioning may be an exception here.
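
To make the partition-then-sort idea concrete, here is a minimal
standalone sketch in plain Java. It is not taken from MapTask.java; the
class and method names (PartitionThenSort, partitionFor,
partitionThenSort) are made up for illustration, and the real code works
on a serialized in-memory buffer rather than on lists of strings. The
only part borrowed from Hadoop is the hash formula, which mirrors what
the default HashPartitioner computes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PartitionThenSort {

    // Hypothetical helper: same formula as Hadoop's default HashPartitioner
    // (clear the sign bit, then take the remainder by the reducer count).
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Bucket the keys by partition first, then sort each bucket on its own.
    // Keys that land in different buckets are never compared with each other.
    static List<List<String>> partitionThenSort(List<String> keys, int numReducers) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, numReducers)).add(key);
        }
        for (List<String> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Illustrative keys only; which partition each lands in depends on its hash.
        List<String> keys = Arrays.asList("pear", "apple", "fig", "kiwi", "plum", "date");
        List<List<String>> out = partitionThenSort(keys, 3);
        for (int i = 0; i < out.size(); i++) {
            System.out.println("partition " + i + ": " + out.get(i));
        }
    }
}
```

Each list in the result is what a single reducer would receive from this
map output: sorted within itself, with no ordering guarantee across
lists.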

> Q3)
> Are there any documents containing explanations about how such internal
> classes are implemented?

There's a very good presentation you may want to see, covering the
spill/shuffle/sort portions of the framework that your questions are about:

HTH :)

Harsh J
