hadoop-common-user mailing list archives

From Matei Zaharia <ma...@eecs.berkeley.edu>
Subject Re: Mechanism of MapReduce in Hadoop
Date Thu, 17 Feb 2011 07:50:05 GMT
Do you mean profiling the data path in MapReduce? I think the general consensus is that a decent
amount of time is spent in deserialization and in data copies in the HDFS stack, although
of course there is work under way to improve this. For example, take a look at https://issues.apache.org/jira/browse/HDFS-347
for optimizations to the HDFS read path (at least for local data). My guess is that these
are two of the more surprising things you'll see in a profile. Of course, for many jobs the
tasks might be IO-bound, in which case this may not matter.
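
To get a rough feel for the deserialization point, here is a minimal sketch in plain Python. It is not Hadoop code and does not touch HDFS: the record layout (2-byte length, UTF-8 word, 4-byte count) and the function names encode_records, copy_bytes, decode_records and timed are all made up for illustration. It only compares reading a buffer as opaque bytes against decoding every record in it; the numbers it prints say nothing about what a real MapReduce profile would show.

```python
import io
import struct
import time


def encode_records(n):
    """Serialize n (word, count) records: 2-byte length, UTF-8 word, 4-byte int."""
    buf = io.BytesIO()
    for i in range(n):
        word = f"word{i}".encode("utf-8")
        buf.write(struct.pack(">H", len(word)))
        buf.write(word)
        buf.write(struct.pack(">i", i))
    return buf.getvalue()


def copy_bytes(data, chunk=64 * 1024):
    """Read the payload in chunks without interpreting it; return bytes seen."""
    stream = io.BytesIO(data)
    total = 0
    while True:
        block = stream.read(chunk)
        if not block:
            break
        total += len(block)
    return total


def decode_records(data):
    """Deserialize every record into a (str, int) tuple."""
    records = []
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from(">H", data, offset)
        offset += 2
        word = data[offset:offset + length].decode("utf-8")
        offset += length
        (count,) = struct.unpack_from(">i", data, offset)
        offset += 4
        records.append((word, count))
    return records


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    payload = encode_records(200_000)
    _, copy_s = timed(copy_bytes, payload)
    records, decode_s = timed(decode_records, payload)
    print(f"copy: {copy_s:.4f}s  decode: {decode_s:.4f}s  records: {len(records)}")
```

For an actual job you would want a real profiler attached to the task JVMs rather than a toy like this.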


On Feb 16, 2011, at 11:40 PM, Matthew John wrote:

> Hi all,
> I want to know if anyone has already done an in-depth analysis of the
> MapReduce mechanism. Has anyone really gone into a bytecode-level
> understanding of the Map and Reduce mechanism? It would be good if we
> could take a simple MapReduce job (say WordCount) and then try the analysis.
> Please send me pointers if there is already some work done in this
> respect. Or please help me with how to proceed with the same analysis
> if you feel a specific technique/software/development environment has
> ready plugins to help in this regard.
> Thanks,
> Matthew John
