hadoop-common-user mailing list archives

From Matthew John <tmatthewjohn1...@gmail.com>
Subject some doubts Hadoop MR
Date Thu, 10 Feb 2011 11:46:51 GMT
Hi all,

I have some questions about how Hadoop MapReduce works:

1) I understand that every MapReduce job is parameterized by an XML file
(containing all the job configuration). Whenever I set certain parameters
in my MR code (say I set the split size to 320000 KB), the change is reflected
in the job (in the number of mappers). How exactly does that happen? Do the
parameters set in the MR code override the default parameters set in the
configuration XML? And how does the JobTracker ensure that all the
TaskTrackers follow the same configuration? What mechanism is used?
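
To make question 1 concrete, here is a rough sketch of the behaviour I am
observing. It is only a toy model: it uses java.util.Properties rather than
Hadoop's own configuration classes, the key name example.split.size.bytes and
the 64 MB default are made up for illustration, and the mapper count is
approximated as ceil(input size / split size), which is a simplification.

```java
import java.util.Properties;

public class SplitSizeOverrideSketch {
    // Hypothetical key name, used only for this illustration (not a real Hadoop key).
    static final String SPLIT_KEY = "example.split.size.bytes";

    // Stand-in for the defaults that would come from the configuration XML.
    // The 64 MB value is an assumption for the example.
    static Properties xmlDefaults() {
        Properties defaults = new Properties();
        defaults.setProperty(SPLIT_KEY, String.valueOf(64L * 1024 * 1024));
        return defaults;
    }

    // Stand-in for the job code: a value set here shadows the default,
    // while keys that are not set still fall back to the defaults.
    static Properties jobSettings(Properties defaults) {
        Properties job = new Properties(defaults);
        job.setProperty(SPLIT_KEY, String.valueOf(320000L * 1024)); // 320000 KB
        return job;
    }

    // Simplified mapper count: one mapper per split, rounding up.
    static long numSplits(long inputBytes, long splitBytes) {
        return (inputBytes + splitBytes - 1) / splitBytes;
    }

    public static void main(String[] args) {
        long inputBytes = 1024L * 1024 * 1024; // assume a 1 GiB input file

        Properties defaults = xmlDefaults();
        Properties job = jobSettings(defaults);

        long defaultSplit = Long.parseLong(defaults.getProperty(SPLIT_KEY));
        long jobSplit = Long.parseLong(job.getProperty(SPLIT_KEY));

        System.out.println("mappers with XML default: " + numSplits(inputBytes, defaultSplit));
        System.out.println("mappers with value set in code: " + numSplits(inputBytes, jobSplit));
    }
}
```

Under these assumptions a 1 GiB input goes from 16 mappers with the default
to 4 mappers with the value set in code. What I would like to understand is
how the real framework achieves the equivalent of this layering.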

2) Assume I am running cascading (chained) MR jobs. In this case I feel
there is a huge overhead when the output of MR1 is written back to HDFS and
then read from there as the input of MR2. Can this be avoided? (Maybe by
keeping it in memory without going through HDFS and the NameNode.) Please let
me know if there is some way of doing this, because it would greatly improve
the efficiency of chained MR jobs.

