hadoop-common-user mailing list archives

From Amar Kamat <ama...@yahoo-inc.com>
Subject Re: Hadoop Internal Architecture writeup
Date Fri, 28 Nov 2008 06:59:43 GMT
Ricky Ho wrote:
> I put together an article describing the internal architecture of Hadoop (HDFS, MapRed).
> I'd love to get some feedback if you see anything inaccurate or missing ...
> http://horicky.blogspot.com/2008/11/hadoop-mapreduce-implementation.html
A few comments on MR:
1) "The JobTracker will first determine the number of splits (each split 
is configurable, ~16-64MB) from the input path, and select some 
TaskTracker based on their network proximity to the data sources, then 
the JobTracker send the task requests to those selected TaskTrackers."
The jobclient, while submitting the job, calculates the splits using the 
InputFormat specified by the user. Internally the InputFormat might make 
use of the dfs-block size, user-hinted num-maps, etc. The JobTracker is 
given 3 files:
- job.xml : job control parameters
- job.split : the split file
- job.jar : user map-reduce code
Hence the JobTracker never actually *calculates* the split.
Also, the TaskTracker asks for the task. It's a pull model where the 
tracker asks and the JobTracker schedules based on (node/rack/switch) 
locality. The JobTracker never actually initiates the scheduling.
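
Roughly, the client side looks something like this (a simplified sketch 
using the old org.apache.hadoop.mapred API; the class name and input 
path are just placeholders, not anything from the writeup):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.*;

    public class SubmitSideSketch {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitSideSketch.class);    // becomes job.xml
        conf.setInputFormat(TextInputFormat.class);            // user-specified InputFormat
        FileInputFormat.setInputPaths(conf, new Path("/user/ricky/input"));  // placeholder

        // The jobclient, not the JobTracker, computes the splits;
        // the num-maps value is only a hint the InputFormat may use.
        InputFormat format = conf.getInputFormat();
        InputSplit[] splits = format.getSplits(conf, conf.getNumMapTasks());
        System.out.println("splits computed on the client = " + splits.length);
        // These splits are serialized into job.split and uploaded, together
        // with job.xml and job.jar, before the JobTracker ever sees the job.
      }
    }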

2) For each record parsed by the “InputFormat”
To be more precise, it is the "RecordReader" provided by the 
"InputFormat" that parses the records.
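
An illustrative fragment (not the real MapTask code, just the idea; 
assuming the usual org.apache.hadoop.mapred and org.apache.hadoop.io 
imports):

    // The map task consumes its split through the RecordReader handed out
    // by the InputFormat (TextInputFormat -> a line-oriented reader here).
    void runMapSketch(JobConf conf, InputSplit split,
                      Mapper<LongWritable, Text, Text, IntWritable> mapper,
                      OutputCollector<Text, IntWritable> out) throws Exception {
      RecordReader<LongWritable, Text> reader =
          conf.getInputFormat().getRecordReader(split, conf, Reporter.NULL);
      LongWritable key = reader.createKey();     // the RecordReader creates ...
      Text value = reader.createValue();         // ... and parses the records
      while (reader.next(key, value)) {
        mapper.map(key, value, out, Reporter.NULL);
      }
      reader.close();
    }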

3) A periodic wakeup process will sort the memory buffer into different 
reducer node by invoke the “combine” function.
Need to cross-check whether it's a wakeup process or an on-demand thread 
that is spawned once the buffer is nearly full. Btw, the function that 
determines which key-val goes to which reducer is called the "Partitioner". 
The Combiner is just an optimization that does a local 
merge/reduction/aggregation of the data before sending it over the network.
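
For example (a sketch in the old mapred API; WordPartitioner and 
MyReducer are made-up names):

    // The Partitioner is what routes a (key, value) pair to a reducer ...
    public class WordPartitioner implements Partitioner<Text, IntWritable> {
      public void configure(JobConf job) { }
      public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // same idea as the default HashPartitioner: hash of the key modulo #reducers
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
      }
    }

    // ... while the Combiner is only an optional local reduce over the map output:
    conf.setPartitionerClass(WordPartitioner.class);
    conf.setCombinerClass(MyReducer.class);   // often the reduce class itself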

4) When all the TaskTrackers are done, the JobTracker will notify the 
selected TaskTrackers for the reduce phase.
This process is interleaved/parallelized. As soon as a map is done, the 
JobTracker is notified. Once a tracker (with a reducer) asks for events, 
these new events are passed. Hence the map output pulling (Shuffle 
Phase) works in parallel with the Map Phase. Reduce Phase can start only 
once all the (respective) map outputs are copied and merged.
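
The same completion events are visible through the client API, which 
gives a feel for the mechanism (sketch, old mapred API, reusing the 
JobConf from the earlier sketch):

    JobClient jobClient = new JobClient(conf);
    RunningJob running = jobClient.submitJob(conf);
    // Each SUCCEEDED map event tells the reduce-side copiers where to pull
    // that map's output from; trackers running reducers poll for these
    // events while other maps are still executing.
    TaskCompletionEvent[] events = running.getTaskCompletionEvents(0);
    for (TaskCompletionEvent event : events) {
      System.out.println(event.getTaskStatus() + " : " + event);
    }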

5) The JobTracker keep tracks of the progress of each phases and 
periodically ping the TaskTracker for their health status.
It's again a push rather than a pull. The trackers report their status 
instead of the JobTracker asking them.
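
What the JobTracker knows is just an aggregate of those pushed statuses, 
e.g. (sketch, old mapred API):

    JobClient jobClient = new JobClient(conf);
    ClusterStatus status = jobClient.getClusterStatus();
    // Everything below is built from what the TaskTrackers push via heartbeats.
    System.out.println("live trackers   = " + status.getTaskTrackers());
    System.out.println("running maps    = " + status.getMapTasks());
    System.out.println("running reduces = " + status.getReduceTasks());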

6) When any of the map phase TaskTracker crashes, the JobTracker will 
reassign the map task to a different TaskTracker node, which will rerun 
all the assigned splits.
There is a 1-1 mapping between a split and a map task. Hence it will 
re-run the map on the corresponding split.
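
A related knob (sketch; not from the writeup): how many times a map task, 
i.e. its split, may be re-attempted before the job is failed.

    // Each map task is bound to exactly one split; if the task (or its
    // tracker) fails, that split alone is re-run elsewhere, up to the
    // configured number of failed attempts.
    conf.setMaxMapAttempts(4);      // mapred.map.max.attempts (4 is the default)
    conf.setMaxReduceAttempts(4);   // mapred.reduce.max.attempts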

7) After both phase completes, the JobTracker will unblock the client 
The client is unblocked once the job is submitted. The way it works is 
as follows:
- jobclient requests the jobtracker for a unique job id
- jobclient does some sanity checks (e.g. whether the output folder 
already exists, etc.)
- jobclient uploads job files (xml, jar, split) onto a known location 
called System-Directory
- jobclient informs the jobtracker that the files are ready, and the 
jobtracker returns control.

It's not necessary that the jobclient always gets blocked on job submission.
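
In code the two modes look like this (sketch, old mapred API):

    JobClient jobClient = new JobClient(conf);

    // Non-blocking: control returns as soon as the steps above have finished.
    RunningJob job = jobClient.submitJob(conf);
    while (!job.isComplete()) {
      Thread.sleep(5000);         // the client polls (or not) at its own pace
    }

    // Blocking convenience wrapper: submits and then waits, printing progress.
    // RunningJob done = JobClient.runJob(conf);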

> Rgds,
> Ricky
