hadoop-common-user mailing list archives

From praveenesh kumar <praveen...@gmail.com>
Subject How Jobtracker choose DataNodes to run TaskTracker ?
Date Fri, 16 Dec 2011 07:42:46 GMT
Okay, so I have one question in mind.

Suppose I have a replication factor of 3 on a cluster of N nodes,
where N > 3, and there is a data block B1 that exists on 3
DataNodes: DD1, DD2, DD3.

I want to run a Mapper function on this block. My JT will
communicate with the NN to find out where the block is located.
My assumption is that the NN will give the JT the information for all
DataNodes where the block resides, in this case DD1, DD2, DD3. Am I
right about this?
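
To make the scenario concrete, here is a minimal toy sketch of the lookup
I am assuming. It is not Hadoop code: the class name BlockLocationSketch,
the REPLICAS table and the replicaHosts method are all hypothetical names
I made up for illustration, and the real NN/JT interaction may work
differently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the scenario only; none of these names are Hadoop APIs.
public class BlockLocationSketch {
    // Hypothetical NN-side table: block id -> DataNodes holding a replica.
    private static final Map<String, List<String>> REPLICAS = new HashMap<>();
    static {
        // Replication factor 3: block B1 lives on DD1, DD2 and DD3.
        REPLICAS.put("B1", Arrays.asList("DD1", "DD2", "DD3"));
    }

    // What I assume the NN hands back to the JT for a given block.
    // Unknown blocks yield an empty list rather than null.
    static List<String> replicaHosts(String blockId) {
        List<String> hosts = REPLICAS.get(blockId);
        return hosts == null ? Collections.<String>emptyList() : hosts;
    }

    public static void main(String[] args) {
        System.out.println("B1 -> " + replicaHosts("B1"));
    }
}
```

In this toy model the JT gets back all three candidates for B1, and what
I do not understand is how it picks one of them.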

Now my question is: how does the JT decide which DD it should send
its mapper code to?

Suppose it chose DD1, and my TaskTracker starts running on that
machine. For some reason, DD1 takes more time than the task would
have taken had it run on DD2. How does Hadoop detect this and
make these decisions?

