hadoop-mapreduce-user mailing list archives

From Jens Scheidtmann <jens.scheidtm...@gmail.com>
Subject Re: question about scheduler rack awareness
Date Tue, 19 Mar 2013 08:07:40 GMT
Dear Jin,

you wrote:
> my question is : will the map task created on the node which access his
> 10 blocks most fastest ?

Hadoop tries hard to run each map task on the node where its data is
stored. "Hadoop: The Definitive Guide" has some UML sequence diagrams of
what happens when the map JVMs are created. Sorry, I have not been able to
find them again on the web yet (well, apart from safaribooksonline.com ;-).
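
As a rough illustration of that preference, here is a small sketch that ranks
the free nodes for one map task: a node holding a replica of the block first,
then a node in the same rack as a replica, then any other node. The function
names, node names and rack map are made up for illustration; this is not
Hadoop's scheduler code, only the general idea of locality levels.

```python
# Hypothetical sketch of locality preference; not Hadoop's actual scheduler.
def locality_level(node, replica_nodes, rack_of):
    """Return 0 for node-local, 1 for rack-local, 2 for off-rack."""
    if node in replica_nodes:
        return 0
    replica_racks = {rack_of[n] for n in replica_nodes}
    if rack_of[node] in replica_racks:
        return 1
    return 2


def pick_node(free_nodes, replica_nodes, rack_of):
    """Pick the free node with the best (lowest) locality level."""
    return min(free_nodes, key=lambda n: locality_level(n, replica_nodes, rack_of))


# Made-up cluster: three racks, block replicas on n1 and n3.
rack_of = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackB", "n5": "rackC"}
replicas = {"n1", "n3"}
```

With this made-up cluster, `pick_node(["n2", "n5"], replicas, rack_of)` prefers
n2, because it shares a rack with the replica on n1 while n5 does not.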

Depending on the specific data layout (e.g. record lengths), the map tasks
may need to read other blocks anyway, which may be off-node.
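
To make the record-length point concrete, here is a minimal sketch. It
assumes, as a simplification, that a record is processed by the task whose
block contains the record's first byte, and that records are laid out back to
back in the file. The function name and the numbers are hypothetical.

```python
# Hypothetical sketch: which blocks does the task for one block have to read?
def blocks_touched(record_lengths, block_size, block_index):
    """Return the block indices read while processing every record
    that starts inside block `block_index`."""
    start = block_index * block_size
    end = start + block_size
    touched = set()
    offset = 0  # byte offset of the current record within the file
    for length in record_lengths:
        if start <= offset < end and length > 0:
            first = offset // block_size
            last = (offset + length - 1) // block_size
            touched.update(range(first, last + 1))
        offset += length
    return touched
```

With a block size of 100 bytes and records of 60, 60 and 80 bytes, the second
record starts at byte 60 and ends at byte 119, so the task for block 0 also
has to read from block 1, which may live on another node.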

On how many nodes is your 100-block file stored? On 10?

If it is on one node, then you're likely running into map slot limits or
container limits.
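
A back-of-the-envelope sketch of why that matters, assuming (hypothetically)
that all map tasks run data-local and that each node offers a fixed number of
map slots. The slot count of 2 used in the note below is an invented example
value, not a Hadoop default I am vouching for.

```python
import math


# Hypothetical estimate; real schedulers may also place tasks off-node.
def map_waves(num_tasks, nodes_used, slots_per_node):
    """Sequential rounds needed when every slot runs one task at a time."""
    return math.ceil(num_tasks / (nodes_used * slots_per_node))
```

For 100 map tasks and 2 slots per node, one node needs `map_waves(100, 1, 2)`
= 50 rounds, while ten nodes need `map_waves(100, 10, 2)` = 5 rounds.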

Best regards,

