hadoop-common-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: doubts
Date Wed, 13 Oct 2010 13:09:23 GMT
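1) The scheduler first tries to run the map on a node that already holds the
block (a data-local task).  If that is not possible, the map reads the block
from a node on its own rack if a replica is available there, or from another
rack if not.  The block is streamed over the network as the map reads it, I
believe, rather than copied in full before the map begins.  This preference
is known as rack locality.  You can see counters associated with it in the
jobs you run (data-local map tasks, rack-local map tasks, etc.).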
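
To make the locality levels concrete, here is a minimal sketch in Python. It
is not Hadoop code: the names Node, classify_locality and pick_replica are
hypothetical, and the host and rack strings are made up for illustration. It
only shows the order of preference (same node, then same rack, then anywhere
else).

```python
from typing import NamedTuple, Sequence


class Node(NamedTuple):
    """Hypothetical description of a cluster node: its host name and rack."""
    host: str
    rack: str


def classify_locality(task_node: Node, replicas: Sequence[Node]) -> str:
    """Classify a map task by how close it runs to a replica of its block."""
    if any(r.host == task_node.host for r in replicas):
        return "data-local"   # a replica sits on the task's own node
    if any(r.rack == task_node.rack for r in replicas):
        return "rack-local"   # a replica sits on another node in the same rack
    return "off-rack"         # every replica is on a different rack


def pick_replica(task_node: Node, replicas: Sequence[Node]) -> Node:
    """Pick the closest replica to read from: same host, same rack, then any."""
    def distance(replica: Node) -> int:
        if replica.host == task_node.host:
            return 0
        if replica.rack == task_node.rack:
            return 1
        return 2
    return min(replicas, key=distance)


# Illustrative values only: node A holds no replica of the block it needs.
node_a = Node("nodeA", "/rack1")
block_replicas = [Node("nodeC", "/rack2"), Node("nodeB", "/rack1")]
print(classify_locality(node_a, block_replicas))
print(pick_replica(node_a, block_replicas).host)
```

With these values the task on node A counts as rack-local and would read from
nodeB, the replica on its own rack.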

2) The reducer has a phase called copy, which fetches _all_ the map outputs
it needs to act on (the first 33% of its progress).  Only then is the sort
phase initiated (the next 33%).  Only after copy and sort does the reduce
itself begin (on to 100%).  So such an issue won't occur, as all map outputs
are fetched before any other logic runs.
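
The three-way split of reducer progress described above can be sketched as
follows. This is only an illustration of the arithmetic, assuming each phase
counts for an equal third; the function name reduce_task_progress is
hypothetical and not part of any Hadoop API.

```python
# Reducer phases in the order they run; each is assumed to be worth one third.
PHASES = ("copy", "sort", "reduce")


def reduce_task_progress(phase: str, fraction_done: float) -> float:
    """Map (current phase, fraction of that phase done) to overall progress.

    Returns a value between 0.0 and 1.0, where copy covers the first third,
    sort the second third, and reduce the final third.
    """
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0.0 and 1.0")
    completed_phases = PHASES.index(phase)
    return (completed_phases + fraction_done) / len(PHASES)


# Halfway through the sort phase means copy has already finished.
print(reduce_task_progress("sort", 0.5))
```

A reducer that reports anything above one third has therefore already
finished copying every map output it needs.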

On Oct 13, 2010 5:42 PM, "Matthew John" <tmatthewjohn1988@gmail.com> wrote:

Hi all,

I had some doubts:

1) What happens when a mapper running on node A needs data from a block it
doesn't have? (The block might be present on some other node in the
cluster.)

2) The Sort/Shuffle phase is just a logical representation of all the map
outputs sorted together, right? And again, what happens when a reduce on
node C needs access to some map outputs that are not in its memory?

Matthew.
