hadoop-mapreduce-dev mailing list archives

From Mostafa Elhemali <mostafa.elhem...@gmail.com>
Subject Re: Question about intermediate kv pair files
Date Mon, 03 Dec 2012 18:26:00 GMT
(Disclaimer: Not an expert, but looked at that code quite a bit. Hopefully
the list will correct any details I get wrong)

In Hadoop 1, the mapper puts its output file in a well-known location on the
machine (derived from the user, job ID and map ID), and the TaskTracker then
serves it over HTTP to the reducer when the reducer requests it (the request
is authenticated using a secret token in the job). Look in the
MapOutputServlet class in TaskTracker for most of the related code.
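
A rough sketch of the reducer side of that fetch, in Python for brevity rather
than Hadoop's own Java. It only shows the shape of the request: a URL that
names the job, the map attempt and the reduce partition, plus a signature
derived from the job's secret token. The /mapOutput path and the job, map and
reduce parameters follow what MapOutputServlet reads, but the host name, the
IDs, the token value and the exact message that gets signed are assumptions
made for illustration, not a copy of the real code.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlencode, urlsplit


def map_output_url(host, port, job_id, map_id, reduce_partition):
    """Build the URL a reducer would request for one map's output."""
    query = urlencode({"job": job_id, "map": map_id, "reduce": reduce_partition})
    return f"http://{host}:{port}/mapOutput?{query}"


def sign_request(url, job_token):
    """Base64 HMAC-SHA1 over path and query, keyed by the shared job token.

    Assumption: the real message format and header name may differ.
    """
    parts = urlsplit(url)
    message = f"{parts.path}?{parts.query}".encode("utf-8")
    digest = hmac.new(job_token, message, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical host, IDs and token, chosen only to make the example concrete.
url = map_output_url(
    "tt-host.example",
    50060,
    "job_201212031826_0001",
    "attempt_201212031826_0001_m_000003_0",
    2,
)
signature = sign_request(url, b"hypothetical-job-token")
print(url)
```

The point of the signature is that the serving side, which holds the same job
token, can recompute it and refuse requests from anyone who does not know the
token.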

In YARN: similar thing, except that now it's a NodeManager plug-in
(auxiliary service) that serves the map output, since there's no TaskTracker
anymore. Look at the ShuffleHandler class in the
hadoop-mapreduce-client-shuffle project. I see comments in the code
indicating that this will be changed from a NodeManager plug-in in the
future, but I don't know much about that.

Hope it helps,

On Mon, Dec 3, 2012 at 10:08 AM, rshepherd <rjs471@nyu.edu> wrote:

> Hi folks,
> Can anyone explain to me briefly how each mapper reports the
> location of the intermediate kv partition files to the master? And, if
> possible, where in the code I might find where that happens?
> Thanks for any help,
> Randy
