hadoop-common-dev mailing list archives

From "Ahmad Humayun" <ahmad.hu...@gmail.com>
Subject Re: intermediate map data
Date Mon, 03 Mar 2008 16:54:33 GMT
Thanks a lot, Amar. As usual, you have cleared a lot of the haze in my head :)


regards,

On Mon, Mar 3, 2008 at 9:32 PM, Amar Kamat <amarrk@yahoo-inc.com> wrote:

> On Mon, 3 Mar 2008, Ahmad Humayun wrote:
>
> > Hello everyone,
> >
> > I have a question about the intermediate data output by the map
> > function. I wanted to know whether this intermediate data gets written
> > to HDFS or stays in the node's local memory.
> It is stored on the local disk.
> > According to the MapReduce paper, the intermediate data is run through
> > a hash function which maps every key to a given reduce worker. So how
> > does this whole process happen? Does the map worker write the
> > intermediate data to HDFS and then tell the JobTracker (Master) which
> > Reduce worker should be allotted this data? Or does the Map worker keep
> > the intermediate data in memory and make an RPC call directly to the
> > reduce worker (which was figured out by the hash function) to transfer
> > the intermediate data?
> >
> The map uses something called the partitioner. Each map writes each <k,v>
> pair for the appropriate reducer, as determined by this partition
> function. In the end there is a map output file, which is nothing but the
> outputs for each reducer concatenated in sequence by reduce id. The hash
> function you are talking about is the partition function in Hadoop. The
> JobTracker is not involved in these things. Since the map has already
> generated output for each reducer, whenever a reducer requests a map
> output, the TaskTracker indexes into the map output file and sends the
> appropriate map output chunk.
> > It would be great if you could point me to the place where these
> > functionalities are implemented in Hadoop.
> See TaskTracker$MapOutputServlet.
> > Plus, it would be great if you could also point me to where the hash
> > function used by the map is implemented.
> See o.a.h.m.Partitioner.java
> >
> > thanks again for the great support on this mailing list.
> >
> >
> > regards,
> >
>
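The partition step Amar describes can be sketched in Java. This is a minimal, standalone illustration, not Hadoop code: the class name `PartitionSketch` and its `getPartition` helper are hypothetical and do not implement the `Partitioner` interface. The formula itself, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, is the one used by Hadoop's default `HashPartitioner`; masking off the sign bit keeps a negative hash code from producing a negative reducer index.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch {

    // Hypothetical helper mirroring the default HashPartitioner formula:
    // clear the sign bit so negative hash codes still give a valid index,
    // then take the remainder modulo the number of reduce tasks.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3;
        String[] keys = {"apple", "banana", "cherry", "apple", "date"};

        // One bucket per reducer; the map side places each key into the
        // bucket chosen by the partition function.
        List<List<String>> buckets = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(getPartition(key, numReduceTasks)).add(key);
        }

        for (int r = 0; r < numReduceTasks; r++) {
            System.out.println("reducer " + r + " <- " + buckets.get(r));
        }
    }
}
```

Because the partition depends only on the key and the number of reducers, every map task routes a given key to the same reducer, which is what lets one reducer see all the values for that key.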



-- 
Ahmad Humayun
Research Assistant
Computer Science Dpt., LUMS
+92 321 4457315
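Amar's description of the map output file (per-reducer segments concatenated in reduce-id order, which the TaskTracker indexes into when a reducer asks for its chunk) can be sketched as a simplified, in-memory model. Assumptions: `MapOutputSketch`, `IndexEntry` and `fetch` are hypothetical names, a byte array stands in for the file on the local disk, and records are written as tab-separated text lines. Hadoop's actual on-disk format, spilling and merging differ; per Amar's reply, the serving side in Hadoop is `TaskTracker$MapOutputServlet`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class MapOutputSketch {

    // Hypothetical index entry: where one reducer's segment starts in the
    // map output "file" and how many bytes it spans.
    static final class IndexEntry {
        final int offset;
        final int length;
        IndexEntry(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Same hash-partition formula as Hadoop's default HashPartitioner.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    final byte[] data;         // stands in for the map output file on local disk
    final IndexEntry[] index;  // one entry per reduce id

    MapOutputSketch(List<String[]> pairs, int numReduceTasks) {
        // Group the <k,v> pairs by partition (reduce id).
        List<List<String>> segments = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) {
            segments.add(new ArrayList<>());
        }
        for (String[] kv : pairs) {
            segments.get(getPartition(kv[0], numReduceTasks)).add(kv[0] + "\t" + kv[1] + "\n");
        }
        // Concatenate the segments in reduce-id order, recording each offset.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        index = new IndexEntry[numReduceTasks];
        for (int r = 0; r < numReduceTasks; r++) {
            int start = out.size();
            for (String line : segments.get(r)) {
                byte[] b = line.getBytes(StandardCharsets.UTF_8);
                out.write(b, 0, b.length);
            }
            index[r] = new IndexEntry(start, out.size() - start);
        }
        data = out.toByteArray();
    }

    // What the serving side does when reducer `reduceId` asks for its share:
    // look up the index entry and return only that byte range.
    String fetch(int reduceId) {
        IndexEntry e = index[reduceId];
        return new String(data, e.offset, e.length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"apple", "1"});
        pairs.add(new String[] {"banana", "2"});
        pairs.add(new String[] {"cherry", "3"});
        pairs.add(new String[] {"apple", "4"});

        MapOutputSketch mapOutput = new MapOutputSketch(pairs, 2);
        for (int r = 0; r < 2; r++) {
            System.out.print("reducer " + r + ":\n" + mapOutput.fetch(r));
        }
    }
}
```

Keeping an offset/length pair per reducer means each fetch is one seek and one contiguous read, with no need to scan records that belong to other reducers.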
