hadoop-common-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: Shuffle phase
Date Wed, 22 May 2013 14:58:01 GMT
I was reading the elephant book, trying to understand which process actually serves the HTTP
transfer on the mapper side.  Is it each map task?  Or is there some persistent task on
each worker that serves mapper output for all map tasks?

From: Kai Voigt [mailto:k@123.org]
Sent: Tuesday, May 21, 2013 12:59 PM
To: user@hadoop.apache.org
Subject: Re: Shuffle phase replication factor

The map output doesn't get written to HDFS. The map task writes its output to its local disk,
and the reduce tasks then pull the data over HTTP for further processing.
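
To make the data flow described above concrete, here is a minimal toy sketch in Python. It is not Hadoop code: the function names (partition, run_map_task, fetch_partition), the file naming, and the CRC32-based partitioning are all assumptions made for illustration, and the reducer's HTTP fetch is replaced by a plain local file read. It only shows the shape of the idea: each map task writes one local file per reducer, and each reducer collects its own partition from every map task's output.

```python
import os
import tempfile
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key (deterministic stand-in for hash partitioning)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(records, num_reducers, local_dir):
    """Write one local file per reducer partition; nothing is written to HDFS."""
    paths = [os.path.join(local_dir, f"part-{r}.out") for r in range(num_reducers)]
    files = [open(p, "w", encoding="utf-8") for p in paths]
    try:
        for key, value in records:
            files[partition(key, num_reducers)].write(f"{key}\t{value}\n")
    finally:
        for f in files:
            f.close()
    return paths


def fetch_partition(map_output_paths, reducer_id):
    """Stand-in for the reducer's HTTP fetch: read its partition from each map output."""
    pairs = []
    for paths in map_output_paths:
        with open(paths[reducer_id], encoding="utf-8") as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t")
                pairs.append((key, int(value)))
    return sorted(pairs)


def demo():
    """Run two toy map tasks and let two toy reducers collect their partitions."""
    records_a = [("apple", 1), ("pear", 1), ("apple", 1)]
    records_b = [("pear", 1), ("fig", 1)]
    with tempfile.TemporaryDirectory() as tmp:
        outputs = []
        for i, recs in enumerate((records_a, records_b)):
            local_dir = os.path.join(tmp, f"map-{i}")
            os.makedirs(local_dir)
            outputs.append(run_map_task(recs, 2, local_dir))
        return [fetch_partition(outputs, r) for r in range(2)]
```

Calling demo() returns one sorted list of (key, value) pairs per reducer. Every occurrence of a given key ends up at the same reducer, and the intermediate files live only in a temporary local directory, so no replication factor applies to them.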

On 21.05.2013 at 19:57, John Lilley <john.lilley@redpoint.net> wrote:

When MapReduce enters "shuffle" to partition the tuples, I am assuming that it writes intermediate
data to HDFS.  What replication factor is used for those temporary files?

Kai Voigt
