hadoop-mapreduce-user mailing list archives

From Rahul Bhattacharjee <rahul.rec....@gmail.com>
Subject Re: Shuffle phase replication factor
Date Wed, 22 May 2013 14:51:57 GMT
There is a configuration property to control the number of threads serving
the copy phase:
tasktracker.http.threads=40 (the default)
Thanks,
Rahul
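
For example, on Hadoop 1.x this would go in mapred-site.xml on each tasktracker (the value 80 below is only an illustration; 40 is the default):

```xml
<!-- mapred-site.xml: number of worker threads the TaskTracker's embedded
     HTTP server uses to serve map output to reducers (default: 40) -->
<property>
  <name>tasktracker.http.threads</name>
  <value>80</value>
</property>
```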


On Wed, May 22, 2013 at 8:16 PM, John Lilley <john.lilley@redpoint.net> wrote:

>  This brings up another nagging question I’ve had for some time.  Between
> HDFS and shuffle, there seems to be the potential for “every node
> connecting to every other node” via TCP.  Are there explicit mechanisms in
> place to manage or limit simultaneous connections?  Is the protocol simply
> robust enough to allow the server side to disconnect at any time to free up
> slots, with the client side retrying the request?
>
> Thanks
>
> john
>
> *From:* Shahab Yunus [mailto:shahab.yunus@gmail.com]
> *Sent:* Wednesday, May 22, 2013 8:38 AM
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: Shuffle phase replication factor
>
> As mentioned by Bertrand, "Hadoop: The Definitive Guide" is, well... a
> really definitive :) place to start. It is pretty thorough for starters,
> and once you have gone through it, the code will start making more sense
> too.
>
> Regards,
>
> Shahab
>
>
> On Wed, May 22, 2013 at 10:33 AM, John Lilley <john.lilley@redpoint.net>
> wrote:
>
> Oh, I see.  Does this mean there is another service and TCP listen port for
> this purpose?
>
> Thanks for your indulgence… I would really like to read more about this
> without bothering the group, but I am not sure where to start learning
> these internals other than the code.
>
> john
>
>
> *From:* Kai Voigt [mailto:k@123.org]
> *Sent:* Tuesday, May 21, 2013 12:59 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Shuffle phase replication factor
>
> The map output doesn't get written to HDFS. The map task writes its output
> to its local disk, and the reduce tasks pull the data over HTTP for
> further processing.
>
> On 21.05.2013 at 19:57, John Lilley <john.lilley@redpoint.net> wrote:
>
> When MapReduce enters “shuffle” to partition the tuples, I am assuming
> that it writes intermediate data to HDFS.  What replication factor is used
> for those temporary files?
>
> john
>
> --
>
> Kai Voigt
>
> k@123.org
>
>
