hadoop-common-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: doubt about reduce tasks and block writes
Date Fri, 24 Aug 2012 22:02:54 GMT
Even assuming that node A only contains replicas, there is no guarantee that
its data will never be read.
First, you might lose a replica. The copy on node A could then be used
to re-create the missing replica.
Second, data locality is best effort. If all the map slots are occupied
except one, and that one is on a node without a replica of the data, then
node A is as likely as any other replica holder to be chosen as the source.
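
To make the second point concrete, here is a toy sketch in Python. It is not
Hadoop code: the function name assign_map_task, the node names, and the
uniform random choice of a remote source are all assumptions made for
illustration. It only models "run on a replica holder if one has a free slot,
otherwise run anywhere and read remotely".

```python
import random


def assign_map_task(replica_nodes, free_slots, rng=random):
    """Toy model of best-effort data locality (hypothetical, not a Hadoop API).

    replica_nodes: set of node names holding a replica of the input block.
    free_slots:    list of node names that currently have a free map slot.
    Returns (node the task runs on, node the data is read from).
    """
    if not free_slots:
        raise ValueError("no free map slot available")

    # Preferred case: a node with a free slot also holds a replica.
    local = [node for node in free_slots if node in replica_nodes]
    if local:
        return local[0], local[0]

    # Fallback: run on any free slot and read from some replica holder.
    # Assumption: every replica holder is equally likely to be the source.
    task_node = free_slots[0]
    source = rng.choice(sorted(replica_nodes))
    return task_node, source


# Node A runs only a DataNode, so it never appears in free_slots,
# yet it can still be picked as the source of a remote read.
print(assign_map_task({"B", "C"}, ["B", "E"]))  # local read on B
print(assign_map_task({"A"}, ["E"]))            # task on E reads from A
```

In the second call the only free slot is on a node without the data, so the
task has to read remotely, and node A serves the block even though it runs no
TaskTracker.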



On Fri, Aug 24, 2012 at 10:09 PM, Marc Sturlese <marc.sturlese@gmail.com> wrote:

> Hey there,
> I have a doubt about reduce tasks and block writes. Does a reduce task
> always write first to HDFS on the node where it is placed? (and then these
> blocks would be replicated to other nodes)
> If so, and I have a cluster of 5 nodes where 4 of them run a DN and a TT
> and one (node A) just runs a DN, would map tasks never read from node A
> when running MR jobs? This would be because maps have data locality and,
> if the reduce tasks write first to the node where they live, one replica
> of the block would always be on a node that has a TT. Node A would just
> contain blocks created from replication by the framework, as no reduce
> task would write there directly. Is this correct?
> Thanks in advance
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/doubt-about-reduce-tasks-and-block-writes-tp4003185.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

Bertrand Dechoux
