hadoop-mapreduce-user mailing list archives

From Rishi Yadav <ri...@infoobjects.com>
Subject Re: How MapReduce selects data blocks for processing user request
Date Sat, 09 Feb 2013 05:00:16 GMT
Hi Mehal,

When a client makes a read request for a file, say foo.txt, the namenode
sends back the ID of the first block (BlockID) and the list of datanodes
that hold a replica of it.

It is the client that decides which datanode to pull the data from. If the
first request fails, the client can retry and fetch another replica of the
same block from a different datanode. This process repeats, block by block,
until all the data is read.
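
As a rough illustration of the retry behaviour described above, here is a
minimal Python sketch. It is not the HDFS client API: read_block, read_file,
demo_fetch, and the block and datanode names are all hypothetical, and the
"namenode response" is just a hard-coded list of (block ID, datanodes) pairs.

```python
from typing import Callable, List, Tuple


class ReplicaReadError(Exception):
    """Raised when no replica of a block could be read (hypothetical type)."""


def read_block(block_id: str, datanodes: List[str],
               fetch: Callable[[str, str], bytes]) -> bytes:
    """Try each datanode holding a replica until one read succeeds."""
    for node in datanodes:
        try:
            return fetch(node, block_id)
        except IOError:
            # This replica is unavailable; fall through to the next datanode.
            continue
    raise ReplicaReadError(f"no readable replica for {block_id}")


def read_file(block_locations: List[Tuple[str, List[str]]],
              fetch: Callable[[str, str], bytes]) -> bytes:
    """Read every block in order; each block is read exactly once."""
    return b"".join(read_block(block_id, nodes, fetch)
                    for block_id, nodes in block_locations)


def demo_fetch(node: str, block_id: str) -> bytes:
    """Stand-in for a datanode read; pretends that dn1 is down."""
    if node == "dn1":
        raise IOError("dn1 unreachable")
    return f"<{block_id}@{node}>".encode()


# Assumed shape of what the namenode hands back for foo.txt.
locations = [
    ("blk_1", ["dn1", "dn2", "dn3"]),
    ("blk_2", ["dn2", "dn3", "dn1"]),
]
data = read_file(locations, demo_fetch)
print(data)  # b'<blk_1@dn2><blk_2@dn2>'
```

Here blk_1 is first requested from dn1, which fails, so the client falls back
to dn2. Each block ID appears once in the list, so having several replicas
does not cause the same block to be read more than once.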

Thanks and Regards,

Rishi Yadav

(o) 408.988.2000x113 ||  (f) 408.716.2726

InfoObjects Inc || http://www.infoobjects.com *(Big Data Solutions)*

*INC 500 Fastest growing company in 2012 || 2011*

*Best Place to work in Bay Area 2012 - *SF Business Times and the Silicon
Valley / San Jose Business Journal

2041 Mission College Boulevard, #280 || Santa Clara, CA 95054

On Fri, Feb 8, 2013 at 4:40 PM, Mehal Patel <mehal01988@gmail.com> wrote:

> Hello All,
> I am confused about how MapReduce tasks select data blocks when processing
> user requests.
> Since block replication stores copies of a single data block on multiple
> datanodes, how are data blocks uniquely selected during job processing?
> How does it guarantee that the same block is not chosen two or three times
> by different mapper tasks?
> Thank you
> -Mehal
