hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Which replica?
Date Tue, 02 Dec 2008 00:12:21 GMT
A task may read from more than one block.  For example, in line-oriented 
input, lines frequently cross block boundaries.  And a block may be read 
from more than one host.  For example, if a datanode dies midway through 
providing a block, the client will switch to using a different datanode.
So the mapping is not simple.  This information is also not, as you
inferred, available to applications.  Why do you need this?  Do you have 
a compelling reason?
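The first two points above can be made concrete with a minimal Python sketch under stated assumptions. It does not use any Hadoop API. The 64 MB block size, the byte offsets, the datanode names dn1/dn2/dn3 and the ReadEvent record are all hypothetical. As noted above, Hadoop does not expose a per-read log like this to applications. The first function shows how a line-oriented task that finishes its last line can spill into the next block. The second shows how a single block ends up associated with two hosts after a mid-read failover.

```python
from dataclasses import dataclass

MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # assumption: 64 MB blocks, for illustration only


def blocks_touched(split_start, split_end, last_line_end, block_size=BLOCK_SIZE):
    """Return indices of the blocks a line-oriented task reads.

    Simplified model: the task owns bytes [split_start, split_end) but keeps
    reading until last_line_end (exclusive) to finish its final line, which
    may lie in the following block.
    """
    read_end = max(split_end, last_line_end)
    first = split_start // block_size
    last = (read_end - 1) // block_size
    return list(range(first, last + 1))


@dataclass(frozen=True)
class ReadEvent:
    """Hypothetical record of one contiguous read; not a Hadoop type."""
    block: int
    host: str
    start: int  # offset within the block where this read began
    end: int    # offset within the block where this read stopped


def hosts_per_block(events):
    """Map each block index to the hosts it was read from, in first-seen order."""
    result = {}
    for e in events:
        hosts = result.setdefault(e.block, [])
        if e.host not in hosts:
            hosts.append(e.host)
    return result


if __name__ == "__main__":
    # The split covers block 0 exactly, but the last line ends 100 bytes into block 1.
    print(blocks_touched(0, 64 * MB, 64 * MB + 100))  # [0, 1]

    # dn1 dies 40 MB into block 0; the client finishes block 0 from dn2.
    log = [
        ReadEvent(block=0, host="dn1", start=0, end=40 * MB),
        ReadEvent(block=0, host="dn2", start=40 * MB, end=64 * MB),
        ReadEvent(block=1, host="dn3", start=0, end=100),
    ]
    print(hosts_per_block(log))  # {0: ['dn1', 'dn2'], 1: ['dn3']}
```

So even in this toy model a task maps to several blocks and a block to several hosts, which is why there is no single "replica read" per chunk.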


James Cipar wrote:
> Is there any way to determine which replica of each chunk is read by a 
> map-reduce program?  I've been looking through the hadoop code, and it 
> seems like it tries to hide those kinds of details from the higher level 
> API.  Ideally, I'd like the host the task was running on, the file name 
> and chunk number, and the host the chunk was read from.
