hadoop-common-dev mailing list archives

From Amogh Vasekar <am...@yahoo-inc.com>
Subject Re: data locality on HDFS
Date Fri, 07 May 2010 11:30:57 GMT
The (o.a.h.fs) FileSystem API has getFileBlockLocations(), which returns the hosts that store the replicas of each block in a file.
In the general case, (o.a.h.mapreduce.lib.input) FileInputFormat's getSplits() calls this method,
and the resulting host list is passed on for job scheduling along with the split info.
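
To make the flow concrete, here is a rough sketch in plain Java with no Hadoop dependency. Block, Split, makeSplits() and pickFor() are hypothetical stand-ins I made up for illustration: Block plays the role of what getFileBlockLocations() returns per block (offset, length, hosts), Split plays the role of a file split that carries those hosts, and pickFor() is a much simplified version of "give the requesting node a split it already stores". The real scheduler does more than this, so treat it as a picture of the idea rather than the actual code path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LocalitySketch {

    // Hypothetical stand-in for a block location: a byte range plus the hosts storing a replica.
    static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    // Hypothetical stand-in for a file split: it keeps the replica hosts of its block.
    static final class Split {
        final long start;
        final long length;
        final List<String> hosts;

        Split(long start, long length, List<String> hosts) {
            this.start = start;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Simplified idea of getSplits(): one split per block, copying the block's hosts.
    static List<Split> makeSplits(List<Block> blocks) {
        List<Split> splits = new ArrayList<>();
        for (Block b : blocks) {
            splits.add(new Split(b.offset, b.length, b.hosts));
        }
        return splits;
    }

    // Simplified scheduling step: prefer a pending split with a replica on the
    // requesting node; otherwise fall back to the first pending split (non-local).
    static Split pickFor(String node, List<Split> pending) {
        for (int i = 0; i < pending.size(); i++) {
            if (pending.get(i).hosts.contains(node)) {
                return pending.remove(i);
            }
        }
        return pending.isEmpty() ? null : pending.remove(0);
    }

    public static void main(String[] args) {
        List<Split> pending = makeSplits(Arrays.asList(
                new Block(0, 64, "node1", "node2", "node3"),
                new Block(64, 64, "node2", "node4", "node5")));
        Split s = pickFor("node4", pending);
        System.out.println("node4 gets the split starting at byte " + s.start);
    }
}
```

In real code you would get the host list from the BlockLocation objects returned by FileSystem.getFileBlockLocations() instead of building it by hand.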

Hope this is what you were looking for.


On 5/7/10 4:22 PM, "momina khan" <momina.azam@gmail.com> wrote:


I am trying to figure out how Hadoop uses data locality to schedule maps on
nodes which locally store the map input ... going through the code I am going in
circles between a couple of files but not really getting anywhere ... that
is to say that I can't locate the HDFS API or function that can communicate the
list of nodes that store replicas for, say, a block!

I am going from FSNameSystem.java to DFSClient.java to
BlocksWithLocations.java to DataNodeDescriptor.java and then back again
without getting to the HDFS interface that communicates which nodes store
the replicas of a block!

Someone please help!
