hadoop-hdfs-dev mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Question about
Date Thu, 13 Sep 2012 16:31:34 GMT
MR does not read the files in the front-end (unless a partitioner such
as the TOP (TotalOrderPartitioner) demands it). The actual block-level
read is done via the DFSClient class and its companion classes
DFSInputStream and DFSOutputStream; the first of these should be where
your interest lies.

All MR cares about is scheduling tasks close to the data, so it just
takes the block locations (metadata) to build split objects for the
scheduler and the task, and sends them across.

On Thu, Sep 13, 2012 at 5:40 AM, Vivi Lang <sqlxweiwei@gmail.com> wrote:
> Hi all,
>
> Is there anyone who can tell me that when we launch a mapreduce task, for
> example, wordcount, after the JobClient obtained the block locations (the
> related hosts/datanodes are stored in the specified split), which
> function/class will be called for reading those blocks from the datanode?
>
> Thanks,
> Vivian



-- 
Harsh J
