hadoop-mapreduce-user mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: Hadoop-Yarn-MR reading InputSplits and processing them by the RecordReader, architecture/design question.
Date Fri, 01 Feb 2013 20:58:11 GMT
You got that mostly right, and it doesn't differ much from Hadoop 1.* either.
The MR AM now does the work that was earlier done in the JobTracker, but the
JobClient and the task side haven't changed much.

FileInputFormat.getSplits() is called by the client itself, so you should look
for that log output on the client machine.
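
To make the "runs on the client" point concrete, here is a minimal,
Hadoop-free sketch of split computation. ClientSideSplitSketch, ToySplit and
computeSplits are hypothetical names, not Hadoop classes, and the real
FileInputFormat.getSplits() also takes the block size and the configured
minimum/maximum split sizes into account. The only point is that this logic
executes in the JVM that submits the job, so anything it logs ends up there.

```java
import java.util.ArrayList;
import java.util.List;

public class ClientSideSplitSketch {

    /** Hypothetical stand-in for an input split: a byte range of one file. */
    static final class ToySplit {
        final long start;
        final long length;

        ToySplit(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    /** Cuts a file of fileLength bytes into consecutive ranges of at most splitSize bytes. */
    static List<ToySplit> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<ToySplit> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            splits.add(new ToySplit(offset, Math.min(splitSize, fileLength - offset)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // This runs in the submitting JVM, so the output lands on the client's
        // console or log, not in any container log on the cluster.
        for (ToySplit split : computeSplits(250, 100)) {
            System.out.println("split start=" + split.start + " length=" + split.length);
        }
    }
}
```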

Each filesystem overrides getFileBlockLocations() and provides the correct
locations. DFS, for example, internally uses the getBlockLocations() API on
the NameNode. What you are seeing is the default implementation, which is
what the local FS uses.
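
The override pattern can be sketched without any Hadoop dependency. Everything
below (BlockLocationSketch, ToyFileSystem, ToyDistributedFileSystem,
ToyBlockLocation) is a hypothetical simplification, not the real API: the base
class answers "localhost" for any range, and the distributed subclass replaces
that with a lookup in a per-block host table, which stands in for asking the
NameNode.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BlockLocationSketch {

    /** Hypothetical stand-in for a block location: a byte range plus the hosts holding it. */
    static final class ToyBlockLocation {
        final long offset;
        final long length;
        final List<String> hosts;

        ToyBlockLocation(long offset, long length, List<String> hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Toy base file system: the default answer is one location on "localhost". */
    static class ToyFileSystem {
        List<ToyBlockLocation> getFileBlockLocations(String path, long start, long len) {
            return Collections.singletonList(
                    new ToyBlockLocation(start, len, Collections.singletonList("localhost")));
        }
    }

    /** Toy distributed file system: overrides the default with real per-block hosts. */
    static class ToyDistributedFileSystem extends ToyFileSystem {
        private final long blockSize;
        private final List<List<String>> hostsPerBlock; // stands in for NameNode metadata

        ToyDistributedFileSystem(long blockSize, List<List<String>> hostsPerBlock) {
            this.blockSize = blockSize;
            this.hostsPerBlock = hostsPerBlock;
        }

        @Override
        List<ToyBlockLocation> getFileBlockLocations(String path, long start, long len) {
            List<ToyBlockLocation> result = new ArrayList<>();
            long end = start + len;
            for (int i = 0; i < hostsPerBlock.size(); i++) {
                long blockStart = i * blockSize;
                long blockEnd = blockStart + blockSize;
                // Keep every block that overlaps the requested byte range.
                if (blockEnd > start && blockStart < end) {
                    result.add(new ToyBlockLocation(blockStart, blockSize, hostsPerBlock.get(i)));
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        ToyFileSystem local = new ToyFileSystem();
        System.out.println(local.getFileBlockLocations("/tmp/in.txt", 0, 150).get(0).hosts);
    }
}
```

In this sketch the caller never needs to know which subclass it is talking to,
so nothing has to translate "localhost" afterwards: a file system that knows
real hosts simply returns them.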


On Fri, Feb 1, 2013 at 6:24 AM, blah blah <tmp5330@gmail.com> wrote:

> Hi
> (I am using Yarn Hadoop-3.0.0.SNAPSHOT, revision 1437315M)
> I have a question regarding my assumptions on the Yarn-MR design,
> especially the InputSplit processing. Can someone confirm or point out my
> mistakes in my MR-Yarn design assumptions?
> These are my assumptions regarding design.
> 1. JobClient submits Job
> Create AppMaster etc.
> 2. Get number of splits // MR-AM, especially their hosts, so that a Task
> can be started on the same node, use *InputFormat.getSplits() { ...;
> FileSystem.getFileBlockLocations(); ...;}
> 3. Start N tasks // MR-AM
> 4. Each Task processes its (single) split (unless splitsNr >> tasksNr)
> with the use of InputFormat/RecordReader // MR-Task, from HERE InputFormat
> operates only on a single Split
> 5. Start RecordReader and process Split // MR-Task
> 6. MAP() // MR-Task
> 7. Do rest MR // MR-Task
> 8. Dump to HDFS/or other storage. // MR-Task
> 9. Report FINISH, free resources // MR-AM
> 2 quick bonus questions
> I have added an additional log entry in FileInputFormat.getSplits(),
> however I cannot see it in the log files. I am using the WordCount example
> and INFO level. What might be the problem?
> In FileSystem.getFileBlockLocations() the hostname is hard-coded as
> "localhost". Where is this mapped to the actual host name, so that the AM
> will know which nodes to request?
> Thanks for reply

Hortonworks Inc.
