hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Job Controller for MapReduce task assignment
Date Sat, 08 Sep 2012 03:23:16 GMT
Hey John,

Here's how MR works, to speak simply:

- Job.submit() is called.
- The Job's InputFormat#getSplits() is called; its result is serialized and
shipped across, along with other job artifacts such as jars, etc., to
the configured FS, for the JobTracker or the MR2 ApplicationMaster to read.
- The split info contains locality hints that the scheduler then uses
when assigning a host's slot or resources to a task, depending also on
availability/requested resources (hence it's a 'hint', not strict).
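To make the getSplits() step concrete, here's a small plain-Java sketch of how an
input file gets carved into splits. It has no Hadoop dependencies; the class and
method names are illustrative, not Hadoop's actual API, though the 1.1 "slop"
factor mirrors what FileInputFormat does to avoid a tiny trailing split:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of how an InputFormat might carve a file into splits.
// Offsets and lengths are in arbitrary units (think bytes).
public class SplitSketch {
    static final double SPLIT_SLOP = 1.1; // tolerate a final split up to 10% over

    // A split: where it starts in the file, and how long it is.
    static class Split {
        final long offset, length;
        Split(long offset, long length) { this.offset = offset; this.length = length; }
    }

    static List<Split> computeSplits(long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        long remaining = fileLength;
        // Carve full-sized splits while what's left is meaningfully bigger
        // than one split; fold the rest into a single tail split.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new Split(fileLength - remaining, splitSize));
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits.add(new Split(fileLength - remaining, remaining)); // tail split
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 300-unit file with 128-unit splits yields [0,128), [128,128), [256,44)
        for (Split s : computeSplits(300, 128)) {
            System.out.println(s.offset + "+" + s.length);
        }
    }
}
```

In real MR the per-split host list (where the underlying HDFS block replicas
live) rides along with each split; that list is the locality hint the third
bullet refers to.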

The first two are client-end (controllable), the last is dependent on
the scheduler you've put in use (Fifo/Capacity/Fair) or have
implemented (Custom).
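To show why a locality hint is a 'hint' and not a strict requirement, here's a
minimal plain-Java sketch of one scheduler assignment step. All names here are
illustrative; real schedulers (Fifo/Capacity/Fair) also weigh queues, delay
scheduling, and requested resources:

```java
import java.util.List;
import java.util.Set;

// Toy scheduler step: pick a host for one task, preferring the split's
// locality hints but falling back to any free host when none match.
public class LocalitySketch {
    static String assignHost(List<String> preferredHosts, Set<String> freeHosts) {
        for (String h : preferredHosts) {
            if (freeHosts.contains(h)) return h; // data-local assignment
        }
        // No preferred host has a free slot: run the task remotely anyway.
        return freeHosts.isEmpty() ? null : freeHosts.iterator().next();
    }

    public static void main(String[] args) {
        Set<String> free = Set.of("node2", "node3");
        // node1 is busy, but node2 holds a replica -> data-local on node2
        System.out.println(assignHost(List.of("node1", "node2"), free));
        // no replica host is free -> falls back to some free host
        System.out.println(assignHost(List.of("node9"), free));
    }
}
```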

I'm unclear on what exactly you're asking, but I think you may want to
start by reading the JobSubmitter class and go on from there.

Does this help?

On Fri, Sep 7, 2012 at 1:24 PM, John Cuffney <cuffneyj@gmail.com> wrote:
> Hey,
> Which class handles the top level partitioning for MapReduce?  It's possible
> I have a misunderstanding of how this is handled, but in my view, there is a
> top level controller which kicks off the whole process; it handles
> partitioning of the input and distribution of the input segments to the
> various machines/tasks.  I have been searching through a lot of the Job
> classes, and they all seem to handle a single task, whereas it is important
> for me to perform some work at the highest level controller, if that exists.
> Any info on what I'm looking for/if I'm on the wrong track would be much
> appreciated.
> Thanks for the help,
> John

Harsh J
