hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Nikhil S. Ketkar (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3315) Master-Worker Application on YARN
Date Wed, 21 Mar 2012 09:43:39 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234229#comment-13234229

Nikhil S. Ketkar commented on MAPREDUCE-3315:

I have started some work on this issue and I wanted to describe the overall approach I am
taking and get some feedback. 

Let me start by describing how an application will be written using Master-Worker. The pseudocode
below illustrates the usage. 

class WordCount {
    // Word count implementation using Master-Worker paradigm.
    // Master sends portions of the input file to workers
    // Worker builds a word count hash map (word -> count) and sends it as a result unit
    // Master uses the results to update master dictionary 

    class WorkUnit {
    // A list of words

    class ResultUnit {
    // Map of word -> count

    class Master extends MasterRunner<WorkUnit, ResultUnit> {
	public manageWorkers() {
            // Spawn a bunch of workers using spawnNewWorker();
            // For each spawned worker spawnNewWorker() will return a WorkerReference, keep
it around for bookkeeping
            // Read a file and create WorkUnits with a fixed number of lines to each workers
            // Assign WorkUnits to Workers using assignWork(), use the previously obtained
            // Wait for ResultUnits using waitForResult();
            // Update master dictionary based on a received result unit
            // After work is done, kill workers using terminateWorker()

   class Worker extends WorkerRunner<WorkUnit, ResultUnit>  {
	public ResultUnit doWork(WorkUnit wu) {
       // Build a word count hash map (word -> count) and sends it as a result unit

// User writes code to setup and launch job
public static void main(String[] args) throws Exception {
    MWJobConf conf = new MWJobConf(WordCount.class);

Key functionality for the Master Worker will be implemented in the following classes.

class MasterRunner<W, R> {
    // Spawn new Worker
    protected WorkerReference spawnNewWorker();

    // Spawn new Workers
    protected ArrayList<WorkerReference> spawnNewWorkers();

    // Assign work to any Worker, returns WorkerReference to whom it was assigned
    protected WorkerReference assignWork(WorkUnit wu);

    // Assign work to a specific Worker
    protected void assignWork(WorkerReference wf, WorkUnit wu);

    // Wait for result from any Worker. Blocking Call
    protected ResultUnit waitForResult();

    // Wait for result from a specific worker. Blocking Call
    protected ResultUnit waitForResult(WorkerReference wf);

    // Is this specific worker alive? Blocking Call
    protected boolean isWorkerAlive(WorkerReference wf);

    // Get alive workers? Blocking Call
    protected ArrayList<WorkerReference> isWorkerAlive(WorkerReference wf);

    // Terminate a specific Worker
    protected void terminateWorker(WorkerReference wf);

    // To be implemented by user
    public manageWorkers() = 0;

class WorkerRunner<W, R> {
    // To be implemented by user
    public ResultUnit doWork(WorkUnit wu) = 0;

class WorkUnitContainer<W> {
  // The framework passes around WorkUnitContainers which contain the user defined 
  // WorkUnit and some additional bookkeeping information

class ResultUnitContainer<R> {
 // The framework passes around ResultUnitContainers which contain the user defined 
 // ResultUnit and some additional bookkeeping information

class WorkerReference {
 // Uniquely identifies the Workers

A few questions I have been thinking about are:
# Should I use Hadoop IPC or RMI or something else?
# Should the Master be "in" the ApplicationManager or be run as a Container?

> Master-Worker Application on YARN
> ---------------------------------
>                 Key: MAPREDUCE-3315
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3315
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>             Fix For: 0.24.0
> Currently master worker scenarios are forced fit into Map-Reduce. Now with YARN, these
can be first class and would benefit real/near realtime workloads and be more effective in
using the cluster resources.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message