hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3150) Move task file promotion into the task
Date Tue, 03 Jun 2008 20:56:45 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-3150:
--------------------------------

    Attachment: 3150.patch

Attached is a patch. This implements the following:

1) Makes the OutputFormat an abstract class with empty implementations for most methods:

{code}
public abstract class OutputFormat<K, V> {

  public abstract RecordWriter<K, V> getRecordWriter(JobConf job,
                                     String name, Progressable progress)
  throws IOException;

  public abstract void checkOutputSpecs(JobConf job) throws IOException;

  // The remaining methods have empty (or trivial) default implementations.

  void commitJobOutput(JobConf job) throws IOException {  }

  void discardJobOutput(JobConf job) throws IOException {  }

  public boolean isTaskCommitRequired(JobConf job, String attemptId) {
    return false;
  }

  void setTaskWorkOutput(JobConf job, String attemptId)
  throws IOException {  }

  void createTaskWorkOutput(JobConf job, String attemptId)
  throws IOException {  }

  void createJobWorkOutput(JobConf job) throws IOException {  }

  void commitTaskOutput(JobConf job, String attemptId)
  throws IOException {  }

  void discardTaskOutput(JobConf job, String attemptId)
  throws IOException {  }
}
{code}

2) Removes the FileOutputFormat dependencies from the Task and other framework classes. Instead,
it defines some additional methods in OutputFormat. They have a FileOutputFormat flavor, but that
should be okay since the default implementations are empty. This is open to suggestions.

3) Moves methods such as saveTaskOutput from Task.java to FileOutputFormat, since that code
only ever handled FileOutputFormat anyway.

4) Adds a blocking RPC call, canCommit. The call blocks at the tasktracker's end until the
tasktracker hears from the JobTracker whether the task should commit or discard its output.
The debatable point is that this blocks RPC handlers while a task is in the commit-pending
state. The expectation is that we'd hear back from the JobTracker pretty soon, and in any case
the tasktracker can't do much (such as launching new tasks) before it hears from the JobTracker.
The number of RPC handlers has also been increased in the patch. There are ways to avoid
blocking the RPC handler, but this seemed like a simple approach and should not be
a big deal since we are dealing with very local (same-node) RPCs.

5) Many of the changes in the patch relate to getRecordWriter, because the "ignored" parameter
of that method has been removed.

6) The task commit queue code has been removed from the JobTracker (JT).
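
To make the task-side promotion in points 1) to 3) concrete, here is a rough sketch of what a file-based implementation of the commit/discard hooks could look like. It is not taken from the patch: it uses the local filesystem through java.nio rather than Hadoop's FileSystem API, drops the JobConf argument, and the class name LocalTaskOutput, the "_temporary" scratch layout, and the attempt id are all assumptions made for illustration. It also assumes the scratch directory holds only flat files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical local-filesystem analogue of the file-based hooks; not Hadoop code.
class LocalTaskOutput {
  private final Path outputDir;

  LocalTaskOutput(Path outputDir) {
    this.outputDir = outputDir;
  }

  // Per-attempt scratch directory (this layout is an assumption).
  Path workDir(String attemptId) {
    return outputDir.resolve("_temporary").resolve(attemptId);
  }

  void createTaskWorkOutput(String attemptId) throws IOException {
    Files.createDirectories(workDir(attemptId));
  }

  // A commit is only needed if the attempt actually has a scratch directory.
  boolean isTaskCommitRequired(String attemptId) {
    return Files.isDirectory(workDir(attemptId));
  }

  // Snapshot the directory listing so we never modify it while iterating.
  private static List<Path> listFiles(Path dir) throws IOException {
    List<Path> files = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
      for (Path f : stream) {
        files.add(f);
      }
    }
    return files;
  }

  // Promote: move every scratch file into the final output directory.
  void commitTaskOutput(String attemptId) throws IOException {
    Path work = workDir(attemptId);
    for (Path f : listFiles(work)) {
      Files.move(f, outputDir.resolve(f.getFileName()),
                 StandardCopyOption.REPLACE_EXISTING);
    }
    Files.delete(work);
  }

  // Discard: delete the scratch files without promoting anything.
  void discardTaskOutput(String attemptId) throws IOException {
    Path work = workDir(attemptId);
    for (Path f : listFiles(work)) {
      Files.delete(f);
    }
    Files.delete(work);
  }
}

class LocalTaskOutputDemo {
  // Writes one file into the scratch dir, commits, and checks it was promoted.
  static boolean promoted() throws IOException {
    Path out = Files.createTempDirectory("job-output");
    LocalTaskOutput fmt = new LocalTaskOutput(out);
    String attemptId = "attempt_0001"; // made-up attempt id
    fmt.createTaskWorkOutput(attemptId);
    Files.write(fmt.workDir(attemptId).resolve("part-00000"),
                "key\tvalue\n".getBytes(StandardCharsets.UTF_8));
    if (fmt.isTaskCommitRequired(attemptId)) {
      fmt.commitTaskOutput(attemptId);
    }
    return Files.exists(out.resolve("part-00000"))
        && !Files.exists(fmt.workDir(attemptId));
  }

  public static void main(String[] args) throws IOException {
    System.out.println("promoted: " + promoted());
  }
}
```

The point of the sketch is only that the framework calls the hooks without knowing about files; everything file-specific stays inside the output format.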

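The blocking behaviour described in 4) can be illustrated with a small stand-alone sketch. This is not the RPC code from the patch: the class CommitGate, the method onJobTrackerDecision, and the timeout argument are hypothetical, and a plain thread stands in for the JobTracker's response. It only shows the shape of "the caller waits until a commit/discard decision arrives".

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the tasktracker side of the handshake; not Hadoop code.
class CommitGate {
  // One pending decision per task attempt, completed when the JobTracker answers.
  private final Map<String, CompletableFuture<Boolean>> decisions =
      new ConcurrentHashMap<>();

  private CompletableFuture<Boolean> decisionFor(String attemptId) {
    return decisions.computeIfAbsent(attemptId, id -> new CompletableFuture<>());
  }

  // Called on behalf of the task: blocks until a decision arrives.
  // The timeout is an assumption of this sketch.
  boolean canCommit(String attemptId, long timeoutMillis) throws Exception {
    return decisionFor(attemptId).get(timeoutMillis, TimeUnit.MILLISECONDS);
  }

  // Called when the JobTracker's reply says what this attempt should do.
  void onJobTrackerDecision(String attemptId, boolean commit) {
    decisionFor(attemptId).complete(commit);
  }
}

class CommitGateDemo {
  // Runs one attempt through the handshake and reports what the task would do.
  static String runAttempt(boolean jobTrackerSaysCommit) throws Exception {
    CommitGate gate = new CommitGate();
    String attemptId = "attempt_0001"; // made-up attempt id
    // A separate thread plays the role of the JobTracker's response arriving.
    Thread jobTracker = new Thread(
        () -> gate.onJobTrackerDecision(attemptId, jobTrackerSaysCommit));
    jobTracker.start();
    boolean commit = gate.canCommit(attemptId, 5000);
    jobTracker.join();
    return commit ? "committed" : "discarded";
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runAttempt(true));
    System.out.println(runAttempt(false));
  }
}
```

In this shape each waiting task ties up one caller thread until the decision arrives, which is the trade-off raised in 4) and the reason the handler count matters.
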
This patch still requires testing and may have some bugs at this point, but I am hoping that it
makes it into 0.18. Could someone please take a quick look at the approach? Thanks!

> Move task file promotion into the task
> --------------------------------------
>
>                 Key: HADOOP-3150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3150
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>             Fix For: 0.18.0
>
>         Attachments: 3150.patch
>
>
> We need to move the task file promotion from the JobTracker to the Task and move it down
> into the output format.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

