Message-ID: <680553782.1212526605336.JavaMail.jira@brutus>
Date: Tue, 3 Jun 2008 13:56:45 -0700 (PDT)
From: "Devaraj Das (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Updated: (HADOOP-3150) Move task file promotion into the task
In-Reply-To: <1005772078.1207083747584.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-3150:
--------------------------------

    Attachment: 3150.patch

Attached is a patch.
This implements the following:

1) Makes OutputFormat an abstract class with empty implementations for most methods:
{code}
public abstract class OutputFormat {
  abstract public RecordWriter getRecordWriter(JobConf job, String name,
      Progressable progress) throws IOException;
  abstract public void checkOutputSpecs(JobConf job) throws IOException;
  void commitJobOutput(JobConf job) throws IOException { }
  void discardJobOutput(JobConf job) throws IOException { }
  public boolean isTaskCommitRequired(JobConf job, String attemptId) {
    return false;
  }
  void setTaskWorkOutput(JobConf job, String attemptId) throws IOException { }
  void createTaskWorkOutput(JobConf job, String attemptId) throws IOException { }
  void createJobWorkOutput(JobConf job) throws IOException { }
  void commitTaskOutput(JobConf job, String attemptId) throws IOException { }
  void discardTaskOutput(JobConf job, String attemptId) throws IOException { }
}
{code}

2) Removes the FileOutputFormat dependencies from the Task and other framework classes. Instead, it defines some additional methods in OutputFormat. These have a FileOutputFormat flavor, but that should be okay since the default implementations are empty. This is open for suggestions.

3) Moves things like saveTaskOutput from Task.java to FileOutputFormat, since that code only ever handled FileOutputFormat anyway.

4) Adds a blocking RPC call, canCommit. This call blocks at the tasktracker's end until the tasktracker hears from the JobTracker what the task should do: commit or discard the output. The debatable part is that we block RPC handlers when a task reaches the commit-pending state. The expectation is that we'd hear back from the JobTracker pretty soon, and in any case the tasktracker can't do much (like launching new tasks) before it hears from the JobTracker. Also, the number of RPC handlers has been increased in the patch.
There are ways to avoid blocking the RPC handler, but this seemed like a simple approach and should not be a big deal since we are dealing with very local (node-local) RPCs.

5) A whole lot of changes to do with getRecordWriter have been made in the patch, following the removal of the "ignored" parameter from the method.

6) The task commit queue code has been removed from the JT.

This patch requires testing and may have some bugs at this point. But I am hoping that it makes it into 0.18. So could someone please take a quick look at the approach? Thanks!

> Move task file promotion into the task
> --------------------------------------
>
>                 Key: HADOOP-3150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3150
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>             Fix For: 0.18.0
>
>         Attachments: 3150.patch
>
>
> We need to move the task file promotion from the JobTracker to the Task and move it down into the output format.
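
For anyone reviewing the approach in item 4, here is a rough, self-contained sketch of what a blocking canCommit handshake could look like on the tasktracker side. It is not taken from 3150.patch: the class name CommitHandshakeSketch, the method jobTrackerDecided, the timeout parameter, and the choice to treat a timeout as "discard" are all assumptions made for illustration, and real Hadoop RPC plumbing is left out entirely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the tasktracker side of the canCommit handshake. */
class CommitHandshakeSketch {

    /** One pending decision per task attempt in the commit-pending state. */
    private static final class Decision {
        final CountDownLatch ready = new CountDownLatch(1);
        volatile boolean commit;
    }

    // Entries are never cleaned up in this sketch; real code would remove them.
    private final Map<String, Decision> pending = new ConcurrentHashMap<>();

    /**
     * Called by the task. Blocks the calling (handler) thread until the
     * JobTracker's answer for this attempt has arrived, or the timeout expires.
     */
    boolean canCommit(String attemptId, long timeoutMillis) throws InterruptedException {
        Decision d = pending.computeIfAbsent(attemptId, id -> new Decision());
        if (!d.ready.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false; // assumption: no answer in time is treated as "discard"
        }
        return d.commit;
    }

    /** Called when the JobTracker's commit/discard answer reaches the tasktracker. */
    void jobTrackerDecided(String attemptId, boolean commit) {
        Decision d = pending.computeIfAbsent(attemptId, id -> new Decision());
        d.commit = commit;    // written before countDown(), so waiters see it
        d.ready.countDown();  // releases any thread blocked in canCommit
    }

    public static void main(String[] args) throws Exception {
        CommitHandshakeSketch tt = new CommitHandshakeSketch();
        String attempt = "attempt_0001_m_000000_0"; // made-up attempt id

        // Simulates the JobTracker answering shortly after the task asks.
        Thread jt = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                return;
            }
            tt.jobTrackerDecided(attempt, true);
        });
        jt.start();

        boolean commit = tt.canCommit(attempt, 5000);
        System.out.println(commit ? "commit" : "discard");
        jt.join();
    }
}
```

The sketch makes the trade-off in item 4 visible: each attempt waiting in canCommit occupies one handler thread until jobTrackerDecided is called for it, which is why the number of handlers matters.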