From: "Runping Qi (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Thu, 21 Aug 2008 09:26:44 -0700 (PDT)
Message-ID: <343354162.1219336004484.JavaMail.jira@brutus>
Subject: [jira] Commented: (HADOOP-2676) Maintaining cluster information across multiple job submissions

    [ https://issues.apache.org/jira/browse/HADOOP-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624415#action_12624415 ]

Runping Qi commented on HADOOP-2676:
------------------------------------

Intelligence should be built into the task tracker. It should decide whether to ask for tasks to run based on its health state (memory availability, tmp disk space, network connectivity, load, and other diagnostic information such as the history of previously failed tasks).

> Maintaining cluster information across multiple job submissions
> ---------------------------------------------------------------
>
>                 Key: HADOOP-2676
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2676
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.15.2
>            Reporter: Lohit Vijayarenu
>
> Could we have a way to maintain cluster state across multiple job submissions?
> Consider a scenario where we run multiple jobs in iteration on a cluster, back to back. The nature of the jobs is the same, but the input/output might differ.
> Now, if a node is blacklisted in one iteration of a job run, it would be useful to maintain this information and blacklist this node for the next iteration of the job as well.
> Another situation we saw: if there are fewer failures than mapred.map.max.attempts in each iteration, a few nodes are never marked for blacklisting. But if we consider two or three iterations, these nodes fail all jobs and should be taken out of the cluster. This hampers the overall performance of the job.
> Could we have config variables, something which matches a job type (provided by the user) and maintains the cluster status for that job type alone?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
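
The issue description above asks for failure information to be carried across iterations of the same job type, so that a node which stays under the per-job limit every time is still taken out eventually. A minimal sketch of that bookkeeping follows. It is an illustration only, not Hadoop code: the class `ClusterHistory`, the `max_failures_across_jobs` threshold, and the job-type and node names are all hypothetical and do not correspond to any existing Hadoop class or config key.

```python
from collections import defaultdict


class ClusterHistory:
    """Hypothetical per-job-type failure history kept across job submissions."""

    def __init__(self, max_failures_across_jobs):
        # Assumed threshold; not an existing Hadoop configuration variable.
        self.max_failures = max_failures_across_jobs
        # job_type -> node -> cumulative task failures over all iterations
        self.failures = defaultdict(lambda: defaultdict(int))

    def record_job(self, job_type, failures_by_node):
        """Fold one finished job's per-node failure counts into the history."""
        for node, count in failures_by_node.items():
            self.failures[job_type][node] += count

    def blacklisted(self, job_type):
        """Nodes whose cumulative failures for this job type reach the threshold."""
        return {
            node
            for node, count in self.failures[job_type].items()
            if count >= self.max_failures
        }


history = ClusterHistory(max_failures_across_jobs=4)
# In each iteration node "n7" stays below a per-job limit of 4 failures...
history.record_job("nightly-sort", {"n7": 2, "n3": 0})
history.record_job("nightly-sort", {"n7": 3})
# ...but its cumulative count for this job type crosses the threshold.
print(sorted(history.blacklisted("nightly-sort")))  # prints ['n7']
```

Keying the history by job type keeps one workload's failures from blacklisting a node for unrelated jobs, which is the scoping the last quoted question asks about.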