From: "Owen O'Malley (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Mon, 21 Apr 2008 13:59:21 -0700 (PDT)
Message-ID: <458179187.1208811561507.JavaMail.jira@brutus>
Subject: [jira] Commented: (HADOOP-3289) Hadoop should have a way to know when JobTracker is really ready to accept jobs.
In-Reply-To: <526842879.1208758281384.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591054#action_12591054 ]

Owen O'Malley commented on HADOOP-3289:
---------------------------------------

This should probably look like the safemode stuff, with get and wait operations.

> Hadoop should have a way to know when JobTracker is really ready to accept jobs.
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-3289
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3289
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Vinod Kumar Vavilapalli
>
> Hadoop throws an org.apache.hadoop.mapred.JobTracker$IllegalStateException when we try to submit jobs while the JT is still initializing and cannot accept jobs yet. This can happen for various reasons: the job was submitted too early, the JT is waiting for a response from the NN, which might be in safemode (HADOOP-2213), or the JT failed to clean up the mapred system directory (HADOOP-3276). This causes problems in HoD or any other user scripts that submit jobs automatically.
> To deal with such problems, we need a way either to find out the state of the job tracker, so that it can be checked before launching any job, or otherwise a way to determine whether the JT can accept jobs now. Currently there is no API/command line interface to check this in Hadoop. Job submission doesn't return a specific error code either, even when an IllegalStateException occurs. So the only reliable way for HoD/scripts to detect such exceptions is to search for exception strings in the output of these commands, which is kind of nasty.
> So, it would be good if Hadoop could provide an API/error code/command line utility to check whether the JT is really ready to accept jobs. Otherwise HoD/user scripts are left to resort to the (unreliable) approach of sleeping for arbitrary amounts of time and retrying without knowing the actual reason.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
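
For illustration only, here is a rough sketch of what a "get" plus "wait" pair could look like from a client's point of view. Nothing here is an existing Hadoop API: the class ReadinessWaiter, its methods, and the BooleanSupplier probe are all hypothetical names, and the probe stands in for whatever status call the JT might eventually expose. The point is only to contrast a bounded wait on an explicit readiness check with sleeping blindly and parsing exception strings.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of Hadoop: a "get" and a "wait" over a readiness probe. */
final class ReadinessWaiter {
    private ReadinessWaiter() {}

    /** "get": one non-blocking check of the (assumed) readiness probe. */
    static boolean isReady(BooleanSupplier probe) {
        return probe.getAsBoolean();
    }

    /**
     * "wait": poll the probe until it reports ready or the timeout elapses.
     * Returns true if the probe reported ready, false on timeout or interrupt.
     */
    static boolean waitUntilReady(BooleanSupplier probe, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!probe.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: caller can report a clear "not ready" status
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

A script wrapper could call waitUntilReady once before submitting and map a false result to a distinct exit code, which is the kind of signal the issue asks for. A server-side wait would of course avoid the polling altogether.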