hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3289) Hadoop should have a way to know when JobTracker is really ready to accept jobs.
Date Mon, 21 Apr 2008 20:59:21 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591054#action_12591054 ]

Owen O'Malley commented on HADOOP-3289:

This should probably look like the safemode stuff with get and wait operations.
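
For illustration only, here is a minimal sketch of what "get and wait operations" could look like, loosely modeled on the existing `hadoop dfsadmin -safemode get` and `-safemode wait` commands. The class `ReadinessGate` and its methods are hypothetical names invented for this sketch; they are not part of Hadoop or of any proposed patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not an existing Hadoop class: a one-shot gate that
// a server could open once its initialization has finished.
final class ReadinessGate {
    private final CountDownLatch ready = new CountDownLatch(1);

    /** Called once by the server when it can start accepting jobs. */
    void markReady() {
        ready.countDown();
    }

    /** "get": report the current state without blocking. */
    boolean isReady() {
        return ready.getCount() == 0;
    }

    /**
     * "wait": block until the gate opens or the timeout expires.
     * Returns true if the gate is open, false on timeout or interruption.
     */
    boolean awaitReady(long timeoutMillis) {
        try {
            return ready.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

The "get" form suits scripts that want to check and move on, while the "wait" form removes the need for a caller-side sleep loop.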

> Hadoop should have a way to know when JobTracker is really ready to accept jobs.
> --------------------------------------------------------------------------------
>                 Key: HADOOP-3289
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3289
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Vinod Kumar Vavilapalli
> Hadoop throws an org.apache.hadoop.mapred.JobTracker$IllegalStateException when we try
> to submit jobs while the JT is still initializing and cannot accept jobs yet. This can
> happen for various reasons: the job was submitted too early, the JT is waiting for a
> response from an NN that might be in safemode (HADOOP-2213), or the JT failed to clean up
> the mapred system directory (HADOOP-3276). This causes problems in HoD or any other user
> scripts that submit jobs automatically.
> To deal with such problems, we need a way either to find out the state of the job
> tracker, so that it can be checked before launching any job, or otherwise to determine
> whether the JT can accept jobs now. Currently there is no API or command-line interface
> in Hadoop to check this. Job submission does not even return a specific error code when
> an IllegalStateException occurs. So the only reliable way for HoD/scripts to detect such
> exceptions is to search for exception strings in the output of these commands, which is
> kind of nasty.
> So, it would be good if Hadoop could provide an API, error code, or command-line utility
> to check whether the JT is really ready to accept jobs. Otherwise HoD/user scripts will be
> left resorting to the unreliable approach of sleeping for arbitrary amounts of time and
> retrying without knowing the actual reason.
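
As a rough illustration of the request above, the following sketch shows how a submitting script or tool might behave if a readiness check existed: it polls the check a bounded number of times and reports a distinct exit code instead of relying on string matching. Everything here is an assumption made for the example: the class `SubmitWhenReady`, the exit code value 2, and the `probe` callback that stands in for whatever API or command Hadoop might eventually provide.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: bounded polling of a readiness probe before submitting.
final class SubmitWhenReady {
    static final int EXIT_OK = 0;
    static final int EXIT_NOT_READY = 2; // assumed value, chosen only for this example

    /**
     * Calls probe up to maxAttempts times, pausing delayMillis between attempts.
     * Runs submit and returns EXIT_OK as soon as the probe reports ready;
     * returns EXIT_NOT_READY if it never does or the thread is interrupted.
     */
    static int run(BooleanSupplier probe, Runnable submit, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (probe.getAsBoolean()) {
                submit.run();
                return EXIT_OK;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return EXIT_NOT_READY;
                }
            }
        }
        return EXIT_NOT_READY;
    }
}
```

A caller could pass the result to `System.exit`, so that wrapper scripts can branch on the exit status and tell "not ready yet" apart from other failures.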

