hadoop-common-user mailing list archives

From Ravi Prakash <ravi...@ymail.com>
Subject Re: Job end notification does not always work (Hadoop 2.x)
Date Sat, 22 Jun 2013 21:38:15 GMT
Hi Prashant,

I tend to agree with you. Although job-end notification is only a "best-effort" mechanism
(i.e. we cannot always guarantee notification, for example when the AM OOMs), I agree
that we can do more. If you feel strongly about this, please create a JIRA and possibly
upload a patch.

Thanks
Ravi




________________________________
 From: Prashant Kommireddi <prash1784@gmail.com>
To: "user@hadoop.apache.org" <user@hadoop.apache.org> 
Sent: Thursday, June 20, 2013 9:45 PM
Subject: Job end notification does not always work (Hadoop 2.x)
 


Hello,

I came across an issue with the job-end notification callbacks in MR2. The callback works
fine once the ApplicationMaster has started, but no callback is sent if initialization of
the AM fails.

Here is the code from MRAppMaster.java

.....
.......

      // set job classloader if configured
      MRApps.setJobClassLoader(conf);
      initAndStartAppMaster(appMaster, conf, jobUserName);
    } catch (Throwable t) {
      LOG.fatal("Error starting MRAppMaster", t);
      System.exit(1);
    }
  }

protected static void initAndStartAppMaster(final MRAppMaster appMaster,
      final YarnConfiguration conf, String jobUserName) throws IOException,
      InterruptedException {
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation appMasterUgi = UserGroupInformation
        .createRemoteUser(jobUserName);
    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        appMaster.init(conf);
        appMaster.start();
        if(appMaster.errorHappenedShutDown) {
          throw new IOException("Was asked to shut down.");
        }
        return null;
      }
    });
  }

appMaster.init(conf) does not dispatch JobFinishEventHandler, which is responsible for sending
the HTTP callback (via shutDownJob()). If an exception is thrown during init, the process
simply terminates (via System.exit(1)) without sending any notification.

appMaster.start() however rightly uses the JobFinishEventHandler and things work fine.

Shouldn't a failure on init(..) also send a callback suggesting the job failed?
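
To make the idea concrete, here is a minimal, self-contained sketch of what I have in mind.
It is not the actual MRAppMaster code: the AppMaster and JobEndNotifier interfaces and the
run() method are hypothetical stand-ins invented for illustration. A real patch would have to
reuse the existing notification logic and respect the retry settings. The sketch only shows
the failure path: if init() or start() throws, a best-effort FAILED notification is attempted
before the non-zero exit.

```java
import java.util.ArrayList;
import java.util.List;

public class InitFailureNotificationSketch {

  /** Hypothetical stand-in for the job-end HTTP callback; not a Hadoop API. */
  interface JobEndNotifier {
    void notifyJobEnd(String status);
  }

  /** Hypothetical, simplified stand-in for the AM lifecycle (init, then start). */
  interface AppMaster {
    void init() throws Exception;
    void start() throws Exception;
  }

  /**
   * Runs init() and start(). If either throws, attempts a best-effort FAILED
   * notification and returns the exit code 1 instead of exiting silently.
   * The success path is left to the existing job-finish handling.
   */
  static int run(AppMaster appMaster, JobEndNotifier notifier) {
    try {
      appMaster.init();
      appMaster.start();
      return 0;
    } catch (Throwable t) {
      try {
        notifier.notifyJobEnd("FAILED");
      } catch (Throwable notifyError) {
        // Best effort only: a failing notification must not mask the original error.
      }
      return 1;
    }
  }

  public static void main(String[] args) {
    List<String> sent = new ArrayList<>();
    AppMaster failing = new AppMaster() {
      @Override
      public void init() throws Exception {
        throw new IllegalStateException("simulated init failure");
      }

      @Override
      public void start() {
        // never reached in this example
      }
    };
    int exitCode = run(failing, sent::add);
    System.out.println(exitCode + " " + sent); // prints "1 [FAILED]"
  }
}
```

The notifier is passed in rather than created inside run() so that the failure path can be
exercised without a real HTTP endpoint.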

Thanks,

Prashant