hadoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Prashant Kommireddi <prash1...@gmail.com>
Subject Re: Job end notification does not always work (Hadoop 2.x)
Date Sat, 22 Jun 2013 21:32:21 GMT
Following-up on this. Please let me know if this is expected/bug and if you
would like me to file a JIRA>


On Thu, Jun 20, 2013 at 9:45 PM, Prashant Kommireddi <prash1784@gmail.com>wrote:

> Hello,
>
> I came across an issue that occurs with the job notification callbacks in
> MR2. It works fine if the Application master has started, but does not send
> a callback if the initializing of AM fails.
>
> Here is the code from MRAppMaster.java
>
> .....
> .......
>
>       // set job classloader if configured
>       MRApps.setJobClassLoader(conf);
>       initAndStartAppMaster(appMaster, conf, jobUserName);
>     } catch (Throwable t) {
>       LOG.fatal("Error starting MRAppMaster", t);
>       System.exit(1);
>     }
>   }
>
> protected static void initAndStartAppMaster(final MRAppMaster appMaster,
>       final YarnConfiguration conf, String jobUserName) throws IOException,
>       InterruptedException {
>     UserGroupInformation.setConfiguration(conf);
>     UserGroupInformation appMasterUgi = UserGroupInformation
>         .createRemoteUser(jobUserName);
>     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
>       @Override
>       public Object run() throws Exception {
>         appMaster.init(conf);
>         appMaster.start();
>         if(appMaster.errorHappenedShutDown) {
>           throw new IOException("Was asked to shut down.");
>         }
>         return null;
>       }
>     });
>   }
>
> appMaster.init(conf) does not dispatch JobFinishEventHandler which is
> responsible for sending a HTTP callback (via shutDownJob()). If there was
> an exception at this time, the process would simply terminate (via
> System.exit(1) )
>
> appMaster.start() however rightly uses the JobFinishEventHandler and
> things work fine.
>
> Shouldn't a failure on init(..) also send a callback suggesting the job
> failed?
>
> Thanks,
> Prashant
>
>

Mime
View raw message