reef-dev mailing list archives

From Tobin Baker <tdba...@cs.washington.edu>
Subject Re: [Java] ClientConfiguration.ON_RUNTIME_ERROR handler not being run when Driver submission fails
Date Mon, 16 May 2016 15:58:46 GMT
Thanks for the clarification. I didn't get any exception at all in this
scenario. My client program just kept running after the Driver submission
failed (which was noted in a REEF log message), and no handlers were
called. The RM log indicated that the RM had failed to create a job
submission directory for the application (actually, that may have been an NM
error relayed to the RM; the logs are gone, so I'm not sure).

On Sun, May 15, 2016 at 9:15 AM, Markus Weimer <markus@weimo.de> wrote:

> On 2016-05-13 15:52, Tobin Baker wrote:
>
>> Hi, I just had the YARN ResourceManager fail to launch my Driver
>> application because of a file permissions error, but my handler registered
>> with ClientConfiguration.ON_RUNTIME_ERROR was never called (nor was
>> ClientConfiguration.ON_JOB_FAILED). I assumed that any YARN runtime errors
>> launching the Driver should trigger this handler; was I mistaken?
>>
>
> We don't have a test case for when the actual submission fails, so this
> might be a bug in REEF.
>
> The `ON_JOB_FAILED` handler is actually fed from the Driver via the
> network, so it can't be called here. However, it is the logical place to do
> it. `ON_RUNTIME_ERROR` is meant for when the RM itself becomes unavailable
> or such. However, that is fed by the YARN client and thus more accessible
> to us in the case you face above.
>
> Speaking of which: Do you get the exception thrown by
> org.apache.reef.runtime.yarn.client.YarnJobSubmissionHandler.onNext() in
> line 121? If so, that is the place where we can grab it and route it into
> one of the above event handlers.
>
> Markus
>
>
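The routing Markus suggests above (catch the exception where the submission happens and hand it to a client-side error handler) could look roughly like the sketch below. This is only an illustration under stated assumptions: `SubmissionSketch`, `SubmissionFailure`, and `Submitter` are hypothetical names invented here, not REEF classes, and the real `YarnJobSubmissionHandler` and `ClientConfiguration` wiring will differ.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for a client-side submission handler; NOT REEF's actual API.
final class SubmissionSketch {

    /** Hypothetical event describing a failed submission. */
    static final class SubmissionFailure {
        final String jobId;
        final Throwable cause;

        SubmissionFailure(String jobId, Throwable cause) {
            this.jobId = jobId;
            this.cause = cause;
        }
    }

    /** Hypothetical abstraction over the call that submits the Driver to the RM. */
    interface Submitter {
        void submit(String jobId) throws Exception;
    }

    private final Submitter submitter;
    private final Consumer<SubmissionFailure> onRuntimeError;

    SubmissionSketch(Submitter submitter, Consumer<SubmissionFailure> onRuntimeError) {
        this.submitter = submitter;
        this.onRuntimeError = onRuntimeError;
    }

    /**
     * Attempts the submission. Any exception is caught here and routed to the
     * error handler instead of being logged and dropped.
     *
     * @return true if the submission call returned normally, false otherwise
     */
    boolean onNext(String jobId) {
        try {
            submitter.submit(jobId);
            return true;
        } catch (Exception e) {
            onRuntimeError.accept(new SubmissionFailure(jobId, e));
            return false;
        }
    }
}
```

Note that this only covers failures that surface as an exception on the client side. In the scenario described in this thread no exception was seen at all, so the client would additionally need some other signal (for example, the application state reported by the RM) to detect the failure.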
