reef-dev mailing list archives

From Tobin Baker <>
Subject Re: Question about failure scenarios
Date Wed, 22 Jun 2016 20:31:27 GMT
My experience on the Java side has been that if insufficient resources are
available to allocate all requested Evaluators, the YARN ResourceManager (RM)
silently keeps retrying forever, and neither YARN nor REEF throws an
exception. I guess
this behavior might be desirable for high-churn scenarios, but it's quite
confusing in general, since it usually means that I've miscalculated the
amount of available resources and hence my request will never succeed, not
that the resources I requested are only temporarily unavailable. An
exception that hinted at the cause of the failed allocation would be very
helpful, although I don't know if it's possible to get this information
from the RM (I had to figure it out from RM logs).
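Since the RM gives no signal that a request can never be satisfied, a driver can at least bound the wait itself. Here is a minimal Java sketch of that idea. It assumes the application knows how many Evaluators it requested and chooses its own timeout. AllocationWatchdog, onAllocated and awaitShortfall are hypothetical names for illustration, not REEF or YARN APIs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical driver-side helper, NOT part of the REEF or YARN API: it counts
// allocation callbacks and reports a shortfall after a deadline instead of
// letting an unsatisfiable request wait forever.
final class AllocationWatchdog {
  private final int requested;
  private final CountDownLatch outstanding;

  AllocationWatchdog(final int requested) {
    this.requested = requested;
    this.outstanding = new CountDownLatch(requested);
  }

  // Call once per allocation event (assumed: from the driver's
  // allocated-evaluator handler).
  void onAllocated() {
    outstanding.countDown();
  }

  // Waits up to the timeout and returns how many Evaluators are still missing.
  long awaitShortfall(final long timeout, final TimeUnit unit) {
    try {
      outstanding.await(timeout, unit);
    } catch (final InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    }
    return outstanding.getCount();
  }

  int requested() {
    return requested;
  }

  public static void main(final String[] args) {
    final AllocationWatchdog watchdog = new AllocationWatchdog(4);
    // Simulate the RM granting only 3 of the 4 requested containers.
    for (int i = 0; i < 3; i++) {
      watchdog.onAllocated();
    }
    final long missing = watchdog.awaitShortfall(200, TimeUnit.MILLISECONDS);
    if (missing > 0) {
      // A real driver might throw here or fail the job with this message.
      System.out.println("Only " + (watchdog.requested() - missing) + " of "
          + watchdog.requested() + " Evaluators allocated; check cluster capacity");
    }
  }
}
```

A timeout cannot tell "not enough capacity" apart from "capacity is merely busy", so the deadline should reflect how long the application is willing to wait. The RM logs remain the place to confirm the actual cause.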

On Tue, Jun 21, 2016 at 7:43 PM, Julia Wang (QIUHE) <> wrote:

> I have some questions regarding some failure scenarios:
> We request some Evaluators, but for some reason not all of them are
> allocated. Is this a valid scenario? How do we know if the request is not
> fulfilled? Would the Java side handle such a case or throw errors? Or shall
> we use the number of IAllocatedEvaluator events to check whether all the
> requested Evaluators are allocated? If yes, shall we use a timeout, because
> the event could come later?
> Similarly, after we submit ServiceAndContexts, how do we know all the
> submissions are successful? If something is wrong inside a context, I guess
> we will receive IFailedContext. But if the submission itself has a problem,
> how do we know? Shall we count the number of IActiveContext events received?
> The same goes for Task submission. A task can be submitted on an allocated
> but somehow failed evaluator. How do we know if the submission is not
> successful? Just by counting the number of IRunningTask events received? How
> long do we need to wait before concluding that not all tasks were submitted
> successfully?
> Thanks,
> Julia
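Taken together, these questions can be approached with the same bookkeeping in each phase: treat it as N outstanding requests, count both the success events (allocated evaluator, active context, running task) and the failure events (failed context, failed task, failed evaluator) against N, and give up after a deadline the application chooses. The following is a minimal Java sketch under those assumptions. PhaseTracker and its methods are hypothetical, not REEF classes. Which handlers call onSuccess/onFailure, and the timeout value, are assumptions that depend on the application.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical bookkeeping helper, NOT a REEF class. Use one instance per phase
// (evaluator allocation, context submission, task submission) and feed it both
// success and failure events, so a request is never silently lost.
final class PhaseTracker {
  private final String phase;
  private final int expected;
  private int succeeded;
  private int failed;

  PhaseTracker(final String phase, final int expected) {
    this.phase = phase;
    this.expected = expected;
  }

  // Assumed wiring: call from the success handler for this phase
  // (allocated evaluator, active context, or running task).
  synchronized void onSuccess() {
    succeeded++;
    notifyAll();
  }

  // Assumed wiring: call from the matching failure handler (failed context or
  // failed task), and from the failed-evaluator handler if the evaluator died
  // while this phase was pending.
  synchronized void onFailure() {
    failed++;
    notifyAll();
  }

  // Waits until every request has an outcome or the timeout elapses, then
  // returns the number of requests that got neither a success nor a failure.
  synchronized int awaitOutcomes(final long timeout, final TimeUnit unit) {
    final long deadline = System.nanoTime() + unit.toNanos(timeout);
    while (succeeded + failed < expected) {
      final long remaining = deadline - System.nanoTime();
      if (remaining <= 0) {
        break;
      }
      try {
        TimeUnit.NANOSECONDS.timedWait(this, remaining); // loop guards spurious wakeups
      } catch (final InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return expected - succeeded - failed;
  }

  synchronized String summary() {
    return phase + ": " + succeeded + " ok, " + failed + " failed, "
        + (expected - succeeded - failed) + " unanswered";
  }

  public static void main(final String[] args) {
    final PhaseTracker tasks = new PhaseTracker("task submission", 3);
    tasks.onSuccess(); // e.g. a running-task event
    tasks.onSuccess();
    tasks.onFailure(); // e.g. a failed-task or failed-evaluator event
    tasks.awaitOutcomes(100, TimeUnit.MILLISECONDS);
    System.out.println(tasks.summary()); // task submission: 2 ok, 1 failed, 0 unanswered

    final PhaseTracker contexts = new PhaseTracker("context submission", 2);
    contexts.onSuccess(); // only one active-context event ever arrives
    contexts.awaitOutcomes(100, TimeUnit.MILLISECONDS);
    System.out.println(contexts.summary()); // context submission: 1 ok, 0 failed, 1 unanswered
  }
}
```

Counting failures as well as successes means the wait ends as soon as every request has some outcome. The timeout only matters for requests that produce no event at all, which is the allocation case described in the reply above.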
